Additional device for Espruino to support 8 pins in parallel (for discussion)

Posted on Wed 7th, November 2018


• #1

JumJum
Some weeks ago, a discussion started around the P3 RGB pixel panel.
Feedback from Gordon and allObjects gave me the idea of creating a new device for pins.
It should
1. support 8 ports in parallel
2. be a JavaScript class
3. replace shiftOut and digitalWrite
4. be faster than these functions
5. support both generic source and board-optimized source
As a first proof of concept, the attached source came to life. I gave it the name BytePort. It includes
1. a BytePort class
2. an init function to assign GPIOs and mode
3. support for sending an array of bytes in parallel, with a clock signal
4. support for sending a byte as serial bits
5. an implementation for ESP32; by the way, it's preliminary, without DMA or other fancy stuff
6. and it gets the job for the 64x32 P3 LED panel done in about 20 msec
Before spending more time on this, and maybe heading in the wrong direction, I would like to get feedback. Please see the attached zip.

1 Attachment
- BytePort.zip
• #2

Gordon

Do you have an example of how you used this in JS? Like your P3 driver code?

As far as I can see this doesn't add anything that you couldn't do with shiftOut (http://www.espruino.com/Reference#l__global_shiftOut). In fact, it actually does less?

Couldn't we just work on making shiftOut faster, even if that was special-casing it for ESP32 when sending 8 bits at a time? I believe we may already special-case for STM32.

Basically I'm now at the point where I'm having to pull features OUT of the Original Espruino board in order to do new releases, so I don't want to have to remove features in order to add things that don't actually add any new functionality.

I know you think that parsing the 8 pins each time would slow the call down, but if they're pre-bound with bind it shouldn't be that much different at all. I know it's a bit fiddly, but nothing stops you making a JS class that implements BytePort - and then people can pull it in if they need, rather than it taking up space on all boards.
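
As a rough illustration of that suggestion, a JS-only wrapper could look like the sketch below. It is not the class from BytePort.zip: the name BytePort is reused only for illustration, and shiftOutFn is a parameter that stands for Espruino's global shiftOut(pins, options, data). A recording stand-in is used here so the sketch also runs outside Espruino.

```javascript
// Hypothetical sketch: NOT the class from BytePort.zip.
// shiftOutFn stands for Espruino's global shiftOut(pins, options, data).
function BytePort(pins, clk, shiftOutFn) {
  if (pins.length > 8) throw new Error("BytePort supports at most 8 pins");
  // Bind pins and options once, so each send() only passes the data.
  this.send = shiftOutFn.bind(null, pins, { clk: clk });
}

// Stand-in for shiftOut: it only records what it was called with.
var calls = [];
function fakeShiftOut(pins, options, data) {
  calls.push({ pins: pins, clk: options.clk, data: data });
}

var port = new BytePort(["R1", "G1", "B1", "R2", "G2", "B2"], "Clock", fakeShiftOut);
port.send([9, 10, 36]);
console.log(calls.length, calls[0].clk); // prints: 1 Clock
```

On a board, the last argument would presumably be shiftOut itself and the pin names would be real pins.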
• #3

Gordon
Just looking at your code, you use GPIO.out_w1tc/GPIO.out_w1ts. This is really very similar to what we do for STM32 already: https://github.com/espruino/Espruino/blob/e06b3e24de42adb1de6e2cc6046e525880760e7d/src/jswrap_io.c#L462

There's a function called jshGetPinAddress: https://github.com/espruino/Espruino/blob/e06b3e24de42adb1de6e2cc6046e525880760e7d/src/jswrap_io.c#L574

While this only deals with one address for a pin (which only seems to work on STM32) we could extend that to return:
- An address for setting a pin
- An address for clearing a pin
- A bit mask for the pin
That would work on nRF52/ESP32 and STM32, and then we could have a fast shiftOut (and also software SPI IIRC) on all platforms, really easily.
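
To make the three return values concrete, here is a small simulation in plain JavaScript. The register addresses, getPinAddress and writeReg are all made up for this sketch; it only models the write-1-to-set / write-1-to-clear behaviour and is not the Espruino implementation.

```javascript
// Made-up register addresses, for illustration only.
var OUT_SET = 0x08, OUT_CLEAR = 0x0c;
var outputLatch = 0; // what the 32 pins would currently be driving

// Hypothetical counterpart of an extended jshGetPinAddress:
// where to write to set the pin, where to write to clear it, and which bit.
function getPinAddress(pin) {
  return { setAddr: OUT_SET, clearAddr: OUT_CLEAR, mask: (1 << pin) >>> 0 };
}

// Writing a mask to a set or clear register changes only the masked bits.
function writeReg(addr, mask) {
  if (addr === OUT_SET) outputLatch = (outputLatch | mask) >>> 0;
  else if (addr === OUT_CLEAR) outputLatch = (outputLatch & ~mask) >>> 0;
}

// One pin write is a single register store, with no read-modify-write.
function pinWrite(info, value) {
  writeReg(value ? info.setAddr : info.clearAddr, info.mask);
}

var p2 = getPinAddress(2), p5 = getPinAddress(5);
pinWrite(p2, 1);
pinWrite(p5, 1);
pinWrite(p2, 0);
console.log(outputLatch); // prints: 32
```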
• #4

Gordon

Check this branch out: https://github.com/espruino/Espruino/compare/direct_io

Not tested on anything yet, but hopefully that'll make shiftOut significantly faster on all the main platforms (not ESP8266, but presumably direct IO could be added there too).

I was wrong about software SPI being improved, but that could be improved in the same way now without having to add anything platform specific.
• #5

JumJum
Fully agree with "it does less stuff compared to shiftOut". The simple reason is that I stopped short of implementing all options and used existing code, to get first feedback. If this idea doesn't make it into Espruino, at least I won't have wasted too much time.

Main target was to arrive at about 20 msecs for each refresh. Please see the attached file for the JS code; it's also in an early state. There are some additional time savers, such as predefining arrays instead of creating new arrays all the time. That's all I could find.
BTW, I found out that a Uint8Array was slow compared to a simple JS Array, at least on ESP32.

Let me show what I did:
My first step was to get shiftOut faster, and I failed at some point.
One problem was that shiftOut always handles the pins, on each call. Sorry, but even if this takes only 500 usecs, over the 16 calls of one refresh it sums up to 8 ms. That's a lot if you have only 20 msecs for one refresh.
From my point of view, creating a class which splits shiftOut into several steps could be an answer.
And it would open the option for more BytePort functions like reading 8 pins at a time. allObjects mentioned the good old Centronics interface, ....

Next I came up with the use of GPIO.out_w1tc; the first version added up all bits and at the end sent them to GPIO.out_w1.... Similar to your jshGetPinAddress. Once again, speed optimization hit a limit.
Working pin by pin was too slow.

A helpful step was to create a translation table from byte to pins in GPIO.out_w1....
Obviously this needs some jsVars to store the table.
Implementing this in shiftOut would mean adding a lot of #ifdef ESP32. IMHO, we should try to avoid too many board-specific ifdefs in core.
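
The translation-table idea can be sketched in plain JavaScript as follows. This is not the code from the attachment: buildByteTable is a hypothetical helper, the GPIO numbers are examples, and the sketch assumes that bit i of the byte drives pins[i] and that all pins sit in one 32-bit register.

```javascript
// For every byte value, precompute which pin bits to set and which to clear.
// Assumption for this sketch: bit i of the byte drives pins[i].
function buildByteTable(pins) {
  var setMask = new Uint32Array(256);
  var clearMask = new Uint32Array(256);
  for (var value = 0; value < 256; value++) {
    var s = 0, c = 0;
    for (var i = 0; i < pins.length; i++) {
      var bit = (1 << pins[i]) >>> 0;
      if (value & (1 << i)) s = (s | bit) >>> 0;
      else c = (c | bit) >>> 0;
    }
    setMask[value] = s;
    clearMask[value] = c;
  }
  return { setMask: setMask, clearMask: clearMask };
}

var table = buildByteTable([2, 4, 15]); // example GPIO numbers only
// Byte 5 = binary 101: set GPIO 2 and GPIO 15, clear GPIO 4.
console.log(table.setMask[5], table.clearMask[5]); // prints: 32772 16
```

With such a table, sending one byte needs two lookups and two register writes (one to the set register, one to the clear register) instead of a loop over the pins. The cost is the memory for the two 256-entry tables.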

Let me try your changes tomorrow.
It's funny to see the same naming. In one of my tests I chose a function called jswrap_io_shiftOutCallbackFast :-)

1 Attachment
- JSSource.txt
• #6

JumJum

Couldn't resist and tested today ;-)
direct_io branch : 26msec
bytePort branch : 18msec
There is one more problem with direct_io: the LED matrix does not show what is expected.
It's bedtime now. Will take a closer look tomorrow.
• #7

Gordon

Interesting - so even with shiftOut.bind(null,[R1,G1,B1,R2,G2,B2],{clk:Clock}); it adds that 500uS each call? That's something I should look at as it should be extremely fast - it's basically just an array of 6 integers and really should be almost as quick as your use of getFromVar (since that involves a copy of all the pointers).

Are you sure you're compiling with RELEASE=1? The assertions will slow it down, but they shouldn't be compiled in for release.

It may be there's a bug in shiftout_fast that causes the corruption - could just be the way I implemented jshGetPinAddress on ESP32.

> found out that a Uint8Array was slow compared to a simple JS Array, at least on ESP32

It's slow for allocation, but shouldn't be slow for access. If you can get a good test example that is also slow on Espruino boards I'm very happy to look at it.

> 18msec vs 26msec

That's an interesting slowdown - I guess it could just be my use of jsvIterateCallback vs your direct iteration from the pointer? It may be something I could special-case in jsvIterateCallback for byte arrays, which would improve a whole lot of stuff
• #8

JumJum

I'm not absolutely sure about the 500 usecs. It was some time ago, and I tested a lot of different options; at least the effect was noticeable.
It is compiled with RELEASE=1, absolutely sure, since I got the same hint from wilberforce. And this gained 15% in speed.
I tested the Uint8Array in a lot of different cases. There is no general behaviour. Sometimes it is close to an array, sometimes it's slower. So it looks like a problem with ESP32.
Looks like your assumption about jsvIterateCallback is correct. After switching to that in my test, the time goes up to 22 msec.
I did a lot of testing with a lot of different sources, and don't remember each of them in detail.
Very often, one change caused another one. Therefore it's hard to attribute effects to one particular change. It's a fight for msecs.

One major reason why I like the bytePort class is the option to have a default solution for all boards, and still the option to use the strengths of particular boards.
In your approach, setting pins with a register limits other solutions. In the case of ESP32, I2S supports a parallel mode, which would be a really fast solution. And it could open the world to more colors.
In the end, the question is "support as many boards as possible with the same technology" or "support using the strengths of special boards, even if this is a kind of restriction for your boards".
To be open, if I were looking from your perspective, I would not know which is the better solution.
• #9

Wilberforce

Is the ESP32 code setting up the pin mode on output every time, even when it is already set up? Setting the GPIO matrix via the ESP-IDF might be the delay here.
• #10

allObjects
@Gordon, let me understand - first - how
```
shiftOut.bind(null,[R1,G1,B1,R2,G2,B2],{clk:Clock});
```
is used and - second - why it then still - on invocation -

> adds that 500uS each call

Is the bind used to partially apply the function and 'store' it in a variable / keep it handy with a variable reference, for example, var shiftOutBound = ... handy for the actual invocation with additional, invocation time specific arguments?

I'd be surprised if 'invocation / stack-handling' takes that much time, unless fully applying a partially applied function is the time sink.
• #11

JumJum
To avoid misunderstandings about the 500 usecs:
Sorry for writing this as if it had been measured exactly. I shouldn't have done that.
What I did during testing was to
1. comment out everything in shiftOut, run the loop and measure the whole time
2. comment out only the writing of data in shiftOut, run the loop and measure the whole time
3. do the same for digitalWrite
Tests were done on ESP32 only, and the duration was not stable.
I had up to 2 msecs difference for the whole loop when running the test multiple times.
Altogether, a lot of time is used to check pins, set them to be outputs, etc.

In my understanding, bind optimizes the interpreting of source code.
Since the args are always the same, bind stores the result of interpreting the source.
So translating text to jsVars is done only once, which obviously saves a lot of time.
But it does not skip parts of the underlying function, in this case jswrap_io_shiftOut.

I hope this is reasonably correct, and I'm not embarrassing myself ;-)
• #12

Gordon
> Is the bind used to partially apply the function and 'store' it in a variable

Yes - exactly as @JumJum says - you're basically eliminating the parsing step, which is the slow bit in Espruino.
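
A minimal plain-JavaScript illustration of that partial application follows. shiftOutLike is a stand-in written for this sketch with the same argument order as shiftOut(pins, options, data); it only builds a string, so the example runs anywhere.

```javascript
// Stand-in with the same argument order as shiftOut(pins, options, data).
function shiftOutLike(pins, options, data) {
  return pins.length + " pins, clk=" + options.clk + ", " + data.length + " bytes";
}

// Partially apply the first two arguments once...
var shiftOutBound = shiftOutLike.bind(null, ["R1", "G1", "B1", "R2", "G2", "B2"], { clk: "Clock" });

// ...then supply only the per-call argument.
// The function body itself still runs in full on every call.
var result = shiftOutBound([1, 2, 3]);
console.log(result); // prints: 6 pins, clk=Clock, 3 bytes
```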

> In your approach, setting pins with a register limits other solutions.

I'm totally happy to have something like this at the top of shiftOut so you can still have your fast I2S implementation:
```
#ifdef ESP32
  if (pins.length<=8 && clk && jsvIsFlatArray(data) && etc...)
    return esp32_I2S_implementation(...);
#endif
   // original register-based code
```
The thing I don't like is creating a duplicate API for the same thing if there isn't a good reason. Having two similar APIs just confuses people, gives us more stuff to maintain, and ultimately means that instead of one, optimised solution we have two partially optimised solutions.

If we can make the existing shiftOut fast, existing code that uses it (like LPD6416) should benefit as well.

If there really is something in shiftOut's API that makes it too slow that's different, but am I right in thinking that as far as we've found out so far:
- Currently, shiftOut is quite fast on STM32, less so on other platforms
- With the branch I posted (and maybe a few fixes) it should be fast on all platforms, but not as fast as @JumJum's test code
- So far the biggest reason we're sure of for the difference in speed seems to be direct memory access vs jsvIterateCallback?
So if we can merge my tweaks and fix jsvIterateCallback we should be within a few percent of @JumJum's implementation, and then maybe we can add the special case for I2S to really speed things up for ESP32 in the right cases?
• #13

Gordon
Actually I should add that because of jsvIterateCallback, shiftOut supports objects as arguments, for instance a data argument containing extra stuff like {data:0,count:512}.

So with a tiny tweak to jsvIterateCallback your code could actually be written like this:
```
  shiftData = [];
  me.convertArray = function(){
    // ...
    shiftData = [];
    for(i = 0; i < 16; i++){
      shiftData.push(digitalWrite.bind(null,[Enable,Latch,Latch,D,C,B,A,Enable], 5|i<<3));
      shiftData.push(new Uint8Array(ledBuf.buffer,i*64,64));
    }
    me.scan = shiftOut.bind(null,[R1,G1,B1,R2,G2,B2],{clk:Clock}, shiftData);
  };
```
This would be way faster, since it's not actually executing any JS in order to do the entire scan - including toggling the data lines.

• #14

JumJum

First of all, duplicating a function under a different name was not my intention.
The idea was, in the long term, to mark shiftOut and digitalWrite as deprecated and replace them with something like bytePort. The example for bytePort includes an option to have default handling and family-specific handling. At least, this is the idea. And it is faster at sending data.
To get a better understanding of where the time is going, I measured the duration of the scan loop, and switched off one part after the other in the source from the branch Gordon posted.
This is the result. It looks like shifting out data is fast, but preparing options and GPIOs takes a lot of time. At least for ESP32.

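| Step | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| scan loop | X | X | X | X | X | X | X | X |
| shiftout Data (sfnc) | X | X | X | X | X | X | X | |
| - set options for internal use | X | X | X | X | X | X | | |
| - set array of pins | X | X | X | X | X | | | |
| - assign output, and pin mask | X | X | X | X | | | | |
| - set clock | X | X | X | | | | | |
| - push data out | X | X | | | | | | |
| set row address (dfnc) | X | | | | | | | |
| duration (msec) | 25.300 | 18.160 | 14.350 | 13.820 | 11.090 | 9.290 | 6.780 | 3.810 |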
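
One way to read these numbers is to subtract neighbouring columns, which gives an approximate cost per step. The sketch below does that arithmetic; it assumes that each column switches off exactly one more step than the column to its left, in the order listed in disabledStep. The step labels are paraphrased from the table.

```javascript
// Durations in ms, left to right, as posted above.
// Assumption: each column disables one more step than the one before it.
var durations = [25.300, 18.160, 14.350, 13.820, 11.090, 9.290, 6.780, 3.810];
var disabledStep = [
  "set row address (dfnc)",
  "push data out",
  "set clock",
  "assign output and pin mask",
  "set array of pins",
  "set options for internal use",
  "remaining shiftOut call overhead (sfnc)"
];

var costs = disabledStep.map(function (name, i) {
  // Round to 3 decimals to hide floating point noise.
  var ms = Math.round((durations[i] - durations[i + 1]) * 1000) / 1000;
  return { step: name, ms: ms };
});

costs.forEach(function (c) {
  console.log(c.step + ": " + c.ms + " ms");
});
```

Under that reading, the four setup steps (options, pin array, output and mask, clock) add up to about 7.6 ms, roughly twice the 3.8 ms spent pushing the data out.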

• #15

Gordon

Thanks, that's really interesting! What is push data out?

I just committed those changes I mentioned above - it should be useful in a bunch of cases since it applies even to SPI/I2C/etc.

Basically this should mean that the setup is done only once - even in BytePort you had to grab the binary data out of a variable each call, but this should avoid that since it'll be stored on the stack - so I'm hopeful that with these changes we may end up with something even faster.

I don't have a P3 matrix here so this is what I did for the LPD6416 - hopefully the change is pretty straightforward.

// original
connect1 = function(pins) {
  var s = shiftOut.bind(null, [pins.nG,pins.nR], { clk : pins.S, repeat : 4 });
  var d = digitalWrite.bind(null, [pins.nEN,pins.L,pins.L,pins.D,pins.C,pins.B,pins.A,pins.nEN]);
  var en = pins.nEN;
  var g = Graphics.createArrayBuffer(64,16,2);
  var u = g.buffer;
  g.scan = function() {
    en.reset();
    for (var y=0;y<16;y++) {s(new Uint8Array(u,y*16,16));d(33|y<<1);}
    en.set();
  };
  g.setBgColor(3);
  g.setColor(0);
  g.clear();
  return g;
};

// fast
connect2 = function(pins) {
  var s = shiftOut.bind(null, [pins.nG,pins.nR], { clk : pins.S, repeat : 4 });
  var d = digitalWrite.bind(null, [pins.nEN,pins.L,pins.L,pins.D,pins.C,pins.B,pins.A,pins.nEN]);
  var en = pins.nEN;
  var g = Graphics.createArrayBuffer(64,16,2);
  var u = g.buffer;
  var arr = [];
  g.prep = function() {
    arr = [];
    for (var y=0;y<16;y++) {
      arr.push(new Uint8Array(u,y*16,16));
      arr.push({callback:d.bind(null,33|y<<1)});
    }
  };
  g.scan = function() {
    en.reset();
    s(arr);
    en.set();
  };
  g.setBgColor(3);
  g.setColor(0);
  g.clear();
  return g;
};

function time(fn) {
  var g = fn({A:B15, B:B14, C:B13, D:B10,
                      nG:B1, L:A6, S:A5, nEN:A8, nR:A7});
  if (g.prep) g.prep();
  var t=getTime();
  var n=100;
  while (n--)g.scan();
  print(getTime()-t);
}

// normal
time(connect1);
// save the loop and function calls - roughly twice as fast on STM32
time(connect2);
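
To show why a single call with a prepared array avoids executing JS per row, here is a simplified plain-JavaScript model of such a traversal. It is only a model written for this thread, not Espruino's jsvIterateCallback: iterateData and emit are hypothetical names, and it handles just the forms that appear in this discussion (numbers, strings, arrays, Uint8Array views, {data:..,count:..} and {callback:..}).

```javascript
// Simplified model of the traversal; not Espruino's actual jsvIterateCallback.
function iterateData(data, emit) {
  var i;
  if (typeof data === "number") {
    emit(data & 255);
  } else if (typeof data === "string") {
    for (i = 0; i < data.length; i++) emit(data.charCodeAt(i) & 255);
  } else if (Array.isArray(data) || data instanceof Uint8Array) {
    for (i = 0; i < data.length; i++) iterateData(data[i], emit);
  } else if (data && typeof data.callback === "function") {
    // {callback: fn}: run fn at this point in the stream.
    data.callback();
  } else if (data && "data" in data && "count" in data) {
    // {data: x, count: n}: repeat x n times.
    for (i = 0; i < data.count; i++) iterateData(data.data, emit);
  }
}

var out = [];
var log = [];
var buf = new Uint8Array([1, 2, 3, 4]);
iterateData([
  new Uint8Array(buf.buffer, 0, 2),
  { callback: function () { log.push("row latched after " + out.length + " bytes"); } },
  new Uint8Array(buf.buffer, 2, 2),
  { data: 0, count: 3 }
], function (b) { out.push(b); });

console.log(out.join(","), "|", log[0]); // prints: 1,2,3,4,0,0,0 | row latched after 2 bytes
```

The point of the design is that row data and row-select actions are interleaved in one list, so the whole scan runs inside a single native call.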

• #16

JumJum

push data out is jsvIterateCallback(data, allFast.....

I'll check it soon.

> even in BytePort you had to grab the binary data out of a variable each call

That's correct, but there is no need to grab pins, set them to output, calculate the mask, ... on each call.
Just thinking: binary data, couldn't this be added to initialization, so that we only have to set a pointer? Or would this cause problems if jsVars are restructured? Hmmm.
Anyway, we gained some speed in a lot of places (I2C, SPI, ....).
And the way you use an array to avoid executing JS is very interesting.
• #17

Gordon

> binary data, couldn't this be added to initialization, and we only have to set a pointer ?

Ideally to do your method you'd allocate a Flat String, and put the data in there. You could then get a pointer to it when executing the code. It's what I should do for most stuff like Graphics/etc (but I don't).
• #18

JumJum
I tested with the latest direct_io branch, see my code.
First I checked speed, which is really fast: it came down to 18 msec/scan :-)
Up to this point everything was fine.
```
var led;
connect2 = function(R1,R2,B1,B2,G1,G2,A,B,C,D,Latch,Clock,Enable) {
  var sfnc = shiftOut.bind(null,[R1,G1,B1,R2,G2,B2],{clk:Clock});
  var dfnc = digitalWrite.bind(null,[Enable,Latch,Latch,D,C,B,A,Enable]);

  var en = Enable;
  var g = Graphics.createArrayBuffer(64,32,4);
  var buf = g.buffer;
  var ledBuf = new Uint8Array(64 * 32 / 2);//converted graphics.buffer to data for LED
  var arr = [];
  g.prep = function() {
    var bufpnt1,bufpnt2,ledpnt;
    bufpnt1 = 0; bufpnt2 = 512; ledpnt = 0;
    var pane = false,i,j;
    for(i = 0; i < 16; i++){
      for(j = 0; j < 64;j +=2){
        ledBuf[ledpnt] = (buf[bufpnt2] & 7) + ((buf[bufpnt1] & 7)<<3);
        ledpnt++;
        ledBuf[ledpnt] = ((buf[bufpnt2] & 0xf0) >>4) + ((buf[bufpnt1] & 0xf0)>>1);
        ledpnt++;
        bufpnt1++;
        bufpnt2++;
      }
    }
    arr = [];
    for(var y=0; y < 16;y++){
      arr.push(new Uint8Array(ledBuf.buffer,y*64,64));
      arr.push({callback:dfnc.bind(null,33|y<<1)});
    }
  };
  g.scan = function() {
    en.reset();
    sfnc(arr);
    en.set();
  };
  g.setBgColor(1);
  g.clear();
  g.setColor(2);
  g.fillRect(5,12,50,25);
  g.setColor(4);
  g.fillRect(10,14,40,20);
  g.fillRect(10,0,40,0);
  return g;
};
function tst(){
  led = connect2(D2,D16,D4,D17,D15,D27, D5,D18,D19,D21, D26,D22,D25);
  if(led.prep) led.prep();
  var t=getTime();
  led.scan();
  print((getTime()-t) * 1000);
}
```
Next I tried to work with graphics, and whatever I did, the panel displays something different. :-(
During testing I added some printfs here and there.
The first confusing thing was how often jswrap_io_shiftOutCallbackFast was called. My expectation was 1024, but the counter I added returned 8704.
The counter was added like this:
```
int xx = 0;
void jswrap_io_shiftOutCallbackFast(int val, void *data) {
xx++;
  jswrap_io_shiftOutData *d = (jswrap_io_shiftOutData*)data;
  int n, i;
  for (i=0;i<d->repeat;i++) {
.....
void jswrap_io_shiftOut(JsVar *pins, JsVar *options, JsVar *data) {
xx = 0;
  jswrap_io_shiftOutData d;
  d.cnt = 0;
  d.clk = PIN_UNDEFINED;
.......
  // Now run through the data, pushing it out
  jsvIterateCallback(data, allFast ? jswrap_io_shiftOutCallbackFast : jswrap_io_shiftOutCallback, &d);
printf("xx:%d\n",xx);
}
```
Checking jsvIterateCallback was the next step, with some more debug output added like this.
BTW, this is the short version of a long story, ....
```
  // Handle the data being an array buffer
  else if (jsvIsArrayBuffer(data)) {
jsWarn("ArrayBuffer:%j\n",data);
    JsvArrayBufferIterator it;
    jsvArrayBufferIteratorNew(&it, data, 0);
jsWarn("byteLength:%d\n",it.byteLength);
    if (JSV_ARRAYBUFFER_GET_SIZE(it.type) == 1 && !JSV_ARRAYBUFFER_IS_SIGNED(it.type)) {
      JsvStringIterator *sit = &it.it;
```
See the log: the sum of all byteLength values gives exactly 8704, which is the number of times callbackFast is called.
Here we reach the point where the story goes beyond my knowledge. Hope you can help.

tst()
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:64
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:128
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:192
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:256
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:320
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:384
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:448
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:512
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:576
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:640
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:704
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:768
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:832
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:896
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 17, 17, 17, 17, 17, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:960
WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 17, 17, 17, 17, 17, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
WARNING: byteLength:1024
xx:8704
• #19

Gordon

Argh - thanks for checking into this!

It was because my 'fast path' for execution wasn't checking when it should stop correctly.

If you pull now and try, it should at least call iteration the correct number of times.

And on the plus side if it's 18ms while doing 8x as much work as is needed, it should be substantially faster now!
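
The numbers from the log can be checked with a few lines of arithmetic. The sketch assumes one reading of the log: that for each 64-byte view the fast path ran up to the logged byteLength (64, 128, ... 1024) instead of stopping after the 64 bytes of the view.

```javascript
var VIEW_LENGTH = 64, VIEWS = 16;
var expected = 0, buggy = 0;
for (var y = 0; y < VIEWS; y++) {
  var byteOffset = y * VIEW_LENGTH;
  expected += VIEW_LENGTH;           // bytes the view actually contains
  buggy += byteOffset + VIEW_LENGTH; // logged byteLength: 64, 128, ... 1024
}
console.log(expected, buggy, buggy / expected); // prints: 1024 8704 8.5
```

So the counter value of 8704 matches the sum of the logged lengths, which is 8.5 times the expected 1024 callbacks.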
• #20

JumJum

Believe me, the world is absolutely crazy.
First of all, it's much faster; I got down close to 10 ms.
Bad news: the upper 16 rows are always green and the lower 16 rows are white.
Whatever I tried, there was no way to get this running. Even hardcoding the colour did not help.

After a lot of frustration, I went back to setting pins with jshPinSetValue, and surprise surprise, now it works.
Looks to me like a timing problem.
Good news is, the time for a scan is 13 ms now, which is still really fast.
What we don't have is brightness. Lines 16 and 32 are much brighter than the others.
And the brightness is flickering.
This also looks like a timing problem.

Anyway, my next step will be to create a graphics driver, using Graphics.createCallback to get rid of the prep step.
• #21

Gordon

Great! So you're not really using the direct_io branch now, but just standard Espruino? It's a shame - can you see the pins changing state?

The brightness is probably because those rows are left on for slightly longer... I guess en controls if the row is actually lit? If so, just adding arr.push({callback:en.set.bind(en)}); should fix it?

I guess the lack of brightness is because now we're actually able to push the data out quickly - flickering might be because something is stopping scan from getting called when it should be.

> My next step will be to create a graphics driver, using Graphics.createCallback to get rid of prep step.

I'd have thought that might be quite slow... On Pixl.js/etc we just have g.flip() that you call when you want to display what you rendered (which is basically what prep() is doing) - or even if you want to avoid that you could check g.getModified() to see if anything has changed?
• #22

JumJum

I use the direct_io branch, and love the changes.
In my local copy, I changed the fast function to use jshPinSetValue instead of the GPIO registers.
The flickering could be caused by task management in the RTOS.
I agree about the slowness of a JS solution, and will take a closer look at g.flip.
• #23

JumJum

> arr.push({callback:en.set.bind(en)});

fixes in the wrong direction: we get dark lines 15/31.
Timing is very special for the P3 board.
• #24

JumJum

Success, but once again I have only an idea of an idea why it works.
Removed en.reset() and en.set() from g.scan
Pushed {callback:en.reset.bind(en)} as first entry into arr,
Pushed {callback:en.set.bind(en)} as last entry into arr
Due to this, lines 15 and 31 have the same brightness as all the others.
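
The resulting layout of arr can be sketched in plain JavaScript like this. buildScanList, FakePin and the selectRow argument are hypothetical names for this sketch; on the board, en would be the Enable pin and selectRow would be the bound digitalWrite (dfnc) from the code earlier in the thread.

```javascript
// Fake pin so the sketch runs outside Espruino; it only records calls.
function FakePin(name, trace) { this.name = name; this.trace = trace; }
FakePin.prototype.set = function () { this.trace.push(this.name + ".set"); };
FakePin.prototype.reset = function () { this.trace.push(this.name + ".reset"); };

// Enable low first, then row data and row select in turn, enable high last.
function buildScanList(en, rowViews, selectRow) {
  var arr = [{ callback: en.reset.bind(en) }];
  for (var y = 0; y < rowViews.length; y++) {
    arr.push(rowViews[y]);
    arr.push({ callback: selectRow.bind(null, y) });
  }
  arr.push({ callback: en.set.bind(en) });
  return arr;
}

var trace = [];
var en = new FakePin("en", trace);
var ledBuf = new Uint8Array(16 * 64);
var rows = [];
for (var y = 0; y < 16; y++) rows.push(new Uint8Array(ledBuf.buffer, y * 64, 64));
var arr = buildScanList(en, rows, function (row) { trace.push("row " + row); });
console.log(arr.length); // prints: 34
```

With this layout the enable pin is toggled inside the same native call that shifts the data, so the last row is presumably no longer left lit while control returns to JS.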
• #25

JumJum
Last but not least, the flickering is gone.
I surrounded jsvIterateCallback with RTOS commands to switch task handling off:
```
vTaskSuspendAll();
  jsvIterateCallback(data, allFast ? jswrap_io_shiftOutCallbackFast : jswrap_io_shiftOutCallback, &d);
xTaskResumeAll();
```
Current status, compared to the direct_io branch:
- switched fast IO off (return false in jshGetPinAddress)
- switched task handling off during jsvIterateCallback
- moved en.set() and en.reset() from JS (g.scan) into arr
Next week at the latest, I'll send a pull request with all changes to the direct_io branch.
Open questions:
- how do we call task-relevant RTOS functions? We could use #ifdef ESP32 or #ifdef RTOS. I would prefer the second one.
- do we need jshIsPinValid(d->pins[n]) in jswrap_io_shiftOutCallback? Isn't this already done in jswrap_io_shiftOut?
