• Some weeks ago, a discussion started around the P3 RGB pixel panel.
    Feedback from Gordon and allObjects gave me the idea of creating a new device for pins.
    It should

    1. support 8 ports in parallel
    2. be a JavaScript class
    3. replace shiftOut and digitalWrite
    4. be faster than these functions
    5. support both a general implementation and board-optimized implementations

    As a first proof of concept, the attached source came to life. I named it BytePort. It includes

    1. a BytePort class
    2. an init function to assign GPIOs and mode
    3. sending an array of bytes in parallel, with a clock signal
    4. sending a byte as serial bits
    5. an implementation for ESP32 (preliminary, without DMA or other fancy stuff)
    6. and it gets the job done for a 64x32 P3 LED panel in about 20 msec
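
    To make the proposal concrete, here is a minimal sketch in plain JavaScript of what a BytePort-style class could look like. It is not the attached source: the method names init, write, and sendParallel, and the injected writePin function, are assumptions for illustration only. Pin handling happens once in init, so the per-byte path only toggles pins.

```javascript
// Hypothetical sketch of a BytePort-style class (not the attached source).
// writePin(pin, value) is injected so the sketch runs without hardware;
// on Espruino one might pass digitalWrite here (assumption).
function BytePort(writePin) {
  this.writePin = writePin;
  this.pins = [];
  this.clk = undefined;
}

// Assign the 8 data pins (bit 0 first) and an optional clock pin, once.
BytePort.prototype.init = function (pins, clk) {
  if (pins.length !== 8) throw new Error("BytePort needs exactly 8 pins");
  this.pins = pins.slice();
  this.clk = clk;
};

// Put one byte on the 8 pins in parallel.
BytePort.prototype.write = function (value) {
  for (var bit = 0; bit < 8; bit++) {
    this.writePin(this.pins[bit], (value >> bit) & 1);
  }
};

// Send an array of bytes, pulsing the clock once per byte.
BytePort.prototype.sendParallel = function (bytes) {
  for (var i = 0; i < bytes.length; i++) {
    this.write(bytes[i]);
    if (this.clk !== undefined) {
      this.writePin(this.clk, 1);
      this.writePin(this.clk, 0);
    }
  }
};

// Usage with a fake pin writer that records the last level of each pin.
var levels = {};
var port = new BytePort(function (pin, value) { levels[pin] = value; });
port.init(["P0", "P1", "P2", "P3", "P4", "P5", "P6", "P7"], "CLK");
port.sendParallel([0xA5]);
console.log(levels.P0, levels.P1, levels.P2, levels.P7); // 1 0 1 1
```

    A board-optimized variant would keep the same interface and replace only write.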

    Before spending more time on this, and maybe running in the wrong direction, I would like to get feedback. Please see the attached zip.



  • Do you have an example of how you used this in JS? Like your P3 driver code?

    As far as I can see this doesn't add anything that you couldn't do with shiftOut (http://www.espruino.com/Reference#l__global_shiftOut). In fact, it actually does less?

    Couldn't we just work on making shiftOut faster, even if that was special-casing it for ESP32 when sending 8 bits at a time? I believe we may already special-case for STM32.

    Basically I'm now at the point where I'm having to pull features OUT of the Original Espruino board in order to do new releases, so I don't want to have to remove features in order to add things that don't actually add any new functionality.

    I know you think that parsing the 8 pins each time would slow the call down, but if they're pre-bound with bind it shouldn't be that much different at all. I know it's a bit fiddly, but nothing stops you making a JS class that implements BytePort - and then people can pull it in if they need, rather than it taking up space on all boards.

  • Just looking at your code, you use GPIO.out_w1tc/GPIO.out_w1ts. This is really very similar to what we do for STM32 already: https://github.com/espruino/Espruino/blob/e06b3e24de42adb1de6e2cc6046e525880760e7d/src/jswrap_io.c#L462

    There's a function called jshGetPinAddress: https://github.com/espruino/Espruino/blob/e06b3e24de42adb1de6e2cc6046e525880760e7d/src/jswrap_io.c#L574

    While this only deals with one address for a pin (which only seems to work on STM32) we could extend that to return:

    • An address for setting a pin
    • An address for clearing a pin
    • A bit mask for the pin

    That would work on nRF52/ESP32 and STM32, and then we could have a fast shiftOut (and also software SPI IIRC) on all platforms, really easily.
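
    The three values listed above could be modelled like this. It is only a simulation in JavaScript: FakePort, its w1ts/w1tc methods (named after the ESP32 write-1-to-set and write-1-to-clear registers mentioned above), getPinInfo, and fastWrite are hypothetical stand-ins, not the actual jshGetPinAddress code.

```javascript
// Simulated GPIO port with write-1-to-set / write-1-to-clear semantics.
// All names here are hypothetical; real hardware exposes these as registers.
function FakePort() {
  this.out = 0; // current output latch
}
FakePort.prototype.w1ts = function (mask) { this.out |= mask; };
FakePort.prototype.w1tc = function (mask) { this.out &= ~mask; };

// What an extended pin lookup might return: where to write to set the pin,
// where to write to clear it, and which bit belongs to the pin.
function getPinInfo(port, pinNumber) {
  return {
    set: function (mask) { port.w1ts(mask); },
    clear: function (mask) { port.w1tc(mask); },
    mask: 1 << pinNumber
  };
}

// Writing a pin is then a single store of the mask to one of two addresses.
function fastWrite(info, value) {
  if (value) info.set(info.mask);
  else info.clear(info.mask);
}

var port = new FakePort();
var p4 = getPinInfo(port, 4);
var p2 = getPinInfo(port, 2);
fastWrite(p4, 1);
fastWrite(p2, 1);
fastWrite(p4, 0);
console.log(port.out); // 4
```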

  • Check this branch out: https://github.com/espruino/Espruino/compare/direct_io

    Not tested on anything yet, but hopefully that'll make shiftOut significantly faster on all the main platforms (not ESP8266, but presumably direct IO could be added there too).

    I was wrong about software SPI being improved, but that could be improved in the same way now without having to add anything platform specific.

  • I fully agree with "it does less stuff compared to shiftOut". The simple reason is that I stopped short of implementing all options and reused existing code, to get first feedback. If this idea does not make it into Espruino, at least I didn't waste too much time.

    The main target was to arrive at about 20 msec for each refresh. Please see the attached file for the JS code; it is also at an early stage. There are some additional time savers, such as predefining arrays instead of creating new arrays all the time. That's all I could find.
    BTW, I found that a Uint8Array was slow compared to a simple JS Array, at least on ESP32.

    Let me show what I did:
    My first step was to make shiftOut faster, and I got stuck at a certain point.
    One problem was that shiftOut always handles the pins, on each call. Sorry, but even if this takes only 500 usec, it sums up to 8 ms. That's a lot if you have only 20 msec for one refresh.
    From my point of view, creating a class which splits shiftOut into several steps could be an answer.
    It would also open the option for more BytePort functions, like reading 8 pins at a time. allObjects mentioned the good old Centronics interface, ...
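
    As a rough check of these numbers, assuming one shiftOut call per row and 16 rows per refresh (the scan code later in this thread loops over 16 rows), the per-call overhead adds up as follows. The 500 usec figure is the estimate from this post, not a measurement.

```javascript
// Rough budget check; all inputs are estimates from the post above.
var rowsPerRefresh = 16;      // assumed: one shiftOut call per row
var overheadPerCallUs = 500;  // estimated pin-handling overhead per call
var refreshBudgetMs = 20;     // target time for one refresh

var overheadMs = rowsPerRefresh * overheadPerCallUs / 1000;
var share = overheadMs / refreshBudgetMs;

console.log(overheadMs + " ms of " + refreshBudgetMs + " ms");
console.log(Math.round(share * 100) + "% of the budget");
```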

    Next I came up with using GPIO.out_w1tc. The first version added up all bits and at the end sent them to GPIO.out_w1.... This is similar to your jshGetPinAddress. Once again, speed optimization hit a limit.
    Working pin by pin was too slow.

    A helpful step was to create a translation table from byte to pins in GPIO.out_w1....
    Obviously this needs some jsVars to store the table.
    Implementing this in shiftOut would mean adding a lot of #ifdef ESP32. IMHO, we should try to avoid too many board-specific ifdefs in core.
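
    A translation table of this kind could be built once and then used per byte with two register writes. The sketch below is a plain JavaScript model, assuming 8 data pins on one 32-bit port; buildTable, writeByte, and the pin numbers are hypothetical, and the set/clear masks stand in for the values written to GPIO.out_w1ts and GPIO.out_w1tc.

```javascript
// Build a 256-entry table: for every byte value, the mask of port bits to
// set and the mask of port bits to clear. pins[0] carries bit 0 of the byte.
function buildTable(pins) {
  var setMasks = new Uint32Array(256);
  var clearMasks = new Uint32Array(256);
  for (var value = 0; value < 256; value++) {
    var setMask = 0, clearMask = 0;
    for (var bit = 0; bit < 8; bit++) {
      var pinMask = 1 << pins[bit];
      if ((value >> bit) & 1) setMask |= pinMask;
      else clearMask |= pinMask;
    }
    setMasks[value] = setMask;     // would go to the write-1-to-set register
    clearMasks[value] = clearMask; // would go to the write-1-to-clear register
  }
  return { set: setMasks, clear: clearMasks };
}

// Example pin numbers only (assumption); any 8 distinct pins below 31 work.
var table = buildTable([2, 4, 5, 12, 13, 14, 15, 16]);

// Sending a byte is now two table lookups instead of 8 pin writes.
var out = 0; // models the port's output latch
function writeByte(value) {
  out |= table.set[value];
  out &= ~table.clear[value];
}
writeByte(0x03);
console.log(out.toString(2)); // 10100
```

    In this model the table costs 2 x 256 x 4 = 2048 bytes, which is the storage trade-off mentioned above.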

    Let me try your changes tomorrow.
    It's funny to see the same naming. In one of my tests I chose a function called jswrap_io_shiftOutCallbackFast :-)



  • Couldn't resist and tested today ;-)
    direct_io branch: 26 msec
    bytePort branch: 18 msec
    There is one more problem with direct_io: the LED matrix does not show what is expected.
    It's bedtime now. I will take a closer look tomorrow.

  • Interesting - so even with shiftOut.bind(null,[R1,G1,B1,R2,G2,B2],{clk:Clock}); it adds that 500uS each call? That's something I should look at as it should be extremely fast - it's basically just an array of 6 integers and really should be almost as quick as your use of getFromVar (since that involves a copy of all the pointers).

    Are you sure you're compiling with RELEASE=1? The assertions will slow it down, but they shouldn't be compiled in for release.

    It may be there's a bug in shiftout_fast that causes the corruption - could just be the way I implemented jshGetPinAddress on ESP32.

    found out that an uint8Array was slow compared to an simple JS Array, at least on ESP32

    It's slow for allocation, but shouldn't be slow for access. If you can get a good test example that is also slow on Espruino boards I'm very happy to look at it.

    18msec vs 26msec

    That's an interesting slowdown - I guess it could just be my use of jsvIterateCallback vs your direct iteration from the pointer? It may be something I could special-case in jsvIterateCallback for byte arrays, which would improve a whole lot of stuff

  • I'm not absolutely sure about the 500 usec. It was some time ago, and I tested a lot of different options, but at least it was noticeable.
    It is compiled with RELEASE=1, I'm absolutely sure, since I got the same hint from wilberforce. That gave about 15% more speed.
    I tested the Uint8Array in a lot of different cases. There is no general behaviour: sometimes it is close to Array, sometimes it is slower. So it looks like a problem with ESP32.
    It looks like your assumption about jsvIterateCallback is correct. After switching to it in my test, the time goes up to 22 msec.
    I did a lot of testing with a lot of different sources and don't remember each of them in detail.
    Very often, one change caused another one, so it is hard to attribute effects to one specific change. It's a fight for msecs.

    One major reason why I like the bytePort class is the option to have a default solution for all boards and still be able to use the strengths of particular boards.
    In your approach, setting pins with a register limits other solutions. In the case of ESP32, I2S supports a parallel mode, which would be a really fast solution. And it could open the door to more colors.
    In the end, the question is "support as many boards as possible with the same technology" or "support the strengths of special boards, even if this somewhat restricts which boards are supported".
    To be honest, if I were looking at it from your perspective, I would not know which is the better solution.

  • Is the ESP32 code setting up the pin mode for output every time, even when it is already set up? Setting the GPIO matrix via the ESP-IDF might be the delay here.

  • @Gordon, let me understand - first - how

    shiftOut.bind(null,[R1,G1,B1,R2,G2,B2],{clk:Clock});
    

    is used and - second - that it then still, on invocation,

    adds that 500uS each call

    Is the bind used to partially apply the function and 'store' it in a variable / keep it handy with a variable reference, for example, var shiftOutBound = ... handy for the actual invocation with additional, invocation time specific arguments?

    I'd be surprised if invocation / stack handling takes that much time, unless fully applying a partially applied function is the time sink.

  • To avoid misunderstandings about the 500 usec:
    Sorry for writing this as if it had been measured exactly. I shouldn't have done that.
    What I did during testing was to

    1. comment out everything in shiftOut, run the loop, and measure the whole time
    2. comment out only the writing of data in shiftOut, run the loop, and measure the whole time
    3. do the same for digitalWrite

    Tests were done on ESP32 only, and the duration was not stable.
    I had up to 2 msec difference for the whole loop when running the test multiple times.
    Altogether, a lot of time is used to check pins, set them to output, etc.

    In my understanding, bind optimizes the interpreting of source code.
    Since the args are always the same, bind stores the result of interpreting the source.
    So translating text to jsVars is done only once, which obviously saves a lot of time.
    But it does not skip parts of the underlying function, in this case jswrap_io_shiftOut.
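
    The last point can be illustrated in plain JavaScript. In the sketch below, fakeShiftOut is a hypothetical stand-in for shiftOut that counts how often its pin setup runs; binding the pins and options fixes the arguments, but the setup inside the function still runs on every call.

```javascript
// Hypothetical stand-in for shiftOut: counts how often pin setup runs.
var setupRuns = 0;
function fakeShiftOut(pins, options, data) {
  setupRuns++;            // models checking pins, setting them to output, ...
  return data.length;     // models pushing the data out
}

// Pre-bind the arguments that never change (pins and options).
var bound = fakeShiftOut.bind(null, ["R1", "G1", "B1"], { clk: "CLK" });

// Each call only supplies the data, yet the setup work is repeated.
for (var row = 0; row < 16; row++) {
  bound([1, 2, 3, 4]);
}
console.log(setupRuns); // 16
```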

    I hope this is reasonably correct, and I'm not embarrassing myself ;-)

  • Is the bind used to partially apply the function and 'store' it in a variable

    Yes - exactly as @JumJum says - you're basically eliminating the parsing step, which is the slow bit in Espruino.

    In your approach, setting pins with a register limits other solutions.

    I'm totally happy to have something like this at the top of shiftOut so you can still have your fast I2S implementation:

    #ifdef ESP32
      if (pins.length<=8 && clk && jsvIsFlatArray(data) && etc...)
        return esp32_I2S_implementation(...);
    #endif
       // original register-based code
    

    The thing I don't like is creating a duplicate API for the same thing if there isn't a good reason. Having two similar APIs just confuses people, gives us more stuff to maintain, and ultimately means that instead of one, optimised solution we have two partially optimised solutions.

    If we can make the existing shiftOut fast, existing code that uses it (like LPD6416) should benefit as well.

    If there really is something in shiftOut's API that makes it too slow that's different, but am I right in thinking that as far as we've found out so far:

    • Currently, shiftOut is quite fast on STM32, less so on other platforms
    • With the branch I posted (and maybe a few fixes) it should be fast on all platforms, but not as fast as @JumJum's test code
    • So far the biggest reason we're sure of for the difference in speed seems to be direct memory access vs jsvIterateCallback?

    So if we can merge my tweaks and fix jsvIterateCallback we should be within a few percent of @JumJum's implementation, and then maybe we can add the special case for I2S to really speed things up for ESP32 in the right cases?

  • Actually I should add that because of jsvIterateCallback, shiftOut supports objects as arguments, for instance a data argument containing extra stuff like {data:0,count:512}.

    So with a tiny tweak to jsvIterateCallback your code could actually be written like this:

      shiftData = [];
      me.convertArray = function(){
        // ...
        shiftData = [];
        for(i = 0; i < 16; i++){
          shiftData.push(digitalWrite.bind(null,[Enable,Latch,Latch,D,C,B,A,Enable], 5|i<<3));
          shiftData.push(new Uint8Array(ledBuf.buffer,i*64,64));
        }
        me.scan = shiftOut.bind(null,[R1,G1,B1,R2,G2,B2],{clk:Clock}, shiftData);
      };
    

    This would be way faster, since it's not actually executing any JS in order to do the entire scan - including toggling the data lines.
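
    To show why a single call can replace the JS loop, here is a simplified model of this kind of iteration in plain JavaScript. It is not the jsvIterateCallback source: iterate is a hypothetical helper that accepts numbers, arrays, typed arrays, {data, count} objects, and functions or {callback} objects that are run in place.

```javascript
// Simplified model of iterating shiftOut-style data (not the real C code).
// emit(byte) is called for every data byte, in order.
function iterate(data, emit) {
  if (typeof data === "number") {
    emit(data & 255);
  } else if (typeof data === "function") {
    data();                                   // run a pre-bound call in place
  } else if (Array.isArray(data) || ArrayBuffer.isView(data)) {
    for (var i = 0; i < data.length; i++) iterate(data[i], emit);
  } else if (data && typeof data.callback === "function") {
    data.callback();
  } else if (data && typeof data.count === "number") {
    for (var n = 0; n < data.count; n++) iterate(data.data, emit);
  }
}

// Two rows of pixel data, each followed by a "set row address" step.
var log = [];
var setRow = function (row) { log.push("row" + row); };
var shiftData = [];
for (var y = 0; y < 2; y++) {
  shiftData.push(new Uint8Array([y, y + 10]));
  shiftData.push(setRow.bind(null, y));
}

iterate(shiftData, function (byte) { log.push(byte); });
console.log(log.join(",")); // 0,10,row0,1,11,row1
```

    The whole scan is described by data, so the interpreter is only involved once per scan instead of once per row.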

  • First of all, duplicating a function under a different name was not my intention.
    The idea was, in the long term, to mark shiftOut and digitalWrite as deprecated and replace them with something like bytePort. The example for bytePort includes an option for default handling and family-specific handling. At least, this is the idea. And it is faster at sending data.
    To get a better understanding of where the time goes, I measured the duration of the scan loop and switched off one part after the other in the source from the branch Gordon posted.
    This is the result. It looks like shifting out data is fast, but preparing options and GPIOs takes a lot of time, at least on ESP32.

    part enabled                     run 1   run 2   run 3   run 4   run 5   run 6   run 7   run 8
    scan loop                          X       X       X       X       X       X       X       X
    shiftOut data (sfnc)               X       X       X       X       X       X       X
    set options for internal use       X       X       X       X       X       X
    set array of pins                  X       X       X       X       X
    assign output and pin mask         X       X       X       X
    set clock                          X       X       X
    push data out                      X       X
    set row address (dfnc)             X
    duration (msec)                 25.300  18.160  14.350  13.820  11.090   9.290   6.780   3.810
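
    Reading the table as cumulative measurements (each run switches off one more part, which is an interpretation of the figures above), the cost of each part is the difference between neighbouring durations. The labels and numbers are taken from the table; the subtraction is the only thing added.

```javascript
// Durations in msec, from the run with everything enabled down to the
// run with only the bare scan loop left.
var durations = [25.300, 18.160, 14.350, 13.820, 11.090, 9.290, 6.780, 3.810];
// The part that was switched off between one run and the next (assumed order).
var parts = [
  "set row address (dfnc)",
  "push data out",
  "set clock",
  "assign output and pin mask",
  "set array of pins",
  "set options for internal use",
  "shiftOut call (sfnc)"
];

var costs = [];
for (var i = 0; i < parts.length; i++) {
  var cost = durations[i] - durations[i + 1];
  costs.push(cost);
  console.log(parts[i] + ": " + cost.toFixed(2) + " ms");
}
console.log("remaining scan loop: " + durations[7].toFixed(2) + " ms");
```

    Under this reading, the setup steps together (options, pin array, pin mask, clock) cost about 7.6 ms per scan, roughly twice the 3.8 ms spent pushing data out.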
  • Thanks, that's really interesting! What is push data out?

    I just committed those changes I mentioned above - it should be useful in a bunch of cases since it applies even to SPI/I2C/etc.

    Basically this should mean that the setup is done only once - even in BytePort you had to grab the binary data out of a variable each call, but this should avoid that since it'll be stored on the stack - so I'm hopeful that with these changes we may end up with something even faster.

    I don't have a P3 matrix here so this is what I did for the LPD6416 - hopefully the change is pretty straightforward.

    // original
    connect1 = function(pins) {
      var s = shiftOut.bind(null, [pins.nG,pins.nR], { clk : pins.S, repeat : 4 });
      var d = digitalWrite.bind(null, [pins.nEN,pins.L,pins.L,pins.D,pins.C,pins.B,pins.A,pins.nEN]);
      var en = pins.nEN;
      var g = Graphics.createArrayBuffer(64,16,2);
      var u = g.buffer;
      g.scan = function() {
        en.reset();
        for (var y=0;y<16;y++) {s(new Uint8Array(u,y*16,16));d(33|y<<1);}
        en.set();
      };
      g.setBgColor(3);
      g.setColor(0);
      g.clear();
      return g;
    };
    
    // fast
    connect2 = function(pins) {
      var s = shiftOut.bind(null, [pins.nG,pins.nR], { clk : pins.S, repeat : 4 });
      var d = digitalWrite.bind(null, [pins.nEN,pins.L,pins.L,pins.D,pins.C,pins.B,pins.A,pins.nEN]);
      var en = pins.nEN;
      var g = Graphics.createArrayBuffer(64,16,2);
      var u = g.buffer;
      var arr = [];
      g.prep = function() {
        arr = [];
        for (var y=0;y<16;y++) {
          arr.push(new Uint8Array(u,y*16,16));
          arr.push({callback:d.bind(null,33|y<<1)});
        }
      };
      g.scan = function() {
        en.reset();
        s(arr);
        en.set();
      };
      g.setBgColor(3);
      g.setColor(0);
      g.clear();
      return g;
    };
    
    function time(fn) {
      var g = fn({A:B15, B:B14, C:B13, D:B10,
                          nG:B1, L:A6, S:A5, nEN:A8, nR:A7});
      if (g.prep) g.prep();
      var t=getTime();
      var n=100;
      while (n--)g.scan();
      print(getTime()-t);
    }
    
    // normal
    time(connect1);
    // save the loop and function calls - roughly twice as fast on STM32
    time(connect2);
    
  • push data out is jsvIterateCallback(data, allFast.....

    I'll check it soon.

    even in BytePort you had to grab the binary data out of a variable each call
    That's correct, but there is no need to grab pins, set them to output, calculate the mask, ... on each call.
    Just thinking: couldn't the binary data be added to the initialization, so that we only have to set a pointer? Or would this cause problems if jsVars are restructured? Hmm.
    Anyway, we gained some speed in a lot of places (I2C, SPI, ...).
    And the way you use an array to avoid executing JS is very interesting.

  • binary data, couldn't this be added to initialization, and we only have to set a pointer ?

    Ideally to do your method you'd allocate a Flat String, and put the data in there. You could then get a pointer to it when executing the code. It's what I should do for most stuff like Graphics/etc (but I don't).

  • I tested with the latest direct_io branch, see my code.
    First I checked the speed, which is really fast; it came down to 18 msec/scan :-)
    Up to this point everything was fine.

    var led;
    connect2 = function(R1,R2,B1,B2,G1,G2,A,B,C,D,Latch,Clock,Enable) {
      var sfnc = shiftOut.bind(null,[R1,G1,B1,R2,G2,B2],{clk:Clock});
      var dfnc = digitalWrite.bind(null,[Enable,Latch,Latch,D,C,B,A,Enable]);
    
      var en = Enable;
      var g = Graphics.createArrayBuffer(64,32,4);
      var buf = g.buffer;
      var ledBuf = new Uint8Array(64 * 32 / 2);//converted graphics.buffer to data for LED
      var arr = [];
      g.prep = function() {
        var bufpnt1,bufpnt2,ledpnt;
        bufpnt1 = 0; bufpnt2 = 512; ledpnt = 0;
        var pane = false,i,j;
        for(i = 0; i < 16; i++){
          for(j = 0; j < 64;j +=2){
            ledBuf[ledpnt] = (buf[bufpnt2] & 7) + ((buf[bufpnt1] & 7)<<3);
            ledpnt++;
            ledBuf[ledpnt] = ((buf[bufpnt2] & 0xf0) >>4) + ((buf[bufpnt1] & 0xf0)>>1);
            ledpnt++;
            bufpnt1++;
            bufpnt2++;
          }
        }
        arr = [];
        for(var y=0; y < 16;y++){
          arr.push(new Uint8Array(ledBuf.buffer,y*64,64));
          arr.push({callback:dfnc.bind(null,33|y<<1)});
        }
      };
      g.scan = function() {
        en.reset();
        sfnc(arr);
        en.set();
      };
      g.setBgColor(1);
      g.clear();
      g.setColor(2);
      g.fillRect(5,12,50,25);
      g.setColor(4);
      g.fillRect(10,14,40,20);
      g.fillRect(10,0,40,0);
      return g;
    };
    function tst(){
      led = connect2(D2,D16,D4,D17,D15,D27, D5,D18,D19,D21, D26,D22,D25);
      if(led.prep) led.prep();
      var t=getTime();
      led.scan();
      print((getTime()-t) * 1000);
    }
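
    The bit packing inside prep can be tried in isolation. The sketch below uses the same masks and shifts as the code above on a single pair of buffer bytes; packPair is a hypothetical helper name, and it assumes colour values stay in the range 0..7 so that the two halves of a result do not overlap.

```javascript
// upper: a byte from the first half of the 4-bit buffer (offset 0..511)
// lower: the byte at the same position in the second half (offset 512..1023)
// Each byte holds two pixels; each result holds 3 bits per panel half.
function packPair(upper, lower) {
  var first = (lower & 7) + ((upper & 7) << 3);               // low nibbles
  var second = ((lower & 0xf0) >> 4) + ((upper & 0xf0) >> 1); // high nibbles
  return [first, second];
}

// Background colour 1 in all four pixels.
console.log(packPair(0x11, 0x11).join(",")); // 9,9
// Colour 4 in all four pixels.
console.log(packPair(0x44, 0x44).join(",")); // 36,36
// Colour 2 in the first half, colour 1 in the second half.
console.log(packPair(0x22, 0x11).join(",")); // 17,17
```

    The values 9, 36, and 17 are the same ones that show up in the log output further down.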
    

    Next I tried to work with graphics, and whatever I did, the panel displayed something different. :-(
    During testing I added some printf's here and there.
    The first confusing thing was how often jswrap_io_shiftOutCallbackFast was called. My expectation was 1024, but the counter I added returned 8704.
    The counter was added like this:

    int xx = 0;
    void jswrap_io_shiftOutCallbackFast(int val, void *data) {
    xx++;
      jswrap_io_shiftOutData *d = (jswrap_io_shiftOutData*)data;
      int n, i;
      for (i=0;i<d->repeat;i++) {
    .....
    void jswrap_io_shiftOut(JsVar *pins, JsVar *options, JsVar *data) {
    xx = 0;
      jswrap_io_shiftOutData d;
      d.cnt = 0;
      d.clk = PIN_UNDEFINED;
    .......
      // Now run through the data, pushing it out
      jsvIterateCallback(data, allFast ? jswrap_io_shiftOutCallbackFast : jswrap_io_shiftOutCallback, &d);
    printf("xx:%d\n",xx);
    }
    

    Checking jsvIterateCallback was the next step, with some more debug output like this.
    BTW, this is the short version of a long story ...

      // Handle the data being an array buffer
      else if (jsvIsArrayBuffer(data)) {
    jsWarn("ArrayBuffer:%j\n",data);
        JsvArrayBufferIterator it;
        jsvArrayBufferIteratorNew(&it, data, 0);
    jsWarn("byteLength:%d\n",it.byteLength);
        if (JSV_ARRAYBUFFER_GET_SIZE(it.type) == 1 && !JSV_ARRAYBUFFER_IS_SIGNED(it.type)) {
          JsvStringIterator *sit = &it.it;
    
    

    See the log: the sum of all byteLength values is exactly 8704, which is the number of times callbackFast is called.
    Here we reach the point where the story goes beyond my knowledge. I hope you can help.

    tst()
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:64
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:128
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:192
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:256
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:320
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:384
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:448
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:512
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:576
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:640
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:704
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:768
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:832
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:896
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 17, 17, 17, 17, 17, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:960
    WARNING: ArrayBuffer:new Uint8Array([9, 9, 9, 9, 9, 17, 17, 17, 17, 17, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
    WARNING: byteLength:1024
    xx:8704
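
    The counter value matches the logged sizes: the 16 reported byteLength values run from 64 to 1024 in steps of 64, and their sum is 8704, while 16 views of 64 bytes each should give 1024 calls. A short check of that arithmetic:

```javascript
var rows = 16;
var bytesPerRow = 64;

var expected = rows * bytesPerRow;   // what the scan should send
var observed = 0;
for (var y = 1; y <= rows; y++) {
  observed += y * bytesPerRow;       // byteLength reported for view y
}

console.log(expected); // 1024
console.log(observed); // 8704
console.log(observed / expected); // 8.5
```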

  • Argh - thanks for checking into this!

    It was because my 'fast path' for execution wasn't checking when it should stop correctly.

    If you pull now and try, it should at least call iteration the correct number of times.

    And on the plus side if it's 18ms while doing 8x as much work as is needed, it should be substantially faster now!

  • Believe me, the world is absolutely crazy.
    First of all, it is much faster; I got down to nearly 10 ms.
    The bad news: the upper 16 rows are always green and the lower 16 rows are white.
    Whatever I tried, there was no way to get this running. Even hardcoding the colour did not help.

    After a lot of frustration, I went back to setting pins with jshPinSetValue, and surprise, surprise, now it works.
    To me this looks like a timing problem.
    The good news is that the time for a scan is 13 ms now, which is still really fast.
    What we don't have is brightness. Lines 16 and 32 are much brighter than the others.
    And the brightness is flickering.
    This also looks like a timing problem.

    Anyway, my next step will be to create a graphics driver, using Graphics.createCallback to get rid of prep step.

  • Great! So you're not really using the direct_io branch now, but just standard Espruino? It's a shame - can you see the pins changing state?

    The brightness is probably because those rows are left on for slightly longer... I guess en controls if the row is actually lit? If so, just adding arr.push({callback:en.set.bind(en)}); should fix it?

    I guess the lack of brightness is because now we're actually able to push the data out quickly - flickering might be because something is stopping scan from getting called when it should be.

    My next step will be to create a graphics driver, using Graphics.createCallback to get rid of prep step.

    I'd have thought that might be quite slow... On Pixl.js/etc we just have g.flip() that you call when you want to display what you rendered (which is basically what prep() is doing) - or even if you want to avoid that you could check g.getModified() to see if anything has changed?

  • I use the direct_io branch, and love the changes.
    In my local copy, I changed the fast function to use jshPinSetValue instead of the GPIO registers.
    The flickering could be caused by task management in the RTOS.
    I agree that a JS solution would be slow, and will take a closer look at g.flip.

  • arr.push({callback:en.set.bind(en)});
    fixes it in the wrong direction: now lines 15/31 are dark.
    Timing is very particular on the P3 board.

  • Success! Once again I have only a vague idea why, but it works.
    I removed en.reset() and en.set() from g.scan,
    pushed {callback:en.reset.bind(en)} as the first entry into arr,
    and pushed {callback:en.set.bind(en)} as the last entry into arr.
    With this, lines 15 and 31 have the same brightness as all the others.
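
    The arrangement described above can be sketched as follows. The pin object and the row-address function are fakes that only record the order of events, so the sketch runs without hardware; the {callback: ...} entries follow the form used earlier in this thread, and the final forEach is only a stand-in for the bound shiftOut call.

```javascript
// Record events instead of driving hardware (fake pin, for illustration).
var events = [];
var en = {
  reset: function () { events.push("en.reset"); },
  set: function () { events.push("en.set"); }
};
var dfnc = function (value) { events.push("row " + value); };

// Build the scan list: enable handling first and last, rows in between.
var rows = 2; // 16 on the real panel; kept small here
var arr = [];
arr.push({ callback: en.reset.bind(en) });
for (var y = 0; y < rows; y++) {
  arr.push(new Uint8Array(4));                      // pixel data for the row
  arr.push({ callback: dfnc.bind(null, 33 | y << 1) });
}
arr.push({ callback: en.set.bind(en) });

// Stand-in for the bound shiftOut call: run callbacks, skip the data.
arr.forEach(function (item) {
  if (item.callback) item.callback();
});
console.log(events.join(", ")); // en.reset, row 33, row 35, en.set
```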

  • Last but not least, the flickering is gone.
    I surrounded jsvIterateCallback with RTOS calls to switch task handling off:

    vTaskSuspendAll();
      jsvIterateCallback(data, allFast ? jswrap_io_shiftOutCallbackFast : jswrap_io_shiftOutCallback, &d);
    xTaskResumeAll();
    

    The current status, compared to the direct_io branch, is:

    • switched fast IO off (return false in jshGetPinAddress)
    • switched task handling off during jsvIterateCallback
    • moved en.set() and en.reset() from JS (g.scan) into arr

    By next week at the latest, I'll send a pull request with all changes to the direct_io branch.
    Open questions:

    • How do we call task-related RTOS functions? We could use #ifdef ESP32 or #ifdef RTOS. I would prefer the second one.
    • Do we need jshIsPinValid(d->pins[n]) in jswrap_io_shiftOutCallback? Isn't this already done in jswrap_io_shiftOut?
Additional device for Espruino to support 8 pins in parallel (for discussion)

Posted by @JumJum
