P3 RGB pixel panel

  • Has anyone used Espruino with a P3 RGB pixel matrix like this? Where would I begin?
    https://www.aliexpress.com/item/P3-RGB-pixel-panel-HD-display-64x32-dot-matrix-p3-smd-rgb-led-module/32728985432.html

  • I haven't heard of anyone doing it, but I think it's possible. I believe those things need active scanning - probably not easy to do for greyscale without custom firmware, but I think you could scan out with 3 bit colour from JS.

    I'd look at http://www.espruino.com/LPD6416 - it uses this module: http://www.espruino.com/modules/LPD6416.js

    You'd need to add the extra colour channel and more rows, but I think that could be ok. The only thing is it'd likely flicker if other JS code took more than 20ms or so to execute, but it's definitely usable (and there are some other hacks you could do with inline C as well that might fix that).

  • Thanks Gordon. I'm not great with hardware coding but LPD6416 looks like a good starting point.

  • Did you ever have any success with this?

  • I would like to do some testing on this.
    @Gordon, could you please take a look at the attached code before I start testing?
    Usually 4 eyes see more than 2.

    The pins are the same as those used in a working ESP32-only solution, so this part should be correct.

    I've some questions:

    • what is repeat in shiftOut.bind ?
    • why do you have enable pin twice in digitalWrite ?
    • we only have 4 address pins, how would we control 32 rows ? This seems to be a non-Espruino
      question, but maybe you have some ideas
    • should we init enable to 1 ?


    function LedMatrix(){
      var me = this;
      var gr,buf; //graphics object and buffer in graphics object
      var sfnc,dfnc; //function calls to send data to LED Matrix
      var Pr1,Pr2,Pb1,Pb2,Pg1,Pg2; //Color Pins
      var Pa,Pb,Pc,Pd; //address pins
      var Platch,Pclock,Penable; //control pins
      me.init = function(R1,R2,B1,B2,G1,G2,A,B,C,D,Latch,Clock,Enable){
        Pr1 = R1; Pr2 = R2; Pb1 = B1; Pb2 = B2; Pg1 = G1; Pg2 = G2;
        Pa = A; Pb = B; Pc = C; Pd = D;
        Platch = Latch; Pclock = Clock; Penable = Enable;
        //bind only after the pins are assigned, otherwise undefined values get bound
        sfnc = shiftOut.bind(null,[Pr1,Pr2,Pb1,Pb2,Pg1,Pg2],{clk:Pclock,repeat : 8});
        dfnc = digitalWrite.bind(null,[Penable,Platch,Pd,Pc,Pb,Pa,Penable]);
        gr = Graphics.createArrayBuffer(64,32,8);
        buf = gr.buffer;
        return gr;
      };
      me.scan = function(){
        Penable.reset();
        for (var y=0;y<16;y++) {
          sfnc(new Uint8Array(buf,y*64,64)); //view onto row y of the graphics buffer
          dfnc(33|y<<1);
        }
        Penable.set();
      };
    }
    var led = new LedMatrix();
    function test(){
      //Pin objects (D2 etc.) are needed because scan calls Penable.reset()
      var ledGraphik = led.init(D2,D16,D4,D17,D15,D27,D5,D18,D19,D21,D26,D22,D25);
      ledGraphik.setBgColor(0);
      ledGraphik.clear();
      ledGraphik.setColor(17);
      ledGraphik.clear();
      ledGraphik.fillRect(0,0,3,3);
      led.scan();
    }
    
  • Looks good - I think you actually want to leave out repeat:8 though - since what you want is to only output once per byte. [Pr1,Pr2,Pb1,Pb2,Pg1,Pg2] may be better off as [Pr2,Pb2,Pg2,Pr1,Pg1,Pb1] as well (see the 16 row comment below).

    Also scan will need to be called repeatedly for you to see anything.

    The code in the module is designed to run really quickly - so it uses a few hacks (most of which I think you picked up on!)

    what is repeat in shiftOut.bind ?

    https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Function/bind

    sfnc = shiftOut.bind(null,[Pr1,Pr2,Pb1,Pb2,Pg1,Pg2],{clk:Pclock,repeat : 8});
    sfnc(new Uint8Array(buf,y*64,64));

    is the same as:

    shiftOut([Pr1,Pr2,Pb1,Pb2,Pg1,Pg2],{clk:Pclock,repeat : 8}, new Uint8Array(buf,y*64,64));

    Except that it's significantly faster when you're calling it multiple times, since all the arguments that don't change have already been precalculated.
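
    A minimal sketch of the same idea in plain JavaScript, runnable outside Espruino. fakeShiftOut is a hypothetical stand-in for shiftOut and the pin names are placeholder strings; only Function.prototype.bind is the real API here.

```javascript
// Hypothetical stand-in for Espruino's shiftOut(pins, options, data):
// it only reports what it was called with.
function fakeShiftOut(pins, options, data) {
  return { pins: pins, clk: options.clk, bytes: data.length };
}

var pins = ["R1", "R2", "B1", "B2", "G1", "G2"]; // placeholder pin names
var options = { clk: "CLK" };

// Pre-bind the two arguments that never change between calls.
var sfnc = fakeShiftOut.bind(null, pins, options);

var row = new Uint8Array(64);
var viaBind = sfnc(row);                       // only the data is passed
var direct = fakeShiftOut(pins, options, row); // all three arguments

console.log(JSON.stringify(viaBind) === JSON.stringify(direct)); // true
```

    The bound function forwards its own arguments after the pre-bound ones, so the per-row call only has to supply the data.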

    why do you have enable pin twice in digitalWrite ?

    There's an explanation in http://www.espruino.com/Reference#l__global_digitalWrite - again, it's a very fast way of pulsing the pin on and off while also setting the address pins.
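
    As a rough illustration, here is a plain JavaScript simulation of that ordering, assuming the documented rule that the last pin in the array receives bit 0 and that pins are written starting from that end. simulateDigitalWrite and the pin names are made up for this sketch; no hardware is touched.

```javascript
// Simulates the array form of digitalWrite: the last pin in the array
// receives bit 0, and pins are written starting from that end.
function simulateDigitalWrite(pins, value) {
  var writes = [];
  for (var i = pins.length - 1; i >= 0; i--) {
    writes.push(pins[i] + "=" + (value & 1));
    value >>= 1;
  }
  return writes;
}

var pins = ["EN", "LAT", "D", "C", "B", "A", "EN"]; // placeholder names
var y = 5; // row address 0..15
console.log(simulateDigitalWrite(pins, 33 | y << 1).join(" "));
// EN=1 A=1 B=0 C=1 D=0 LAT=1 EN=0
```

    With the enable pin listed at both ends, a single call raises enable, sets the four address bits, raises latch, and then lowers enable again.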

    we only have 4 address pins, how would we control 32 rows ? This seems to be a non-Espruino question, but maybe you have some ideas

    :) These RGB LED arrays scan out two rows at once. That's why you have R1 and R2, G1,G2,B1,B2.

    Address 0 = row 0 and 16, 1 = 1 and 17, etc.

    I think I mentioned in my email, but it means that you'll have to precalculate a new array that contains the image of the first 16 rows plus the image in the next 16 rows shifted left by 3 bits (and then use [Pr2,Pb2,Pg2,Pr1,Pg1,Pb1]).

    However for testing the current code should work, and will drive the top or bottom 16 rows depending on what colours you use.
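
    A minimal sketch of that precalculation in plain JavaScript, assuming one byte per pixel with colour values 0..7. packHalves is a hypothetical helper, and which bit ends up on which colour pin depends on the pin order passed to shiftOut.

```javascript
// Pack a width x height image with one byte per pixel (colour values 0..7)
// into half as many bytes: bits 0..2 hold the pixel from the upper half,
// bits 3..5 the pixel from the same column in the lower half.
function packHalves(pixels, width, height) {
  var half = width * height / 2;
  var out = new Uint8Array(half);
  for (var i = 0; i < half; i++) {
    out[i] = (pixels[i] & 7) | ((pixels[i + half] & 7) << 3);
  }
  return out;
}

var img = new Uint8Array(64 * 32);
img[0] = 1;       // row 0, column 0
img[16 * 64] = 4; // row 16, column 0
var packed = packHalves(img, 64, 32);
console.log(packed[0]); // 33
```

    Each packed byte then carries both pixels that share a row address, so one shiftOut call per address drives both halves of the panel.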

    should we init enable to 1 ?

    It looks like a good idea - but I doubt it'll matter too much as it'll get sorted once scan is called.

  • @gordon, thanks for your help, but it looks like this will not run, at least on ESP32.
    Scan takes 396 msec, which is too slow.
    I'm pretty sure the way pins are set on ESP32 slows it down that much.
    I tested a loop that sets and resets a pin, and even that takes a lot of time.

    There is a pure ESP32 solution using I2S in parallel mode, having density and full colors using PWM.
    But as said, this would only work for ESP32, and is not part of standard Espruino.

  • Wow, that is crazy slow - just tried the code on a Pico and it's 20ms even unminified. You used basically the code above you uploaded, so it's shiftOut that is taking the majority of the time? If so, there must be something we can do to improve the speed of IO, like removing gpio_matrix_out? Getting faster IO would be good for a lot of stuff on ESP32.

    As you mention, a purely native solution is going to be better, but I really don't want to be stuck having to maintain ESP32 specific library code for this one type of device inside the main Espruino repo - I guess you could just keep a fork with it in?

  • Hmm, there must be something different in our testing.
    To compare the original Espruino Board with ESP32 I wrote some Q&U (quick and ugly) functions.
    As a first result, I couldn't find any way to get down to 20ms, as you mentioned.
    Strange results, in my eyes, are:

    • setting a pin is 3 times slower on ESP32
    • to figure out how much time goes to ESP-IDF, a simple test function was added to the source, which calls jshPinSetValue with true and with false. This takes only a few ms.
    • array copy from a big array already takes much more than 20 ms
    • LedShiftOut, which is similar to the command in your module, takes even more time
    • if setting a pin only takes some microseconds, why does setting it with pin.set() take close to a millisecond?

    var pin;
    switch(process.env.BOARD){
      case "ESPRUINOBOARD": pin = B5; break;
      case "ESP32": pin = D5; break;
    }
    function test(fnc){
      var t = new Date();
      eval(fnc);
      console.log(fnc.substr(3), ":" ,parseInt(new Date() -t));
    }
    function tstLoop(cnt){
      for(var i = 0; i < cnt; i++){}
    }
    function tstArr(cnt){
      var o;for(var i = 0; i < cnt; i++){o = new Uint8Array(1024);}
    }
    function tstPin(cnt){
      for(var i = 0; i < cnt; i++){pin.set();}
    }
    function tstArrCopy(cnt){
      var c,o = new Uint8Array(1024);
      for(var i = 0; i < cnt; i += 4){
        c = new Uint8Array(o,i,64);
      }
    }
    function tstDigitalWrite(cnt){
      var dfnc = digitalWrite.bind(null,[pin,pin,pin,pin,pin,pin,pin]);
      for(var i = 0; i < cnt; i++){
        dfnc(33);
      }
    }
    function tstShiftOut(cnt){
      var arr = new Uint8Array(64);
      var sfnc = shiftOut.bind(null,[pin,pin,pin,pin,pin,pin],{clk:pin});
      for(var i = 0; i < cnt; i++){
        sfnc(arr);
      }
    }
    function tstLedShiftOut(cnt){
      var arr = new Uint8Array(1024);
      var sfnc = shiftOut.bind(null,[pin,pin,pin,pin,pin,pin],{clk:pin});
      for(var i = 0; i < cnt; i++){
        sfnc(new Uint8Array(arr,i*64,64));
      }
    }
    function tstjshPinSetValue(cnt){
      ESP32.test(pin,cnt);
    }
    test("tstLoop(1000)");
    test("tstArr(100)");
    test("tstPin(1000)");
    test("tstArrCopy(16)");
    test("tstShiftOut(16)");
    test("tstLedShiftOut(16)");
    test("tstDigitalWrite(16)");
    switch(process.env.BOARD){
      case "ESP32": test("tstjshPinSetValue(6144)"); break;
    }
    

    Results in milliseconds of running the tests on both boards:

    Test                  Espruino Board  ESP32
    Loop(1000)            197             230
    Arr(100)              109             134
    Pin(1000)             342             915
    ArrCopy(16)           151             76
    ShiftOut(16)          14              30
    LedShiftOut(16)       652             396
    DigitalWrite(16)      8               19
    jshPinSetValue(6144)  not available   11
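
    The harness above passes strings to eval. A variant that takes a function instead could look like the sketch below; timeIt is a hypothetical name, and Date.now() is assumed to be precise enough for millisecond-level comparisons.

```javascript
// Times one call of fn, logs the result and returns the elapsed milliseconds.
function timeIt(label, fn) {
  var start = Date.now();
  fn();
  var elapsed = Date.now() - start;
  console.log(label, ":", elapsed);
  return elapsed;
}

// Same empty-loop test as in the benchmark above.
function tstLoop(cnt) {
  for (var i = 0; i < cnt; i++) {}
}

var ms = timeIt("Loop(1000)", function () { tstLoop(1000); });
```

    Passing a function keeps the label and the measured code separate and avoids the extra parsing step that eval adds to every measurement.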

  • Wow, that stumped me - then I found:

    function tstLedShiftOut(cnt){
      var arr = new Uint8Array(1024);
      var sfnc = shiftOut.bind(null,[pin,pin,pin,pin,pin,pin],{clk:pin});
      for(var i = 0; i < cnt; i++){
        sfnc(new Uint8Array(arr,i*64,64));
      }
    }
    function tstLedShiftOut2(cnt){
      var arr = new Uint8Array(1024).buffer;
      var sfnc = shiftOut.bind(null,[pin,pin,pin,pin,pin,pin],{clk:pin});
      for(var i = 0; i < cnt; i++){
        sfnc(new Uint8Array(arr,i*64,64));
      }
    }
    test("tstLedShiftOut(16)");
    test("tstLedShiftOut2(16)");
    
    > LedShiftOut(16) : 473
    > LedShiftOut2(16) : 13
    

    The Array copy does take a bunch of time - but if you use .buffer then it's not doing a copy - it just creates a 'view' and no data gets allocated or shovelled around.
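
    The difference can be checked in plain JavaScript as well. This sketch relies only on standard typed array behaviour: constructing from a typed array copies, constructing from its .buffer creates a view.

```javascript
var arr = new Uint8Array(1024);

// Passing the typed array itself allocates a new array and copies the data.
var copy = new Uint8Array(arr);
// Passing arr.buffer creates a view onto the same memory: nothing is copied.
var view = new Uint8Array(arr.buffer, 64, 64);

arr[64] = 42;
console.log(copy[64]);    // 0  (the copy was taken before the write)
console.log(view[0]);     // 42 (the view starts at byte 64 of the same buffer)
console.log(view.length); // 64
```

    Writes through the view land in the original array too, which is why it costs no copying time.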

    That might well sort out your ESP32 issues :)

  • Life could be much easier when doing things correctly...

    Good news first: I got it running.
    Bad news: ESP32 seems to be too slow.
    Scan takes 70 msecs, so the refresh rate is only 14 Hz, which means flickering like crazy.
    Another bad part is converting from the Graphics buffer to the format needed for the LED matrix.

    I'm sure there are better ways of converting, but I'm pretty sure it will always be slow.
    An option could be a small C function for converting, and maybe shiftOut could be optimized (?)

    Anyway, if somebody wants to test with other boards, this could be a good start:

    function LedMatrix(){
      var me = this;
      var gr,buf; //graphics object and buffer in graphics object
      var ledBuf; //converted graphics.buffer to datapanes for LED
      var sfnc,dfnc; //function calls to send data to LED Matrix
      var Pr1,Pr2,Pb1,Pb2,Pg1,Pg2; //Color Pins
      var Pa,Pb,Pc,Pd; //address pins
      var Platch,Pclock,Penable; //control pins
      me.init = function(R1,R2,B1,B2,G1,G2,A,B,C,D,Latch,Clock,Enable){
        Pr1 = R1; Pr2 = R2; Pb1 = B1; Pb2 = B2; Pg1 = G1; Pg2 = G2;
        Pa = A; Pb = B; Pc = C; Pd = D;
        Platch = Latch; Pclock = Clock; Penable = Enable;
        sfnc = shiftOut.bind(null,[Pr1,Pg1,Pb1,Pr2,Pg2,Pb2],{clk:Pclock,repeat:0});
        dfnc = digitalWrite.bind(null,[Penable,Platch,Pd,Pc,Pb,Pa,Penable]);
        gr = Graphics.createArrayBuffer(64,32,4);
        buf = gr.buffer;
        return gr;
      };
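      me.convertArray = function(){
        //each ledBuf byte holds one pixel of the upper half (bits 3..5)
        //and the pixel 16 rows below it (bits 0..2)
        ledBuf = new Uint8Array(64 * 32 / 2);
        var bufpnt1,bufpnt2,ledpnt;
        bufpnt1 = 0; bufpnt2 = 512; ledpnt = 0;
        for(var i = 0; i < 16; i++){
          for(var j = 0; j < 64;j +=2){
            //each 4 bit buffer byte carries two pixels, only 3 bits of each are used
            ledBuf[ledpnt] = (buf[bufpnt2] & 7) + ((buf[bufpnt1] & 7)<<3);
            ledpnt++;
            //mask with 0x70 so a set 4th colour bit cannot spill into the other pixel
            ledBuf[ledpnt] = ((buf[bufpnt2] & 0x70) >>4) + ((buf[bufpnt1] & 0x70)>>1);
            ledpnt++;
            bufpnt1++;
            bufpnt2++;
          }
        }
      };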
      me.scan = function(){
        var y;
        y = 0;
        Penable.reset();
        for (y=0;y<16;y++) {
          sfnc(new Uint8Array(ledBuf.buffer,y*64,64));
          dfnc(33|y<<1);
        }
        Penable.set();
      };
    }
    var gr,led = new LedMatrix();
    function test(){
      gr = led.init(D2,D16,D4,D17,D15,D27,D5,D18,D19,D21,D26,D22,D25);
      gr.setBgColor(1);
      gr.clear();
      gr.setColor(2);
      gr.fillRect(5,16,50,25);
      gr.setColor(4);
      gr.fillRect(10,7,40,20);
      led.convertArray();
      setInterval(function(){led.scan();},80);
    }
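
    For reference, the relation between scan time and refresh rate used above, as a small sketch. refreshRateHz is a made-up helper, and it assumes one scan is one full refresh with no idle time in between.

```javascript
// Refresh rate in Hz for a display that needs scanMs milliseconds per full scan.
function refreshRateHz(scanMs) {
  return 1000 / scanMs;
}

console.log(refreshRateHz(70).toFixed(1)); // 14.3
console.log(refreshRateHz(20).toFixed(1)); // 50.0
```

    70 ms per scan gives roughly the 14 Hz mentioned above, a 20 ms scan would give 50 Hz, and the 80 ms interval used in test() limits the rate to 12.5 Hz.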
    
    
  • @JumJum, is there a reason for attaching the methods to each instance of LedMatrix instead of using the prototype? For a singleton that works, but then the constructor and new would not be needed either.

  • @JumJum and @Gordon, regarding the slowness: would writing .convert() and .scan() in compiledC and running them independently help? Scan would always have to run on an interval with a decent frequency, convert would run on flip, and buffers would always be full buffers?

  • Regarding ESP32 always being slow: the figures in post #9 hint that there is something else running under the (ESP32) hood. I have not dealt with ESP32 yet, but could it be that the RTOS, or whatever kernel runs on ESP32, is interpretively executing what it is told to do through the exposed API / SDK? It's hard to believe that such a 'thick layer' would be shoved in between the application and the hardware and down the throat of the 'user'...

  • Definitely writing convert in Inline C would help - but Inline C doesn't work on ESP8266/ESP32 so it's not a great help here :) I believe scan actually already spends 90% of its time inside native code with shiftOut though.

    Thanks for posting up the code @JumJum - that's frustrating as it's almost there... If it were 20ms it would be reasonably workable :(

  • @allObjects, I agree with your first point. Let me give an answer in (a kind of) German.
    Esch ischt halt so geworde, or in other words, the tree has grown this way ;-)
    Writing in compiledC would work for the boards supported by Gordon, but not for ESP32 or others. Correct me if I'm wrong.
    Anyway, I could imagine having something like an E.convertForLEDBoard function in Espruino.
    And there should also be some options to speed up shiftOut significantly.
    Even an empty for loop is not faster than on one of the first boards. RTOS takes some CPU power, but we use it mainly for multitasking, and tasks can change each msec.
    To me it looks like the processor is not the fastest, even with its high clock (80/160 MHz).

    One more problem of ESP-IDF: it's hungry for memory. With V3.1 we reach 1300 kBytes, where V3.0 needs 1100 kBytes. OK, we have a change around mbedtls, but this should never be the reason for an additional 200 kBytes.

    Oh, before I forget, my assumption about the speed of setting/resetting pins was not correct, as you can see in the comparison table above.

  • 1300 kBytes? Wow - and I was getting fed up with the ~120k of Nordic softdevice :)

    Can you see any obvious ways to speed up shiftOut? I guess minifying and maybe unrolling the 16 entry FOR loop would help a bit, but I'm not sure how much.

    It did look like maybe jshPinSetValue on ESP32 could skip the mapping call (if that were in PinSetState) and that might help?

    I guess for convertArray you could do:

    g = Graphics.createArrayBuffer(64,32,8); // 8 bits to make shifting easier
    // 64x16=1024
    ledBuf = new Uint8Array(1024);
    ...
    ledBuf.set(new Uint8Array(g.buffer,0,1024)); // fill first part with rows 0..15 only
    // now use 32 bit arrays to mash 4 pixels together at once
    var s = new Uint32Array(g.buffer,1024,256); // rows 16..31: 1024 bytes = 256 words
    var d = new Uint32Array(ledBuf.buffer);
    for (var i=0;i<256;i++) d[i]|=s[i]<<3; // 3 bits per pixel, so shift by 3 for a 6 pin layout
    

    Not tested, but that should work - I guess it may still not be that fast but it should be an improvement.
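
    A plain JavaScript check of the 32-bit trick, runnable without Espruino. It assumes pixel values stay in 0..7 so that shifting never carries into the neighbouring byte, and it shifts by 3 to match a six-pin layout; both are assumptions of this sketch, and src stands in for the graphics buffer.

```javascript
// 64x32 image, one byte per pixel, colour values 0..7.
var src = new Uint8Array(64 * 32);
src[5] = 3;        // row 0, column 5
src[1024 + 5] = 6; // row 16, column 5

var ledBuf = new Uint8Array(1024);
ledBuf.set(new Uint8Array(src.buffer, 0, 1024)); // rows 0..15 into bits 0..2

// 32-bit views combine four pixels per loop iteration.
var s = new Uint32Array(src.buffer, 1024, 256);  // rows 16..31
var d = new Uint32Array(ledBuf.buffer);
for (var i = 0; i < 256; i++) d[i] |= s[i] << 3; // rows 16..31 into bits 3..5

console.log(ledBuf[5]); // 51
```

    Because every byte stays below 64 after the shift, the four pixels inside one 32-bit word never disturb each other.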

  • @JumJum, thanks for your open words. Almost all things are a journey, and those who never embark on it because the trees do not always grow straight will never reach the heights of success. Not macht erfinderisch! Necessity is the mother of invention! I see it this way: starting the work on ESP32 has pointed out these necessities. Maybe in the end we just have to be realistic and say: it is not the right tool or means for this application, and nothing is wrong with that. Not for nothing did the neopixel support move into the internals, a success story you yourself also worked on.

  • @Gordon
    Gave it a try and reached 25 Hz instead of 14 Hz. This is still too slow, at least for ESP32.
    Therefore I will go back to a special function outside the original Espruino code.
    I opened a GitHub repository for special code about 2 years ago and will bring it back to life.
    And maybe I will spend some time on my makeFirmWare service, you never know.

    My ideas to speed up shiftOut are:

    • split into 2 parts, first init and second shiftOut data
      + init could be done before the loop
      + number of parameters for shifting is reduced
      - init needs to be done before, and data has to be stored somewhere
    • instead of a sub array, send start and length
    • use a memory address instead of iterating
      + faster compared to iterating
      - works for large arrays only (?)
    • one loop instead of jsvIterateCallback

  • That's a shame. I wonder how long the init part of shiftOut actually takes... My gut feeling is not long. Also jsvIterateCallback should be pretty speedy when it's given something that is a big flat array.

    Does ESP32 have the ability to access GPIO directly via an address in memory? I notice that on STM32, shiftOut is able to use jshGetPinAddress to get an address it can write to really quickly. We could maybe do the same for ESP32?

  • ESP32 supports I2S in parallel mode. This can be used to set multiple pins at a time.

    There is an example in the ESP32 showcase (https://esp32.com/viewtopic.php?f=17&t=3188) which I converted for use in Espruino. It runs fine under Espruino and supports 24-bit colors.
    Handing over the address of the buffer once is all I have to do: no converting, no refresh.
    And last but not least, CPU load is low, less than 1 %.
    In other words, I have a solution for my needs.

    The problem with I2S itself is poor documentation/support in ESP-IDF, so converting the example to Espruino was more or less a kind of copy/paste. In other words, my knowledge about I2S on ESP32 is poor ;-)

  • Yes, understood.

    While it's a shame we don't have a more general version, at least you have something that does what you want :)

  • ESP32 supports I2S in parallel mode.

    Sounds like a rebirth of the good old 'Centronics' (printer) parallel interface (IEEE 1284, 36 pins/wires), just with more options than only nibble and byte. What was interesting about that interface was that slaves (printers) could be implemented without an ALU / processor: plain timed TTL logic could do the job. Some early printers could not even interpret control characters, which is why there were so many extra lines to control line feed, page feed, etc. In the beginning it was not bidirectional either, and this added even more lines to provide signals such as out of paper. The pulse/clock stretch was done by 'delaying' the ack. Even though it was 'parallel', in the 'bigger picture' it was kind of async serial... great fun!

Posted by @Owen