performance tips for computations on an ESP32

  • Hello,

    Are there any tips on how to speed up numeric computations (particularly basic integer arithmetic) on an ESP32 running Espruino?

    My current project basically mimics a cellular automaton that models something like thermal diffusion and visualizes the result on a Neopixel matrix display. The numerics are limited to basic integer arithmetic but involve a lot of Math.round/min/max calls. Colors and brightnesses are then looked up in tables built from Uint8ClampedArray.

    The program runs but turns out to be surprisingly slow, especially when compared with a similar implementation for a Raspberry Pi Pico running Kaluma (and the latter has not yet been optimized in any way!). By the way: running the same code on an Espruino MDBT42Q gives a performance comparable to the ESP32 (which surprised me, as I expected the ESP32 to run faster).

    Are there any tips on what I could do to speed the program up?

    (If my description sounds too abstract, I could also share my code - approx. 120 lines.)

    Thanks in advance for any help!

    With greetings from Germany,

    Andreas Rozek

  • For MDBT42Q code you may try putting "compiled" in the methods that do the math, see
    https://www.espruino.com/Compilation

    If it helps, you will know how much it would help on the ESP32 too. That feature does not seem to be available (?) for the ESP32, but maybe it would not be that hard to add.

    Btw, there is no "integer arithmetic" in JavaScript; everything is double-precision floating point, done in software. However, the "compiled" code may figure out that you are using only integers and do integer math.

    As for the Raspberry Pi Pico, it has hand-optimized floating point code in ROM:
    https://www.quinapalus.com/qfplib-m0-full.html
    https://github.com/raspberrypi/pico-bootrom/blob/master/bootrom/mufplib.S
    Maybe it helps there too.
    BTW, it would be interesting to try this qfplib library with an nrf51/52 build too; it is GPLed:
    https://www.quinapalus.com/qfplib-m3.html

  • Indeed, I forgot about "compilation" when optimizing my code for the ESP32 (well, as you said, it's not supported anyway).

    While there is indeed no explicit integer arithmetic in JavaScript, there are still tricks to tell the JavaScript engine that integer arithmetic is preferred (these are used by asm.js, for example).

    And perhaps there is some other hidden (or less documented) "trick" to speed things up, e.g., by using single-precision FP rather than double-precision.

    The performance of the RasPi Pico is, nevertheless, impressive (particularly when compared with its price)!
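
    A minimal sketch of the asm.js-style integer hint mentioned above, in plain JavaScript. The "| 0" coercion truncates a value to a 32-bit integer, so a rounded division of non-negative integers can be written without Math.round. The helper name roundedDiv is hypothetical, the sketch only covers non-negative values below 2^31, and whether it is any faster in the Espruino interpreter has not been measured here.

```javascript
// Rounded integer division for non-negative integers, written with the
// asm.js-style "| 0" coercion instead of Math.round.
// Hypothetical helper, not part of Espruino or of any code in this thread.
function roundedDiv (numerator, divisor) {
  // adding half the divisor before truncating rounds to the nearest integer
  return ((numerator + (divisor >> 1)) / divisor) | 0;
}

// compare with Math.round for a tie and for an ordinary value
console.log(roundedDiv(25, 10), Math.round(25/10));
console.log(roundedDiv(13,  9), Math.round(13/9));
```

    For non-negative inputs the helper agrees with Math.round(numerator/divisor), including ties, which both round upwards.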

  • > Indeed, I forgot about "compilation" when optimizing my code for the ESP32 (well, as you said, it's not supported anyway).

    It would still be interesting to test with the MDBT42Q how much it helps.

    > While there is indeed no explicit integer arithmetic in JavaScript, there are still tricks to tell the JavaScript engine that integer arithmetic is preferred (these are used by asm.js, for example).

    This may work in the "compiled" case but not in the normal JS interpreter (?).

    > And perhaps there is some other hidden (or less documented) "trick" to speed things up, e.g., by using single-precision FP rather than double-precision.

    This would need a different build with -DUSE_FLOATS
    https://github.com/espruino/Espruino/blob/master/src/jsutils.h#L280
    and then everything is single precision (but the main slowdown is probably the interpreter).

  • I briefly looked through the build instructions for Espruino under macOS - and it looked like too much effort for the limited time I have - thus, I'll have to stick with double precision floats.

    Using "compiled" failed with the message

    WARNING: Function marked with "compiled" uploaded in source form
    [ COMPILER MESSAGE ] : TypeError: Cannot read property 'type' of null
    

    which I do not understand right now

  • By the way: I just added some timing measurements to my code and found out that the MDBT42Q is almost twice as fast as the ESP32!

    However, a speedup by a factor of 3 would still be desirable...

    Addendum: I also observed that the program gets continuously slower while it runs (regardless of the platform).
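
    The timing measurements mentioned above could look like the following sketch, which relies only on Date.now(). The helper names measureFrame and averageFrameTime and the dummy workload are hypothetical; they stand in for the real per-frame function, which is not shown at this point of the thread.

```javascript
// Runs one frame function and returns the elapsed time in milliseconds.
function measureFrame (frameFunction) {
  const start = Date.now();
  frameFunction();
  return Date.now() - start;
}

// Averages the frame time over several runs to smooth out jitter.
function averageFrameTime (frameFunction, runs) {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    total += measureFrame(frameFunction);
  }
  return total / runs;
}

// dummy workload standing in for the real per-frame computation
function dummyFrame () {
  let sum = 0;
  for (let i = 0; i < 1000; i++) { sum += Math.round(i/3); }
  return sum;
}

console.log(averageFrameTime(dummyFrame, 10), "ms per frame");
```

    Logging such an average at regular intervals would also show whether the frame time really grows while the program runs.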

  • TypeError: Cannot read property 'type' of null

    Hard to tell without seeing the code; see "Caveats" and "Doesn't Work" at the bottom of the Compilation page.
    It is trying to determine the type of something that is null, so perhaps some variable with an undefined value?

  • I looked into the mentioned sections but did not find anything applicable.

    Just to stop being abstract: here is my code for the MDBT42Q

      let Neopixel = require("neopixel");
    
      let DisplaySize = 16*16;
      let Display     = new Uint8ClampedArray(DisplaySize*3);
    
      let PixelOffset = new Int16Array(256);
        for (let i = 0; i < 16; i++) {
          for (let j = 0; j < 16; j++) {
            let Index = i*16 + j;
            if (i % 2 === 0) {
              PixelOffset[Index] = Index*3;
            } else {
              PixelOffset[Index] = (i*16 + 15 - j)*3;
            }
          }
        }
    
      let normalized = new Uint8ClampedArray(16*16);
        for (let i = 0; i < 256; i++) {
          normalized[i] = (
            i === 0
            ? 0
            : Math.round(1 + 256/4*(i/256)*(i/256))
          );
        }
    
      let TemperatureMap = new Uint8ClampedArray(DisplaySize);
        for (let i = 0; i < 256; i++) {
          TemperatureMap[i] = 0;
        }
    
      let ColorMap = new Array(16);
        for (let i = 0; i < 16; i++) {
          ColorMap[i] = E.HSBtoRGB(i/16, 1, 1, true);
        }
    
    /**** processNextFrame ****/
    
      let HeatX     = Math.round(16*Math.random()); // where to insert new intensity
      let HeatCount = Math.round(12*Math.random());        // how long to heat there
    
      let normalizedRGB = new Uint8ClampedArray(3);    // used by "processNextFrame"
    
      function processNextFrame () { "compiled";
        let i,j,k,l;
        for (i = 15; i > 0; i--) {             // intensity diffusion for upper rows
          k = i*16;
          TemperatureMap[k] = Math.round((
            6*TemperatureMap[k] + 2*TemperatureMap[k-16] + TemperatureMap[k-15]
          )/9);
    
          for (j = 1; j < 15; j++) {
            k++;
            TemperatureMap[k] = Math.round((
              6*TemperatureMap[k] +
              TemperatureMap[k-17] + 2*TemperatureMap[k-16] + TemperatureMap[k-15]
            )/10);
          }
    
          k++;
          TemperatureMap[k] = Math.round((
            6*TemperatureMap[k] + TemperatureMap[k-17] + 2*TemperatureMap[k-16]
          )/9);
        }
    
      /**** heat around HeatX, cool elsewhere ****/
    
        HeatCount -= 1;
        if (HeatCount < 0) {
          HeatX     = Math.round(16*Math.random());
          HeatCount = Math.round(12*Math.random());
        }
    
        l = HeatX-4;
        for (i = 0 /*, l = HeatX-4*/; i < l; i++) {
          TemperatureMap[i] = TemperatureMap[i] - 8;            // exploits clamping
        }
          i = HeatX-4;
          i++; if (i >= 0) { TemperatureMap[i] = TemperatureMap[i] + 2; }   // ditto
          i++; if (i >= 0) { TemperatureMap[i] = TemperatureMap[i] + 5; }   // ditto
          i++; if (i >= 0) { TemperatureMap[i] = TemperatureMap[i] + 7; }   // ditto
          i++;               TemperatureMap[i] = TemperatureMap[i] + 8;     // ditto
          i++; if (i < 16) { TemperatureMap[i] = TemperatureMap[i] + 7; }   // ditto
          i++; if (i < 16) { TemperatureMap[i] = TemperatureMap[i] + 5; }   // ditto
          i++; if (i < 16) { TemperatureMap[i] = TemperatureMap[i] + 2; }   // ditto
          i++;
        for (; i < 16; i++) {
          TemperatureMap[i] = TemperatureMap[i] - 8;            // exploits clamping
        }
    
      /**** prepare display ****/
    
        let RGB /*, normalizedRGB = new Uint8ClampedArray(3)*/;
        let RowStart, RowEnd;
        for (i = 0; i < 16; i++) {                                       // row-wise
          RowStart = i*16; RowEnd = RowStart + 15;
          for (j = 0, k = RowStart; j < 16; j++, k++) {               // column-wise
            RGB         = ColorMap[j];
            Temperature = TemperatureMap[k];
              if (Temperature < 16) {
                Temperature = 0;
              } else {
                Temperature = Math.max(48,Temperature)/256;
              }
    
            normalizedRGB[0] = normalized[Math.round(RGB[1]*Temperature)];
            normalizedRGB[1] = normalized[Math.round(RGB[0]*Temperature)];
            normalizedRGB[2] = normalized[Math.round(RGB[2]*Temperature)];
    
            Display.set(normalizedRGB, PixelOffset[k]);
          }
        }
    
      /**** show display ****/
    
        Neopixel.write(D22,Display);
    
      /**** wait for next frame and proceed ****/
    
    let now = Date.now();
    if (Timestamp > 0) { print(now-Timestamp); }
    Timestamp = now;
    
        setTimeout(processNextFrame,0);
      }
    
    let Timestamp = 0;
      setTimeout(processNextFrame,100);       // give Espruino some time to start up
    

    it runs even without an attached Neopixel matrix display - but having one is far more impressive...
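
    One detail in the listing above may be worth checking: Math.round(16*Math.random()) can return any integer from 0 to 16 inclusive, so HeatX may occasionally be 16, one past the last column, and the values 0 and 16 are only half as likely as the others. A sketch of a uniform alternative, assuming the intent is a column index from 0 to 15 (the helper name randomColumn is hypothetical):

```javascript
// Returns a uniformly distributed integer in the range 0 .. columns-1.
// Math.random() is always below 1, so Math.floor never reaches "columns".
function randomColumn (columns) {
  return Math.floor(columns * Math.random());
}

let HeatX = randomColumn(16);       // 0 .. 15, every column equally likely
console.log(HeatX);
```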

  • I don't see anything wrong in the processNextFrame method either.

    let i,j,k,l;

    Maybe you can try to assign values when the variables are declared so the compiler can better guess their type.

    Also, assigning Math.round and other frequently called methods to variables may speed things up.
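
    The second suggestion can be sketched as follows. Both variants compute the same result; the cached one resolves Math.round only once instead of on every call. The function names are hypothetical, and any gain depends on the interpreter: it may matter in an interpreter that looks names up at run time, while an optimizing desktop engine is unlikely to show a difference.

```javascript
// Variant 1: looks up "Math" and then "round" on every call.
function sumRoundedDirect (values) {
  let sum = 0;
  for (let i = 0; i < values.length; i++) {
    sum += Math.round(values[i]);
  }
  return sum;
}

// Variant 2: resolves Math.round once and reuses the reference.
const rounded = Math.round;
function sumRoundedCached (values) {
  let sum = 0;
  for (let i = 0; i < values.length; i++) {
    sum += rounded(values[i]);
  }
  return sum;
}

console.log(sumRoundedDirect([0.4, 1.5, 2.6]), sumRoundedCached([0.4, 1.5, 2.6]));
```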

  • I've put the compiler output here: https://gist.github.com/fanoush/8d7f38ab2cab05676f5aad677ec89203

    @Gordon maybe you can look into it? There are a ton of suspicious "Error\n at setType (/home/fanoush/EspruinoCompiler/src/infer.js:11:60)\n

    A slightly edited source is here: https://gist.github.com/fanoush/6fe1e02951113293a6959a7e31cfdad9

  • Thank you very much for your effort! I did not imagine that my question could cause so much trouble...

  • Just to let you know: binding Math.round to a variable rounded and calling rounded(...) rather than Math.round speeds my code up by approx. 100/1500 = approx. 7%! Additionally binding Math.max to max gave me a total of roughly 10% speedup(!)

  • Well, the code looks simple. Anyway, I just tried the simple example from the Compilation page and there are tons of "Error\n at setType" errors too, so these may be harmless. So the issue is probably the final "TypeError: Cannot read property 'type' of null".

    It looks like that one is caused by one of the for loops not having a first section: for (; i < 16; i++). After adding i=i I got "NewExpression not implemented" (that was my let RGB=new Uint8ClampedArray(3)), and now I have "Error: SequenceExpression is not implemented yet". Maybe that is the "xx,yy" in your for loops.

    EDIT: after removing the "," from let and for, it builds.
    EDIT2: let is OK with ",", it is only an issue in for.
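
    The loop rewrites described above can be sketched in plain JavaScript as follows. The function names are hypothetical and the "compiled" directive is left out on purpose so that the sketch runs anywhere; the claim that the rewritten forms are accepted by the compiler rests only on the observations in this thread.

```javascript
// Form A: comma (sequence) expressions in the loop header, which the
// compiler reportedly rejected with "SequenceExpression is not implemented yet".
function sumRowOriginal (map, rowStart) {
  let sum = 0, j, k;
  for (j = 0, k = rowStart; j < 16; j++, k++) {
    sum += map[k];
  }
  return sum;
}

// Form B: one initializer and one update; the second index moves in the body.
function sumRowRewritten (map, rowStart) {
  let sum = 0;
  let k = rowStart;
  for (let j = 0; j < 16; j++) {
    sum += map[k];
    k++;
  }
  return sum;
}

// An empty first section, "for (; i < 16; i++)", also failed in the thread;
// the no-op initializer "i = i" was the workaround used there.
function countFrom (start) {
  let i = start, count = 0;
  for (i = i; i < 16; i++) { count++; }
  return count;
}
```

    Both row functions visit the same 16 cells, so the rewrite changes only the syntax, not the result.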

  • after fixing Temperature I get numbers like

    >
    714.75219726562
    716.52221679687
    715.4541015625
    716.36962890625
    717.98706054687
    

    So it is something like a 2x speedup? Not that much.

  • Just to let you know:

    • thank you very much for your effort
    • I'm currently trying to update my source code accordingly, but the BT connection no longer works reliably: I can't even transfer the source any longer (and my original Espruino does not have enough memory...and my ESP32 does not support compilation...)

    I'll post my results here as soon as I have some....

  • well, that speedup would already be sufficient for me - I could live with that refresh rate

  • Which device did you use for testing? Both my MDBT42Q and my original Espruino run out of memory with compiled code (and minification does not help)...and my Espruino Pico hangs after the code has been transmitted - Espruino does not like me today (perhaps I should not have mentioned the RasPi Pico?)

  • Oh, you're right, I used an nRF52840. The final flat string binary with the compiled code is 7860 bytes long, and the atob("xxx") string is 11 kB, so it may momentarily need 20 kB if loaded into RAM. When uploaded to flash it could be better.

    It is true that compiled JavaScript is quite bloated. InlineC https://www.espruino.com/InlineC would be much smaller (and faster).

  • Breaking news (although I cannot believe it):

    Uploading to flash on an original Espruino gives approx. 195 ms per frame (no typo!) for the compiled code.

    Let me see if this can be true.

  • Nice! If you put the Math.* methods into local variables it could be faster; resolving those repeatedly may be one of the slower bits there. I did not do that.

  • Damn...it's not my day...

    It seemed to work once, but now it no longer does.

    I'm currently working with the original Espruino:

    • after uploading the code into flash, the device hangs
    • trying to reconnect does not report the board type, regardless of reset or power cycling
    • after reflashing the firmware (which works), the connection over Web Serial works as intended - until I upload my source into flash again

    Something seems to be terribly wrong right now...

    Amendment: my code works fine - albeit slowly - even after uploading into flash as long as I do not compile. As soon as "compiled" is added, the Espruino hangs, no longer connects properly via Web Serial, and I have to reflash the firmware in order to "unbrick" it.

    2nd amendment: Neopixel.write(B15,Display); seems to crash the compiled code - for whatever reason. If I comment it out, I can upload to flash and the code runs fine (but, of course, it is useless).

    3rd amendment: by "compiling" only the actual computations and moving everything else into an uncompiled function, I was able to avoid the hang-ups and the code still runs fast enough - but Neopixel.write(B15,Display); no longer has any effect (perhaps because Display wasn't updated properly by the compiled function).

  • Ok, I have to give up for today.

    My impression is that "compilation" of functions has some important (and perhaps undocumented) limitations I do not know yet.

    @Gordon should we limit compiled functions to computations only? Can they access local variables in their lexical context? Or should we pass any such variables as invocation parameters? Can compiled functions handle Uint8ClampedArray properly? (i.e., incl. clamping?)
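
    For reference, this is how a Uint8ClampedArray behaves in standard JavaScript; a small sketch that can be run on a device to compare interpreted and compiled behaviour. Whether Espruino matches the standard in every case (in particular for ties, and inside "compiled" functions) is an open question here, not something this sketch establishes.

```javascript
const cells = new Uint8ClampedArray(4);

cells[0] = 3 - 8;     // negative results are stored as 0
cells[1] = 250 + 8;   // results above 255 are stored as 255
cells[2] = 46 / 9;    // 5.11... is rounded to the nearest integer, 5
cells[3] = 2.5;       // ties go to the even neighbour, 2 (Math.round(2.5) is 3)

console.log(cells[0], cells[1], cells[2], cells[3]);
```

    In standard JavaScript the array therefore rounds on assignment by itself, so an explicit Math.round before storing changes the result only for exact ties.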

  • Something is going terribly wrong with "compiled" code:

    I just restructured my source into several functions which I can individually "compile" or not, and I no longer start the program automatically, which gives me the chance to simply "reset" the board after a hang-up. Additionally, this allows me to invoke individual functions after flashing (or reset).

    But now the board even hangs after calling an uncompiled function - without any compiled function having been called before!

    Just for the record: the program runs fine (albeit slowly) without compilation!

    Amendment: meanwhile, I found a combination of optimizations (e.g., Math.round -> rounded) and compilation of a single, small function that does not hang my Espruino and runs fast enough to live with. Taking into account how much time it took to find that (kind of) "solution", I have to stop here.

    While the performance gains from compilation are impressive, I still can't really recommend going that way, as it seems completely unclear how to get compiled code running - it may work or it may not, but a recipe cannot be given...

  • This is my final code (for an original Espruino, with a 16x16 Neopixel LED matrix wired to pin B15) - just in case you want to play with it:

      let Neopixel = require("neopixel");
    
      let rounded = Math.round;
      let clamped = E.clip;
      let max     = Math.max;
    
      let DisplaySize = 16*16;
      let Display     = new Uint8ClampedArray(DisplaySize*3);
    
      let PixelOffset = new Int16Array(256);
        for (let i = 0; i < 16; i++) {
          for (let j = 0; j < 16; j++) {
            let Index = i*16 + j;
            if (i % 2 === 0) {
              PixelOffset[Index] = Index*3;
            } else {
              PixelOffset[Index] = (i*16 + 15 - j)*3;
            }
          }
        }
    
      let normalized = new Uint8ClampedArray(16*16);
        for (let i = 0; i < 256; i++) {
          normalized[i] = (
            i === 0
            ? 0
            : rounded(1 + 256/4*(i/256)*(i/256))
          );
        }
    
      let TemperatureMap = new Uint8ClampedArray(DisplaySize);
        for (let i = 0; i < 256; i++) {
          TemperatureMap[i] = 0;
        }
    
      let ColorMap = new Array(16);
        for (let i = 0; i < 16; i++) {
          ColorMap[i] = E.HSBtoRGB(i/16, 1, 1, true);
        }
    
    /**** computeDiffusion ****/
    
      function computeDiffusion () {
        let i = 0, j = 0, k = 0;
        for (i = 15; i > 0; i--) {             // intensity diffusion for upper rows
          k = i*16;
          TemperatureMap[k] = rounded((
            6*TemperatureMap[k] + 2*TemperatureMap[k-16] + TemperatureMap[k-15]
          )/9);
    
          for (j = 1; j < 15; j++) {
            k++;
            TemperatureMap[k] = rounded((
              6*TemperatureMap[k] +
              TemperatureMap[k-17] + 2*TemperatureMap[k-16] + TemperatureMap[k-15]
            )/10);
          }
    
          k++;
          TemperatureMap[k] = rounded((
            6*TemperatureMap[k] + TemperatureMap[k-17] + 2*TemperatureMap[k-16]
          )/9);
        }
      }
    
    /**** computeHeating ****/
    
      let HeatX     = rounded(16*Math.random());         // where to insert new heat
      let HeatCount = rounded(12*Math.random());           // how long to heat there
    
      function computeHeating () {
        HeatCount -= 1;
        if (HeatCount < 0) {                    // heat around HeatX, cool elsewhere
          HeatX     = rounded(16*Math.random());
          HeatCount = rounded(12*Math.random());
        }
    
        let i, l = HeatX-4;
        for (i = 0/*, l = HeatX-4*/; i < l; i++) {
          TemperatureMap[i] = clamped(TemperatureMap[i] - 8, 0,255);
        }
          i = HeatX-4;
          i++; if (i >= 0) { TemperatureMap[i] = clamped(TemperatureMap[i] + 1, 0,255); }
          i++; if (i >= 0) { TemperatureMap[i] = clamped(TemperatureMap[i] + 3, 0,255); }
          i++; if (i >= 0) { TemperatureMap[i] = clamped(TemperatureMap[i] + 4, 0,255); }
          i++;               TemperatureMap[i] = clamped(TemperatureMap[i] + 5, 0,255);
          i++; if (i < 16) { TemperatureMap[i] = clamped(TemperatureMap[i] + 4, 0,255); }
          i++; if (i < 16) { TemperatureMap[i] = clamped(TemperatureMap[i] + 3, 0,255); }
          i++; if (i < 16) { TemperatureMap[i] = clamped(TemperatureMap[i] + 1, 0,255); }
          i++;
        for (i = i; i < 16; i++) {
          TemperatureMap[i] = clamped(TemperatureMap[i] - 8, 0,255);
        }
      }
    
    /**** prepare display ****/
    
      let normalizedRGB = new Uint8ClampedArray(3);
    
      function prepareDisplay () { "compiled";
        for (let i = 0; i < 16; i++) {                                   // row-wise
          let RowStart = i*16; let RowEnd = RowStart + 15;
          let k = RowStart;
          for (let j = 0/*, k = RowStart*/; j < 16; j++/*, k++*/) {   // column-wise
            let RGB         = ColorMap[j];
            let Temperature = TemperatureMap[k];
              if (Temperature < 16) {
                Temperature = 0;
              } else {
                Temperature = max(48,Temperature)/256;
              }
    
            normalizedRGB[0] = normalized[rounded(RGB[1]*Temperature)];
            normalizedRGB[1] = normalized[rounded(RGB[0]*Temperature)];
            normalizedRGB[2] = normalized[rounded(RGB[2]*Temperature)];
    
            Display.set(normalizedRGB, PixelOffset[k]);
    
            k++;
          }
        }
      }
    
    /**** handleNextFrame ****/
    
      function handleNextFrame () {
        computeDiffusion();
        computeHeating();
        prepareDisplay();
    
      /**** show display ****/
    
        Neopixel.write(B15,Display);
    
      /**** wait for next frame and proceed ****/
    
        setTimeout(handleNextFrame,10);
      }
    
    /**** start automatically ****/
    
      setTimeout(handleNextFrame,1000);       // give Espruino some time to start up
    
  • use fill(0) to init the TemperatureMap

    let TemperatureMap = new Uint8ClampedArray(DisplaySize).fill(0);
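
    In standard JavaScript a typed array created with a length is already zero-filled, so the explicit initialization loop can probably be dropped altogether. The sketch below checks this; that Espruino behaves the same way is an assumption that is easy to verify on the device with the same code.

```javascript
const DisplaySize    = 16*16;
const TemperatureMap = new Uint8ClampedArray(DisplaySize);

// check that every cell starts out as 0
let allZero = true;
for (let i = 0; i < TemperatureMap.length; i++) {
  if (TemperatureMap[i] !== 0) { allZero = false; }
}
console.log(allZero);

// fill() is still handy for resetting an array that is already in use
TemperatureMap[5] = 200;
TemperatureMap.fill(0);
```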

Posted by @Andreas_Rozek
