performance tips for computations on an ESP32

  • What about writing your own library for this?

    https://github.com/espruino/Espruino/blob/master/libs/README.md

  • use fill(0) to init the TemperatureMap

    let TemperatureMap = new Uint8ClampedArray(DisplaySize).fill(0);

    Indeed, I should do so next time - but it won't make any difference with respect to performance and stability.
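As a side note to the fill(0) suggestion: in standard JavaScript, a newly constructed typed array is already zero-filled, so fill(0), like the explicit initialization loop in the listing further down, leaves its contents unchanged. A minimal check, assuming Espruino behaves like standard JavaScript here (not verified on the board):

```javascript
// Typed arrays are zero-initialized on construction, so fill(0) is redundant.
let DisplaySize = 16 * 16;
let TemperatureMap = new Uint8ClampedArray(DisplaySize);

// Verify with a plain loop (avoids relying on TypedArray helper methods).
let allZero = true;
for (let i = 0; i < TemperatureMap.length; i++) {
  if (TemperatureMap[i] !== 0) { allZero = false; }
}
console.log(allZero); // true
```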

  • What about writing your own library for this?

    https://github.com/espruino/Espruino/blob/master/libs/README.md

    Well, I've chosen Espruino in order to avoid having to code in C/C++!

  • So check this way to compile the code

    http://forum.espruino.com/comments/14787576/

  • It is reasonable to expect such JavaScript to work with the compiler; that is precisely what it is there for. So it looks like there are some bugs.

    So only the prepareDisplay function marked "compiled" in your last listing works, and enabling compilation for any other function breaks it?

    As for speed, I would avoid accessing global variables so much: for anything inside a for loop, I would cache the global reference in a local variable. I am not sure, though; I will try some tests. The compiler is pretty simple and does basically no optimizations. Still, the code should also work as is.

    BTW, you cannot reuse the compiler's output between nRF52 and STM32, but I guess you did not do that (i.e., pull the saved code from the MDBT42Q's storage and upload it to the original Espruino).
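The caching advice above can be sketched in plain JavaScript: resolve each outer-scope reference (here the global TemperatureMap and Math.round) once into a local, then use the local inside the loop. coolRowGlobal and coolRowCached are hypothetical functions and the 0.9 cooling factor is arbitrary; only TemperatureMap comes from the thread. Both compute the same result; the speedup reported later in this thread suggests the cached form is faster on Espruino, but the gain has to be measured case by case.

```javascript
let TemperatureMap = new Uint8ClampedArray(16 * 16);

// Uncached: every iteration looks up TemperatureMap and Math in the outer scope.
function coolRowGlobal(row) {
  for (let j = 0; j < 16; j++) {
    TemperatureMap[row * 16 + j] = Math.round(TemperatureMap[row * 16 + j] * 0.9);
  }
}

// Cached: outer references are resolved once; the loop only touches locals.
function coolRowCached(row) {
  let tmap = TemperatureMap; // local alias, same underlying array
  let rounded = Math.round;  // local alias (Math.round does not need `this`)
  let base = row * 16;
  for (let j = 0; j < 16; j++) {
    tmap[base + j] = rounded(tmap[base + j] * 0.9);
  }
}
```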

  • So check this way to compile the code

    http://forum.espruino.com/comments/14787576/

    It is tricky to access JavaScript globals from inside InlineC and call methods on them.

  • I always send my code directly to the attached device - thus, I never touch (nor even see) the compiled code.

    And, indeed, trying to compile computeDiffusion or computeHeating breaks everything, sometimes even immediately after uploading (although my code does not actually start, apart from the initialization lines).

    This sounds as if something gets overwritten during upload.

  • In the meantime, I got computeHeating to compile as well, by not assigning directly to TemperatureMap but first to an auxiliary variable aux and then doing TemperatureMap[i] = aux.

    I tried the same trick with computeDiffusion but failed. And compiling computeHeating did not have any noticeable effect on performance - thus, I reverted my changes again.

    Amendment: I tried more than two dozen variants of computeDiffusion, even silly ones like byte lookups from a Uint8ClampedArray holding the values 0...255, but all of them failed.

    My current impression is that success is nothing but good luck and depends on side effects of compilation (such as memory areas being overwritten because of invalid length calculations or similar).

    Amendment #2: wow, indeed, success is pure luck! After all my experiments, I tried to upload my working example again, and it failed because I had also changed Math.random to random in the meantime. After reverting that change, the example started working again.

    Which means: simply adding let random = Math.random and replacing all (four) calls of Math.random by random broke a working program!

    Hopefully, this information may help Gordon et al. find the reason for this weird compiler bug!

    Technical details:

    • I'm using an original Espruino (Rev 1.3b) with an HC-05 attached and running Espruino 2v10
    • the board is connected by USB, and programming is done using the Web IDE configured without minification, without mangling and without pretokenization (because I once thought that could cause problems as well)
    • my Neopixel 16x16 matrix is attached to pin B15
    • I can send you both versions of my code - the working and the failing one
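The auxiliary-variable workaround described above for computeHeating can be sketched like this. In interpreted JavaScript both forms behave identically; the rewrite only changes the shape of the code the compiler has to translate. heatPixelDirect and heatPixelViaAux are hypothetical helpers operating on a 16-element stand-in array, and Math.min/Math.max replace E.clip so the sketch also runs outside Espruino.

```javascript
let TemperatureMap = new Uint8ClampedArray(16);

// Direct form: the clamped result is stored straight into the typed array.
function heatPixelDirect(i, amount) {
  TemperatureMap[i] = Math.min(Math.max(TemperatureMap[i] + amount, 0), 255);
}

// Workaround form: compute into a local `aux` first, then do a plain store.
function heatPixelViaAux(i, amount) {
  let aux = Math.min(Math.max(TemperatureMap[i] + amount, 0), 255);
  TemperatureMap[i] = aux;
}
```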
  • For me it does not crash; however, I have turned off the Neopixel output, as I don't have one. With your code I started with

    >handleNextFrame()
    412.4755859375 14.98413085937 385.1318359375
    

    and ended with

    >handleNextFrame()
    98.876953125 5.615234375 256.04248046875
    

    Caching globals in local variables gives a speedup, and the native code produced is also smaller.
    Here is the code. I don't know what values it computes, but it does not crash and can be timed repeatedly:

    //let Neopixel = require("neopixel");
      let rounded = Math.round;
      let clamped = E.clip;
      let max     = Math.max;
      let DisplaySize = 16*16;
      let Display     = new Uint8ClampedArray(DisplaySize*3);
      let PixelOffset = new Int16Array(256);
        for (let i = 0; i < 16; i++) {
          for (let j = 0; j < 16; j++) {
            let Index = i*16 + j;
            if (i % 2 === 0) {
              PixelOffset[Index] = Index*3;
            } else {
              PixelOffset[Index] = (i*16 + 15 - j)*3;
            }
          }
        }
      let normalized = new Uint8ClampedArray(16*16);
        for (let i = 0; i < 256; i++) {
          normalized[i] = (
            i === 0
            ? 0
            : rounded(1 + 256/4*(i/256)*(i/256))
          );
        }
      let TemperatureMap = new Uint8ClampedArray(DisplaySize);
        for (let i = 0; i < 256; i++) {
          TemperatureMap[i] = 0;
        }
      let ColorMap = new Array(16);
        for (let i = 0; i < 16; i++) {
          ColorMap[i] = E.HSBtoRGB(i/16, 1, 1, true);
        }
    /**** computeDiffusion ****/
      function computeDiffusion () { "compiled";
        let rounded = Math.round;
        let tmap=TemperatureMap;
        let i = 0, j = 0, k = 0;
        for (i = 15; i > 0; i--) {             // intensity diffusion for upper rows
          k = i*16;
          tmap[k] = rounded((
            6*tmap[k] + 2*tmap[k-16] + tmap[k-15]
          )/9);
          for (j = 1; j < 15; j++) {
            k++;
            tmap[k] = rounded((
              6*tmap[k] + tmap[k-17] + 2*tmap[k-16] + tmap[k-15]
            )/10);
          }
          k++;
          tmap[k] = rounded((
            6*tmap[k] + tmap[k-17] + 2*tmap[k-16]
          )/9);
        }
      }
    /**** computeHeating ****/
      function computeHeating () { "compiled";
        let random=Math.random;
        let clamped = E.clip;
        let rounded = Math.round;
        let HeatX     = rounded(16*random());         // where to insert new heat
        let HeatCount = rounded(12*random());           // how long to heat there
        let tmap=TemperatureMap;
          HeatCount -= 1;
        if (HeatCount < 0) {                    // heat around HeatX, cool elsewhere
          HeatX     = rounded(16*random());
          HeatCount = rounded(12*random());
        }
        let i, l = HeatX-4;
        for (i = 0/*, l = HeatX-4*/; i < l; i++) {
          tmap[i] = clamped(tmap[i] - 8, 0,255);
        }
          i = HeatX-4;
          i++; if (i >= 0) { tmap[i] = clamped(tmap[i] + 1, 0,255); }
          i++; if (i >= 0) { tmap[i] = clamped(tmap[i] + 3, 0,255); }
          i++; if (i >= 0) { tmap[i] = clamped(tmap[i] + 4, 0,255); }
          i++;               tmap[i] = clamped(tmap[i] + 5, 0,255);
          i++; if (i < 16) { tmap[i] = clamped(tmap[i] + 4, 0,255); }
          i++; if (i < 16) { tmap[i] = clamped(tmap[i] + 3, 0,255); }
          i++; if (i < 16) { tmap[i] = clamped(tmap[i] + 1, 0,255); }
          i++;
        for (i = i; i < 16; i++) {
          tmap[i] = clamped(tmap[i] - 8, 0,255);
        }
      }
    /**** prepare display ****/
      let normalizedRGB = new Uint8ClampedArray(3);
      function prepareDisplay () { "compiled";
        let nRGB=normalizedRGB;
        let n=normalized;
        let tmap=TemperatureMap;
        let colormap=ColorMap;
        let rounded = Math.round;
        let max     = Math.max;
        for (let i = 0; i < 16; i++) {                                   // row-wise
          let RowStart = i*16; let RowEnd = RowStart + 15;
          let k = RowStart;
          for (let j = 0/*, k = RowStart*/; j < 16; j++/*, k++*/) {   // column-wise
            let RGB         = colormap[j];
            let Temperature = tmap[k];
              if (Temperature < 16) {
                Temperature = 0;
              } else {
                Temperature = max(48,Temperature)/256;
              }
            nRGB[0] = n[rounded(RGB[1]*Temperature)];
            nRGB[1] = n[rounded(RGB[0]*Temperature)];
            nRGB[2] = n[rounded(RGB[2]*Temperature)];
            Display.set(nRGB, PixelOffset[k]);
            k++;
          }
        }
      }
    /**** handleNextFrame ****/
      function handleNextFrame () {
     let d1,d2,d3,d = Date();
     computeDiffusion();d1=Date().ms-d.ms;d=Date();
      computeHeating();d2=Date().ms-d.ms;d=Date();
      prepareDisplay();d3=Date().ms-d.ms;d=Date();
      print(d1,d2,d3);
      /**** show display ****/
        //Neopixel.write(B15,Display);
      /**** wait for next frame and proceed ****/
    //    setTimeout(handleNextFrame,10);
      }
    /**** start automatically ****/
    //  setTimeout(handleNextFrame,1000);   
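The timing in handleNextFrame relies on Date() returning an object with an ms property, which works on Espruino (as the logs show) but not in standard JavaScript, where Date() returns a string. A more portable sketch uses Date.now() and averages over several runs to smooth out jitter like that visible in the logs. timeIt is a hypothetical helper and the workload is a stand-in; on Espruino, getTime() (seconds, as a float) could serve the same purpose.

```javascript
// Runs fn `runs` times and returns the average duration in milliseconds.
function timeIt(fn, runs) {
  let start = Date.now();
  for (let r = 0; r < runs; r++) {
    fn();
  }
  return (Date.now() - start) / runs;
}

// Usage with a stand-in workload (not one of the thread's functions):
let data = new Uint8ClampedArray(256);
let avg = timeIt(function () {
  for (let i = 0; i < data.length; i++) { data[i] = data[i] + 1; }
}, 10);
console.log(avg.toFixed(3) + " ms per run");
```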
    
  • Which device are you using? Your code still crashes for me, even without the Neopixel-specific calls (which you may, and perhaps should, leave enabled, as they do not expect any feedback but effectively just send pulses to pin B15).

    Amendment: moving the outer variables into the three main functions (as you suggested) already hangs my code as well.

    I'd recommend diffing the code generated for the working and the non-working versions of my source for the original Espruino and looking at the changes, as something really weird must be going on...

  • It is an nRF52840 dongle. I enabled Neopixel in the build and in the source, and it still works:

    let run=true;
      function handleNextFrame () {
     let d1,d2,d3,d = Date();
     computeDiffusion();d1=Date().ms-d.ms;d=Date();
      computeHeating();d2=Date().ms-d.ms;d=Date();
      prepareDisplay();d3=Date().ms-d.ms;d=Date();
      print(d1,d2,d3);
      /**** show display ****/
        Neopixel.write(D11,Display);
      /**** wait for next frame and proceed ****/
        if (run) setTimeout(handleNextFrame,10);
      }
    /**** start automatically ****/
      setTimeout(handleNextFrame,1000);   
    

    Output:

    |  __|___ ___ ___ _ _|_|___ ___
    |  __|_ -| . |  _| | | |   | . |
    |____|___|  _|_| |___|_|_|_|___|
             |_| espruino.com
     2v10.215 (c) 2021 G.Williams
    Espruino is Open Source. Our work is supported
    only by sales of official boards and donations:
    http://espruino.com/Donate
    >
    98.876953125 4.97436523437 246.00219726562
    99.18212890625 4.69970703125 245.45288085937
    99.21264648437 4.97436523437 245.94116210937
    99.39575195312 5.09643554687 246.12426757812
    99.27368164062 5.18798828125 246.03271484375
    99.42626953125 5.126953125 245.7275390625
    99.67041015625 4.66918945312 246.00219726562
    99.57885742187 4.73022460937 245.69702148437
    99.51782226562 4.85229492187 246.7041015625
    99.30419921875 4.82177734375 245.33081054687
    99.76196289062 4.66918945312 246.42944335937
    99.79248046875 4.66918945312 246.2158203125
    99.70092773437 4.94384765625 247.16186523437
    99.63989257812 4.54711914062 246.64306640625
    99.7314453125 4.97436523437 246.39892578125
    99.853515625 4.7607421875 246.09375
    99.609375 4.85229492187 245.57495117187
    99.82299804687 4.91333007812 246.06323242187
    100.03662109375 5.06591796875 245.45288085937
    100.341796875 4.66918945312 246.61254882812
    100.00610351562 4.73022460937 246.18530273437
    100.03662109375 4.79125976562 246.36840820312
    100.12817382812 4.60815429687 245.33081054687
    100.31127929687 4.8828125 245.7275390625
    100.2197265625 5.06591796875 245.94116210937
    100.28076171875 4.85229492187 246.79565429687
    100.28076171875 4.91333007812 246.15478515625
    100.37231445312 4.85229492187 246.18530273437
    100.4638671875 5.09643554687 246.00219726562
    100.61645507812 5.03540039062 245.849609375
    100.55541992187 5.15747070312 245.60546875
    100.79956054687 4.91333007812 246.06323242187
    100.64697265625 5.126953125 245.78857421875
    100.61645507812 4.638671875 245.26977539062
    100.64697265625 4.66918945312 245.63598632812
    100.52490234375 5.09643554687 246.49047851562
    101.07421875 4.97436523437 247.22290039062
    100.4638671875 4.638671875 245.94116210937
    100.76904296875 5.03540039062 246.97875976562
    100.64697265625 5.27954101562 246.24633789062
    >run=false
    =false
    100.64697265625 5.06591796875 245.39184570312
    > 
    
  • ...but the original Espruino is an STM32F103RCT6, so the compiler bug may be controller-specific.

  • I tried that too: I changed the top of https://github.com/gfwilliams/EspruinoCompiler/blob/master/src/utils.js to compile for Cortex-M3 on my board, and the result looks the same. I can try on an nRF52832 with 64K RAM, but I don't have an original Espruino board (with 48K RAM?) or any other Cortex-M3 board with Espruino, and I also don't have Neopixels.

  • I just wanted to try it on my MDBT42Q, but I'm afraid I burned it in the hurry... thus, I cannot confirm your observations right now.

  • BTW, with the globals cached like that, even the non-compiled version is about 2x faster than before. I only changed the "compiled" strings and got:

    384.521484375 16.87622070312 866.69921875
    385.55908203125 15.80810546875 869.32373046875
    385.34545898437 15.31982421875 868.83544921875
    385.498046875 15.869140625 868.62182617187
    384.42993164062 16.14379882812 868.37768554687
    386.56616210937 15.07568359375 870.02563476562
    385.31494140625 15.28930664062 869.53735351562
    
  • This is indeed an interesting outcome (although I had to learn that variables declared outside of functions are called "globals" here).

    Thank you very much for your effort!

  • variables declared outside of functions are called "globals" here

    Sorry, that's probably imprecise. I mean the outer scope, i.e. outside of the native method, which in this case is the global scope:

    >global.computeDiffusion
    =function () { ... }
    >global.Display
    =new Uint8ClampedArray(768)
    > 
    

    https://www.espruino.com/Compilation#performance-notes
    If you access global variables ...

    EDIT: there is also a generic performance page that covers it too:
    https://www.espruino.com/Performance#some-variable-lookups-are-faster-than-others

  • don't worry - perhaps, I've just been too narrow-minded here...

  • Hi, sorry for the slow replies here. I've had my head down trying to get all the new Bangles shipped, and I likely won't have much time during the next week or two until they're all out.

    When things calm down I'll try and look into the compiler bugs though. It is definitely only a subset of JS that is compiled, but it should be pretty easy to extend it to sort out some of the issues you are seeing.

  • Hello Gordon,

    I assumed something like that and did not expect you to answer too soon. Just take the time you need to ship all the Bangles (I'm waiting for two of them myself), then take a breath or two, so that your family does not forget what you look like, and then, perhaps, look at the bugs we found.

    With greetings from Germany,

    Andreas Rozek

  • I just looked at the code, and I think there is still room for improvement: all the clamping and rounding of TemperatureMap values can be omitted:

    >let foo = new Uint8ClampedArray(3);
    =new Uint8ClampedArray(3)
    >foo[0] = -0.5; foo[1] = 1.5; foo[2] = 255.5;
    =undefined
    >foo
    =new Uint8ClampedArray([0, 1, 255])
    

    One day I'll test this code; I think the effect will be nice.
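The clamping part of the suggestion above can be sketched like this: with a Uint8ClampedArray, out-of-range writes are clamped to 0...255 by the array itself, so E.clip becomes unnecessary. coolAll and heatAt are hypothetical helpers modeled on computeHeating. Rounding is a different matter: standard JavaScript rounds fractional values to the nearest integer (ties to even) when storing, whereas the Espruino session above stored 1.5 as 1, so dropping Math.round is not portable and may change the displayed result.

```javascript
let TemperatureMap = new Uint8ClampedArray(16);

// Cool every pixel of the bottom row by 8 without an explicit clamp:
// negative results are stored as 0 by the typed array itself.
function coolAll(tmap) {
  for (let i = 0; i < 16; i++) {
    tmap[i] = tmap[i] - 8;
  }
}

// Heat one pixel without an explicit clamp: results above 255 are stored as 255.
function heatAt(tmap, i, amount) {
  tmap[i] = tmap[i] + amount;
}

TemperatureMap[0] = 5;
TemperatureMap[1] = 250;
coolAll(TemperatureMap);        // [0]: 5 - 8 = -3 -> 0; [1]: 242
heatAt(TemperatureMap, 1, 20);  // 242 + 20 = 262 -> 255
console.log(TemperatureMap[0], TemperatureMap[1]); // 0 255
```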

  • Well,

    that's indeed an interesting finding, but I doubt that it would have a significant effect on performance:

    • computeDiffusion does a lot of computations and only rounds at the end; replacing that by + 0.5 and relying on internal truncation and clamping would probably not be faster, just less intuitive. However, rounding (of small values) is important for the final display and may therefore not simply be omitted
    • computeHeating does not have any noticeable effect on the overall speed (it deals with 16 pixels only, not 256)
    • prepareDisplay only rounds array indices, and this cannot be omitted, as you can easily test yourself:

      var Test = [0,1];
      print(Test[0.5]);
      

    outputs undefined rather than 0 or 1.

    Anyway, thanks for your effort!
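To illustrate why the indices in prepareDisplay must be integers: a fractional index such as 127.6 is treated as the property name "127.6", which does not exist in the array. For the moderate non-negative values used here, (x + 0.5) | 0 rounds like Math.round (apart from floating-point corner cases and values of 2^31 or more); whether it is actually faster on Espruino has not been measured. The table contents are a stand-in, and it is a plain array like the Test example above.

```javascript
// Stand-in lookup table (plain array, as in the Test example).
let table = [];
for (let i = 0; i < 256; i++) { table[i] = 255 - i; }

let x = 127.6;                          // e.g. a product like RGB[0] * Temperature
let viaFraction = table[x];             // undefined: "127.6" is not an array index
let viaRound    = table[Math.round(x)]; // table[128], which is 127
let viaBitOr    = table[(x + 0.5) | 0]; // 128.1 | 0 is 128: same entry
console.log(viaFraction, viaRound, viaBitOr); // undefined 127 127
```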

  • Well, optimizing for speed is often bad for readability. I wouldn't add "+0.5" to the code just to save the time needed for this extra operation.
    And yes, numbers in square brackets are not rounded to an integer. And we all love code like this:

    >foo={"0.5": "bar", "1": "foo"};
    >foo[0.5];
    ="bar"
    >foo[Math.round(1.5)];
    ="foo"
    

    As for prepareDisplay, one of the slowest functions: ColorMap has only 8 different values and Temperature only 208, so it might be feasible to precompute things and avoid all divisions and multiplications involving Temperature in this function, if there's enough RAM available on the ESP32. Anyway, that's much harder than just removing the rounding and clamping of TemperatureMap values. Removing 262 function calls will result in a measurable improvement, even for the compiled version.
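A rough sketch of the precomputation idea: build, once, a table mapping (column, temperature) to the three output bytes, so the per-frame loop only does lookups. Assumptions: colorMap holds [r, g, b] triples as in the thread (there produced by E.HSBtoRGB(..., true)), the threshold and scaling are copied from prepareDisplay (including its GRB output order), and the display is 16 columns wide. buildLut and prepareDisplayLut are hypothetical names. The table takes 16 × 256 × 3 = 12,288 bytes and does not exploit the smaller number of distinct values mentioned above, so it may not fit on boards with little RAM such as the original Espruino.

```javascript
// For each column j (0..15) and temperature t (0..255), store the three bytes
// prepareDisplay would write, in the same order (G, R, B) as in the thread.
function buildLut(colorMap, normalized) {
  let lut = new Uint8Array(16 * 256 * 3);
  for (let j = 0; j < 16; j++) {
    let RGB = colorMap[j];
    for (let t = 0; t < 256; t++) {
      // Same scaling as prepareDisplay: below 16 is black, else max(48, t)/256.
      let scale = (t < 16) ? 0 : Math.max(48, t) / 256;
      let o = (j * 256 + t) * 3;
      lut[o]     = normalized[Math.round(RGB[1] * scale)];
      lut[o + 1] = normalized[Math.round(RGB[0] * scale)];
      lut[o + 2] = normalized[Math.round(RGB[2] * scale)];
    }
  }
  return lut;
}

// Per-frame work reduces to table lookups and byte copies.
function prepareDisplayLut(tmap, lut, pixelOffset, display) {
  for (let k = 0; k < 256; k++) {
    let o = ((k & 15) * 256 + tmap[k]) * 3; // column j = k % 16
    let p = pixelOffset[k];
    display[p]     = lut[o];
    display[p + 1] = lut[o + 1];
    display[p + 2] = lut[o + 2];
  }
}

// Usage (hypothetical, with the thread's globals):
//   let lut = buildLut(ColorMap, normalized);                    // once
//   prepareDisplayLut(TemperatureMap, lut, PixelOffset, Display); // per frame
```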

Posted by @Andreas_Rozek