-
• #2
For MDBT42Q code, you may try to put "compiled" in the functions that are doing the math; see
https://www.espruino.com/Compilation
If it helps, then you know how much this would help the ESP32 too. This feature does not seem to be available for the ESP32 (?), but maybe it wouldn't be that hard to add.
BTW, there is no "integer arithmetic" in JavaScript: everything is done in double-precision floating point, in software. However, the "compiled" code may figure out that you are only using integers and do integer math.
As for the Raspberry Pi Pico, it has hand-optimized floating-point code in ROM:
https://www.quinapalus.com/qfplib-m0-full.html
https://github.com/raspberrypi/pico-bootrom/blob/master/bootrom/mufplib.S
Maybe that helps there too.
BTW, it would be interesting to try the qfplib library with the nRF51/52 builds too; it is GPL-licensed:
https://www.quinapalus.com/qfplib-m3.html
-
• #3
Indeed, I forgot about "compilation" when optimizing my code for the ESP32 (well, as you said, it's not supported there anyway).
While there is indeed no explicit integer arithmetic in JavaScript, there are still tricks to tell the engine that integer arithmetic is preferred (asm.js uses them, for example).
And perhaps there are other hidden (or less documented) "tricks" to speed things up, e.g., using single-precision instead of double-precision floating point.
The performance of the RasPi Pico is impressive nevertheless (particularly when compared with its price)!
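The asm.js-style hints mentioned above rely on x | 0, which converts a value to a 32-bit signed integer (truncating toward zero and wrapping on overflow). A minimal sketch in plain JavaScript with a hypothetical addInt helper; it only demonstrates the semantics of the hint, and whether Espruino's interpreter or its "compiled" mode benefits from it is not established here.

```javascript
// asm.js-style integer hints: "x | 0" coerces a number to a 32-bit signed integer.
// addInt is a hypothetical helper; it only illustrates what the hint means.
function addInt(a, b) {
  a = a | 0;            // coerce: a is now an int32 (fraction truncated toward zero)
  b = b | 0;            // coerce: b is now an int32
  return (a + b) | 0;   // result truncated back to int32 (wraps on overflow)
}

console.log(addInt(1.9, 2.9));       // 3  (1 + 2)
console.log(addInt(-1.9, 0));        // -1 (truncation toward zero)
console.log(addInt(2147483647, 1));  // -2147483648 (int32 wrap-around)
```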
-
• #4
Indeed, I forgot about "compilation" when optimizing my code for the ESP32 (well, as you said, it's not supported there anyway).
It would still be interesting to test on the MDBT42Q how much it helps.
While there is indeed no explicit integer arithmetic in JavaScript, there are still tricks to tell the engine that integer arithmetic is preferred (asm.js uses them, for example).
This may work in the "compiled" case, but not in the normal JS interpreter (?).
And perhaps there are other hidden (or less documented) "tricks" to speed things up, e.g., using single-precision instead of double-precision floating point.
This would need a different build with -DUSE_FLOATS:
https://github.com/espruino/Espruino/blob/master/src/jsutils.h#L280
and then everything is single precision (but the main slowdown is probably the interpreter).
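To see what single precision would mean numerically, standard JavaScript offers Math.fround, which rounds a number to the nearest float32 value. This sketch (the helper name is hypothetical) only emulates float32 rounding in a desktop engine; it is not what a -DUSE_FLOATS build does internally and says nothing about speed. It does show that the small integers handled in this project (0 to 255, and weighted sums up to 2550) stay exact in single precision, since float32 represents every integer up to 2^24 exactly.

```javascript
// Emulate single-precision (float32) rounding with Math.fround.
// Integers up to 2^24 are exact in float32; beyond that, gaps appear.
function isExactInFloat32(x) {
  return Math.fround(x) === x;
}

console.log(isExactInFloat32(255));               // true: typical pixel value
console.log(isExactInFloat32(6 * 255 + 4 * 255)); // true: largest weighted sum of the diffusion step
console.log(isExactInFloat32(16777217));          // false: 2^24 + 1 rounds to 2^24
console.log(isExactInFloat32(0.1));               // false: 0.1 differs between float32 and double
```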
• #5
I briefly looked through the build instructions for Espruino under macOS, and it looked like too much effort for the limited time I have. Thus, I'll have to stick with double-precision floats.
Using "compiled" failed with the message
WARNING: Function marked with "compiled" uploaded in source form [ COMPILER MESSAGE ] : TypeError: Cannot read property 'type' of null
which I do not understand right now.
-
• #6
By the way: I just added some timing measurements to my code and found out that the MDBT42Q is almost twice as fast as the ESP32!
However, a speedup by a factor of 3 would still be desirable...
Addendum: I also noticed a continuous slowdown while the program was running (regardless of the platform).
-
• #7
TypeError: Cannot read property 'type' of null
Hard to tell without seeing the code; see "Caveats" and "Doesn't Work" at the bottom of the Compilation page.
It is trying to determine the type of something which is null, so perhaps some variable with an undefined value?
-
• #8
I looked into the mentioned sections but did not find anything applicable.
Just to stop being abstract, here is my code for the MDBT42Q:
let Neopixel = require("neopixel");

let DisplaySize = 16*16;
let Display = new Uint8ClampedArray(DisplaySize*3);

let PixelOffset = new Int16Array(256);
for (let i = 0; i < 16; i++) {
  for (let j = 0; j < 16; j++) {
    let Index = i*16 + j;
    if (i % 2 === 0) {
      PixelOffset[Index] = Index*3;
    } else {
      PixelOffset[Index] = (i*16 + 15 - j)*3;
    }
  }
}

let normalized = new Uint8ClampedArray(16*16);
for (let i = 0; i < 256; i++) {
  normalized[i] = ( i === 0 ? 0 : Math.round(1 + 256/4*(i/256)*(i/256)) );
}

let TemperatureMap = new Uint8ClampedArray(DisplaySize);
for (let i = 0; i < 256; i++) {
  TemperatureMap[i] = 0;
}

let ColorMap = new Array(16);
for (let i = 0; i < 16; i++) {
  ColorMap[i] = E.HSBtoRGB(i/16, 1, 1, true);
}

/**** processNextFrame ****/

let HeatX = Math.round(16*Math.random());      // where to insert new intensity
let HeatCount = Math.round(12*Math.random());  // how long to heat there

let normalizedRGB = new Uint8ClampedArray(3);  // used by "processNextFrame"

function processNextFrame () {
  "compiled";
  let i,j,k,l;

  for (i = 15; i > 0; i--) {        // intensity diffusion for upper rows
    k = i*16;
    TemperatureMap[k] = Math.round((
      6*TemperatureMap[k] + 2*TemperatureMap[k-16] + TemperatureMap[k-15]
    )/9);

    for (j = 1; j < 15; j++) {
      k++;
      TemperatureMap[k] = Math.round((
        6*TemperatureMap[k] + TemperatureMap[k-17] + 2*TemperatureMap[k-16] + TemperatureMap[k-15]
      )/10);
    }

    k++;
    TemperatureMap[k] = Math.round((
      6*TemperatureMap[k] + TemperatureMap[k-17] + 2*TemperatureMap[k-16]
    )/9);
  }

  /**** heat around HeatX, cool elsewhere ****/

  HeatCount -= 1;
  if (HeatCount < 0) {
    HeatX = Math.round(16*Math.random());
    HeatCount = Math.round(12*Math.random());
  }

  l = HeatX-4;
  for (i = 0 /*, l = HeatX-4*/; i < l; i++) {
    TemperatureMap[i] = TemperatureMap[i] - 8;   // exploits clamping
  }

  i = HeatX-4;
  i++; if (i >= 0) { TemperatureMap[i] = TemperatureMap[i] + 2; }  // ditto
  i++; if (i >= 0) { TemperatureMap[i] = TemperatureMap[i] + 5; }  // ditto
  i++; if (i >= 0) { TemperatureMap[i] = TemperatureMap[i] + 7; }  // ditto
  i++;               TemperatureMap[i] = TemperatureMap[i] + 8;    // ditto
  i++; if (i < 16) { TemperatureMap[i] = TemperatureMap[i] + 7; }  // ditto
  i++; if (i < 16) { TemperatureMap[i] = TemperatureMap[i] + 5; }  // ditto
  i++; if (i < 16) { TemperatureMap[i] = TemperatureMap[i] + 2; }  // ditto
  i++;
  for (; i < 16; i++) {
    TemperatureMap[i] = TemperatureMap[i] - 8;   // exploits clamping
  }

  /**** prepare display ****/

  let RGB /*, normalizedRGB = new Uint8ClampedArray(3)*/;
  let RowStart, RowEnd;
  for (i = 0; i < 16; i++) {                         // row-wise
    RowStart = i*16; RowEnd = RowStart + 15;
    for (j = 0, k = RowStart; j < 16; j++, k++) {    // column-wise
      RGB = ColorMap[j];

      Temperature = TemperatureMap[k];
      if (Temperature < 16) {
        Temperature = 0;
      } else {
        Temperature = Math.max(48,Temperature)/256;
      }

      normalizedRGB[0] = normalized[Math.round(RGB[1]*Temperature)];
      normalizedRGB[1] = normalized[Math.round(RGB[0]*Temperature)];
      normalizedRGB[2] = normalized[Math.round(RGB[2]*Temperature)];

      Display.set(normalizedRGB, PixelOffset[k]);
    }
  }

  /**** show display ****/

  Neopixel.write(D22,Display);

  /**** wait for next frame and proceed ****/

  let now = Date.now();
  if (Timestamp > 0) { print(now-Timestamp); }
  Timestamp = now;

  setTimeout(processNextFrame,0);
}

let Timestamp = 0;
setTimeout(processNextFrame,100);    // give Espruino some time to start up
It runs even without an attached Neopixel matrix display, but having one is far more impressive...
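The PixelOffset table in this code encodes a serpentine ("zigzag") wiring: even rows run left to right, odd rows right to left, and each LED occupies 3 bytes in Display. The hypothetical helper below recomputes one entry directly; it is a sketch that assumes the same 16x16 layout and can be handy for checking the table.

```javascript
// Byte offset of pixel (row, col) in a 16x16 serpentine-wired strip, 3 bytes per LED.
// Mirrors the PixelOffset table: even rows left-to-right, odd rows right-to-left.
const WIDTH = 16;

function pixelOffset(row, col) {
  const ledIndex = (row % 2 === 0)
    ? row * WIDTH + col                  // even row: same direction as the strip
    : row * WIDTH + (WIDTH - 1 - col);   // odd row: strip runs backwards
  return ledIndex * 3;                   // 3 bytes (one per color channel) per LED
}

// Build the full lookup table the same way the original code does.
const table = new Int16Array(WIDTH * WIDTH);
for (let row = 0; row < WIDTH; row++) {
  for (let col = 0; col < WIDTH; col++) {
    table[row * WIDTH + col] = pixelOffset(row, col);
  }
}

console.log(pixelOffset(0, 0));   // 0
console.log(pixelOffset(1, 0));   // 93: column 0 of row 1 is the last LED of that row
console.log(pixelOffset(1, 15));  // 48
```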
-
• #9
I don't see anything wrong in the processNextFrame method either.
As for
let i,j,k,l;
maybe you can try to assign values when they are declared, so the compiler can better guess their types.
Also, assigning Math.round and other frequently called methods to local variables may speed it up.
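A sketch of binding a frequently used Math method to a local variable. An interpreter has to resolve Math and then its round property on every call; a local binding does that lookup once. The two hypothetical functions below compute the same smoothing step and differ only in that lookup. Any speed difference has to be measured on the target device; a JIT-compiling desktop engine will probably show none.

```javascript
// Same computation, with and without a locally cached Math.round.
function averageGlobal(src, dst) {
  for (let k = 1; k < src.length - 1; k++) {
    // resolves Math, then Math.round, on every iteration
    dst[k] = Math.round((src[k - 1] + 2 * src[k] + src[k + 1]) / 4);
  }
}

function averageLocal(src, dst) {
  const round = Math.round;          // resolve Math.round once
  for (let k = 1; k < src.length - 1; k++) {
    dst[k] = round((src[k - 1] + 2 * src[k] + src[k + 1]) / 4);
  }
}

const src = new Uint8ClampedArray(256);
for (let k = 0; k < src.length; k++) src[k] = (k * 37) % 256;  // arbitrary test pattern
const a = new Uint8ClampedArray(256);
const b = new Uint8ClampedArray(256);
averageGlobal(src, a);
averageLocal(src, b);
console.log(a.every((v, k) => v === b[k]));   // true: identical results
```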
-
• #10
I've put the compiler output here: https://gist.github.com/fanoush/8d7f38ab2cab05676f5aad677ec89203
@Gordon, maybe you can look into it? There are tons of suspicious
"Error\n at setType (/home/fanoush/EspruinoCompiler/src/infer.js:11:60)\n
messages.
A slightly edited source is here: https://gist.github.com/fanoush/6fe1e02951113293a6959a7e31cfdad9
-
• #11
Thank you very much for your effort! I did not imagine that my question could cause so much trouble...
-
• #12
Just to let you know: binding Math.round to a variable rounded and calling rounded(...) rather than Math.round speeds my code up by approx. 100/1500, i.e. approx. 7%! Additionally binding Math.max to max gave me a total speedup of roughly 10%(!)
-
• #13
Well, the code looks simple. Anyway, I just tried the simple example from the Compilation page, and there are tons of
Error\n at setType
errors too, so these may be harmless. So the issue is probably the one at the end: TypeError: Cannot read property 'type' of null
It looks like that one is caused by one of the for loops not having an initialization section:
for (; i < 16; i++)
After adding i=i, I got "NewExpression not implemented"; that was my let RGB=new Uint8ClampedArray(3).
And now I have "Error: SequenceExpression is not implemented yet." Maybe that is the "xx,yy" in your for loops.
EDIT: after removing the "," from let and for, it builds.
EDIT2: let is OK with ","; it is just in for that it is an issue.
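Based on the errors reported here, a sketch of compiler-friendly rewrites of the two problematic loop headers: keep one loop variable in the for header and advance the second one in the body (no comma, i.e. no SequenceExpression), and give an empty initializer something to do (i = i, as in the thread). The function names are hypothetical; both variants of each pair behave identically in plain JavaScript, and other constructs may still trip up the compiler.

```javascript
// 1) Comma in the for header (reported: "SequenceExpression is not implemented yet").
function rowSumWithComma(map, row) {
  let sum = 0, j, k;
  for (j = 0, k = row * 16; j < 16; j++, k++) {
    sum += map[k];
  }
  return sum;
}

// Rewrite: one loop variable in the header, the second one advanced in the body.
function rowSumRewritten(map, row) {
  let sum = 0;
  let k = row * 16;
  for (let j = 0; j < 16; j++) {
    sum += map[k];
    k++;
  }
  return sum;
}

// 2) Empty initializer (reported as the cause of "Cannot read property 'type' of null").
function clearFromWithEmptyInit(map, i) {
  for (; i < 16; i++) map[i] = 0;
}

// Rewrite: a non-empty (but no-op) initializer, as used in the thread.
function clearFromRewritten(map, i) {
  for (i = i; i < 16; i++) map[i] = 0;
}

const map = new Uint8ClampedArray(256);
for (let n = 0; n < 256; n++) map[n] = n;
console.log(rowSumWithComma(map, 2) === rowSumRewritten(map, 2));  // true
```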
• #14
After fixing Temperature, I get numbers like
> 714.75219726562 716.52221679687 715.4541015625 716.36962890625 717.98706054687
So it is like a 2x speedup? Not that much.
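The frame times above come from the Date.now() bookkeeping at the end of processNextFrame. A sketch of a hypothetical, reusable helper that does the same and also keeps an exponential moving average, which makes a gradual slowdown easier to spot. Timestamps are passed in explicitly so it can be exercised without hardware; in the frame loop one would call it once per frame with Date.now().

```javascript
// Hypothetical frame timer: measures the time between successive frames and
// keeps an exponential moving average (alpha in (0, 1] controls how fast it adapts).
function makeFrameTimer(alpha) {
  let last = null;      // timestamp of the previous frame (null = no frame yet)
  let average = null;   // smoothed frame time in ms (null = no delta yet)
  return function (now) {
    if (last !== null) {
      const delta = now - last;
      average = (average === null) ? delta : average + alpha * (delta - average);
    }
    last = now;
    return average;
  };
}

const tick = makeFrameTimer(0.5);
tick(1000);               // first frame: no delta yet, returns null
tick(1700);               // delta 700 -> average 700
console.log(tick(2500));  // delta 800 -> 700 + 0.5 * (800 - 700) = 750
```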
-
• #15
Just to let you know:
- Thank you very much for your effort!
- I'm currently trying to update my source code accordingly, but the BT connection no longer works reliably: I can't even transfer the source any more (and my original Espruino does not have enough memory... and my ESP32 does not support compilation...).
I'll post my results here as soon as I have some...
-
• #16
Well, that speedup would already be sufficient for me; I could live with that refresh rate.
-
• #17
Which device did you use for testing? Both my MDBT42Q and my original Espruino run out of memory with compiled code (and minification does not help)... and my Espruino Pico hangs after receiving the code. Espruino does not like me today (perhaps I should not have mentioned the RasPi Pico?)
-
• #18
Oh, you're right, I used an nRF52840. The final flat-string binary with the compiled code is 7860 bytes long, and the atob("xxx") string is 11 kB, so it may momentarily need 20 kB if loaded into RAM. When uploaded to flash, it could be better.
It is true that compiled JavaScript is quite bloated. Inline C (https://www.espruino.com/InlineC) would be much smaller (and faster).
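A quick check of these figures, as a sketch with hypothetical helper names: base64 encodes every 3 bytes as 4 characters, so 7860 bytes become Math.ceil(7860 / 3) * 4 = 10480 characters, about 10.5 kB and consistent with the roughly 11 kB quoted. Holding the string and the decoded binary in RAM at the same time therefore needs at least about 18 kB, before Espruino's own per-variable overhead, which fits the estimate of about 20 kB.

```javascript
// Length of the base64 encoding of n bytes (with padding): 4 characters per 3-byte group.
function base64Length(n) {
  return Math.ceil(n / 3) * 4;
}

// Lower bound on peak RAM while both the base64 string and the decoded binary exist
// (ignores any per-variable overhead of the interpreter).
function peakBytes(binaryBytes) {
  return binaryBytes + base64Length(binaryBytes);
}

console.log(base64Length(7860)); // 10480
console.log(peakBytes(7860));    // 18340
```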
-
• #19
Breaking news (although I can hardly believe it):
Uploaded to flash on an original Espruino, the compiled code takes approx. 195 ms per frame (no typo!).
Let me check whether this can be true.
-
• #20
Nice! If you put the Math. methods into local variables, it could be even faster; resolving them repeatedly may be one of the slower bits there, and I did not do that.
-
• #21
Damn... it's not my day...
It seemed to work once, but now it no longer does.
I'm currently working with the original Espruino:
- after uploading the code into flash, the device hangs
- trying to reconnect does not report the board type, regardless of reset or power cycling
- after reflashing the firmware (which works), the connection over Web Serial works as intended, until I upload my source into flash again
Something seems to be terribly wrong right now...
Amendment: my code works fine (albeit slowly), even after uploading into flash, as long as I do not compile it. As soon as "compiled" is added, the Espruino hangs, no longer connects properly via Web Serial, and I have to reflash the firmware in order to "unbrick" it.
2nd amendment:
Neopixel.write(B15,Display);
seems to crash the compiled code, for whatever reason. If I comment it out, I can upload to flash and the code runs fine (but, of course, it is useless).
3rd amendment: by "compiling" only the actual computations and moving everything else into an uncompiled function, I was able to avoid the hang-ups, and the code still runs fast enough. But
Neopixel.write(B15,Display);
has no effect any more (perhaps because Display wasn't updated properly by the compiled function).
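A sketch of the split described in the 3rd amendment: the computation receives every array it touches as a parameter and only writes into its output array, while an uncompiled driver owns the state and does the output. Passing arrays as parameters instead of reaching into outer variables is an assumption here, not something confirmed in the thread; the names are hypothetical and the per-pixel math is reduced to a single brightness level so the sketch runs outside Espruino.

```javascript
// Pure computation: every array it reads or writes is passed in as a parameter.
// On Espruino, this is the kind of function one would mark with the "compiled" directive.
function renderFrame(temperatureMap, pixelOffset, display) {
  for (let k = 0; k < temperatureMap.length; k++) {
    const t = temperatureMap[k];
    const level = (t < 16) ? 0 : t;   // suppress faint cells, as the thread's code does
    const offset = pixelOffset[k];
    display[offset] = level;          // three channels per LED
    display[offset + 1] = level;
    display[offset + 2] = level;
  }
}

// Uncompiled driver: owns the state and performs the output.
// writeOut is a hypothetical callback; on Espruino it would call Neopixel.write(pin, display).
function makeDriver(size, writeOut) {
  const temperatureMap = new Uint8ClampedArray(size);
  const pixelOffset = new Int16Array(size);
  for (let k = 0; k < size; k++) pixelOffset[k] = k * 3;  // simple linear wiring for the sketch
  const display = new Uint8ClampedArray(size * 3);
  return {
    temperatureMap: temperatureMap,
    step: function () {
      renderFrame(temperatureMap, pixelOffset, display);
      writeOut(display);
    }
  };
}

// Usage: capture frames instead of driving LEDs.
const frames = [];
const driver = makeDriver(4, function (d) { frames.push(Array.from(d)); });
driver.temperatureMap.set([0, 15, 16, 200]);
driver.step();
console.log(frames[0].join(",")); // 0,0,0,0,0,0,16,16,16,200,200,200
```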
-
• #22
OK, I have to give up for today.
My impression is that "compilation" of functions has some important (and perhaps undocumented) limitations I do not know yet.
@Gordon, should we limit compiled functions to computations only? Can they access local variables in their lexical context, or should we pass any such variables as invocation parameters? Can compiled functions handle Uint8ClampedArray properly (i.e., including clamping)?
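For reference, in standard JavaScript a store into a Uint8ClampedArray clamps to 0..255 and rounds to the nearest integer (ties go to the even neighbor), which is what the "exploits clamping" comments in the first version rely on. Whether Espruino's compiled code reproduces this is exactly the open question here; the final version sidesteps it by clamping explicitly with E.clip. The clip function below is a hypothetical stand-in for E.clip so the sketch runs outside Espruino.

```javascript
// Standard JavaScript semantics of Uint8ClampedArray stores.
const cells = new Uint8ClampedArray(5);
cells[0] = 300;    // clamped to 255
cells[1] = -8;     // clamped to 0 (what "exploits clamping" relies on when cooling)
cells[2] = 1.5;    // tie rounded to even -> 2
cells[3] = 2.5;    // tie rounded to even -> 2
cells[4] = 127.4;  // rounded to nearest -> 127
console.log(Array.from(cells)); // [255, 0, 2, 2, 127]

// Explicit clamping that does not depend on typed-array semantics
// (a stand-in for Espruino's E.clip(x, min, max)).
function clip(x, min, max) {
  return x < min ? min : (x > max ? max : x);
}

const map = new Uint8ClampedArray([5, 250]);
map[0] = clip(map[0] - 8, 0, 255);   // 0, without relying on the array to clamp
map[1] = clip(map[1] + 8, 0, 255);   // 255
console.log(Array.from(map));       // [0, 255]
```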
• #23
Something is going terribly wrong with "compiled" code:
I just restructured my source into several functions which I can individually "compile" or not, and I no longer start the program automatically, which gives me the chance to simply "reset" the board after a hang-up. Additionally, this allows me to invoke individual functions after flashing (or a reset).
But now the board even hangs after calling an uncompiled function, without any compiled function having been called before!
Just for the record: the program runs fine (albeit slowly) without compilation!
Amendment: meanwhile, I found a combination of optimizations (e.g., Math.round -> rounded) and compilation of a single, small function that does not hang my Espruino and runs fast enough to live with. Considering how much time it took to find that (kind of) "solution", I have to stop here.
While the performance gains from compilation are impressive, I still can't really recommend going that way, as it seems completely unclear how one can get compiled code running: it may work or it may not, but no recipe can be given...
-
• #24
This is my final code (for an original Espruino, with a 16x16 Neopixel LED matrix wired to pin B15), just in case you want to play with it:
let Neopixel = require("neopixel");

let rounded = Math.round;
let clamped = E.clip;
let max     = Math.max;

let DisplaySize = 16*16;
let Display = new Uint8ClampedArray(DisplaySize*3);

let PixelOffset = new Int16Array(256);
for (let i = 0; i < 16; i++) {
  for (let j = 0; j < 16; j++) {
    let Index = i*16 + j;
    if (i % 2 === 0) {
      PixelOffset[Index] = Index*3;
    } else {
      PixelOffset[Index] = (i*16 + 15 - j)*3;
    }
  }
}

let normalized = new Uint8ClampedArray(16*16);
for (let i = 0; i < 256; i++) {
  normalized[i] = ( i === 0 ? 0 : rounded(1 + 256/4*(i/256)*(i/256)) );
}

let TemperatureMap = new Uint8ClampedArray(DisplaySize);
for (let i = 0; i < 256; i++) {
  TemperatureMap[i] = 0;
}

let ColorMap = new Array(16);
for (let i = 0; i < 16; i++) {
  ColorMap[i] = E.HSBtoRGB(i/16, 1, 1, true);
}

/**** computeDiffusion ****/

function computeDiffusion () {
  let i = 0, j = 0, k = 0;
  for (i = 15; i > 0; i--) {        // intensity diffusion for upper rows
    k = i*16;
    TemperatureMap[k] = rounded((
      6*TemperatureMap[k] + 2*TemperatureMap[k-16] + TemperatureMap[k-15]
    )/9);

    for (j = 1; j < 15; j++) {
      k++;
      TemperatureMap[k] = rounded((
        6*TemperatureMap[k] + TemperatureMap[k-17] + 2*TemperatureMap[k-16] + TemperatureMap[k-15]
      )/10);
    }

    k++;
    TemperatureMap[k] = rounded((
      6*TemperatureMap[k] + TemperatureMap[k-17] + 2*TemperatureMap[k-16]
    )/9);
  }
}

/**** computeHeating ****/

let HeatX = rounded(16*Math.random());      // where to insert new heat
let HeatCount = rounded(12*Math.random());  // how long to heat there

function computeHeating () {
  HeatCount -= 1;
  if (HeatCount < 0) {              // heat around HeatX, cool elsewhere
    HeatX = rounded(16*Math.random());
    HeatCount = rounded(12*Math.random());
  }

  let i, l = HeatX-4;
  for (i = 0/*, l = HeatX-4*/; i < l; i++) {
    TemperatureMap[i] = clamped(TemperatureMap[i] - 8, 0,255);
  }

  i = HeatX-4;
  i++; if (i >= 0) { TemperatureMap[i] = clamped(TemperatureMap[i] + 1, 0,255); }
  i++; if (i >= 0) { TemperatureMap[i] = clamped(TemperatureMap[i] + 3, 0,255); }
  i++; if (i >= 0) { TemperatureMap[i] = clamped(TemperatureMap[i] + 4, 0,255); }
  i++;               TemperatureMap[i] = clamped(TemperatureMap[i] + 5, 0,255);
  i++; if (i < 16) { TemperatureMap[i] = clamped(TemperatureMap[i] + 4, 0,255); }
  i++; if (i < 16) { TemperatureMap[i] = clamped(TemperatureMap[i] + 3, 0,255); }
  i++; if (i < 16) { TemperatureMap[i] = clamped(TemperatureMap[i] + 1, 0,255); }
  i++;
  for (i = i; i < 16; i++) {
    TemperatureMap[i] = clamped(TemperatureMap[i] - 8, 0,255);
  }
}

/**** prepare display ****/

let normalizedRGB = new Uint8ClampedArray(3);

function prepareDisplay () {
  "compiled";
  for (let i = 0; i < 16; i++) {                  // row-wise
    let RowStart = i*16;
    let RowEnd = RowStart + 15;

    let k = RowStart;
    for (let j = 0/*, k = RowStart*/; j < 16; j++/*, k++*/) {  // column-wise
      let RGB = ColorMap[j];

      let Temperature = TemperatureMap[k];
      if (Temperature < 16) {
        Temperature = 0;
      } else {
        Temperature = max(48,Temperature)/256;
      }

      normalizedRGB[0] = normalized[rounded(RGB[1]*Temperature)];
      normalizedRGB[1] = normalized[rounded(RGB[0]*Temperature)];
      normalizedRGB[2] = normalized[rounded(RGB[2]*Temperature)];

      Display.set(normalizedRGB, PixelOffset[k]);
      k++;
    }
  }
}

/**** handleNextFrame ****/

function handleNextFrame () {
  computeDiffusion();
  computeHeating();
  prepareDisplay();

  /**** show display ****/
  Neopixel.write(B15,Display);

  /**** wait for next frame and proceed ****/
  setTimeout(handleNextFrame,10);
}

/**** start automatically ****/

setTimeout(handleNextFrame,1000);   // give Espruino some time to start up
-
• #25
Use fill(0) to initialize the TemperatureMap:
let TemperatureMap = new Uint8ClampedArray(DisplaySize).fill(0);
-
Hello,
are there any tips on how I can speed up numeric computations (particularly basic integer arithmetic) on an ESP32 running Espruino?
My current project basically mimics a cellular automaton that models something like thermal diffusion and visualizes the result on a Neopixel matrix display. The numerics are limited to basic integer arithmetic but involve a lot of Math.round/min/max calls. Colors and brightness values are then looked up in tables built with Uint8ClampedArray.
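Since all operands in such a diffusion step are non-negative integers, one way to avoid many Math.round calls (an alternative technique, not one used in the thread) is pure integer arithmetic: for integers a >= 0 and b > 0, Math.round(a / b) equals floor((2a + b) / (2b)), and for non-negative values below 2^31 the floor can be taken with | 0. The sketch below, with hypothetical names, checks this identity against Math.round over the value range of the diffusion step and applies it to one interior cell of the rule used in the thread; whether it is actually faster on Espruino would need to be measured.

```javascript
// Rounded integer division without Math.round, valid for integers a >= 0, b > 0.
// Math.round(a / b) === Math.floor(a / b + 0.5) === Math.floor((2a + b) / (2b)).
function roundDiv(a, b) {
  return ((2 * a + b) / (2 * b)) | 0;   // "| 0" floors non-negative values below 2^31
}

// One interior cell of the thread's diffusion rule:
// new = round((6*self + map[k-17] + 2*map[k-16] + map[k-15]) / 10),
// where k-17, k-16, k-15 are the three neighbors in the previous row.
function diffuseCell(map, k) {
  return roundDiv(6 * map[k] + map[k - 17] + 2 * map[k - 16] + map[k - 15], 10);
}

// Check the identity over the whole range the diffusion step can produce (max 10 * 255).
let mismatches = 0;
for (let a = 0; a <= 10 * 255; a++) {
  for (const b of [9, 10]) {
    if (roundDiv(a, b) !== Math.round(a / b)) mismatches++;
  }
}
console.log(mismatches); // 0

const map = new Uint8ClampedArray(32);
map.set([10, 20, 30], 0);
map[17] = 40;
console.log(diffuseCell(map, 17)); // 32: round((240 + 10 + 40 + 30) / 10)
```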
The program runs, but turns out to be surprisingly slow, especially when compared with a similar implementation for a Raspberry Pi Pico running Kaluma (and the latter has not been optimized in any way yet!). By the way: running the same code on an Espruino MDBT42Q gives performance comparable to the ESP32 (which surprised me, as I expected the ESP32 to run faster).
Are there any tips what I could do to speed the program up?
(if my description sounds too abstract, I could also share my code, approx. 120 lines)
Thanks in advance for any help!
With greetings from Germany,
Andreas Rozek