-
Confirmed, uploading to RAM in the IDE now works.
I repeated the tests with Synthwave:
Storage + No pretokenize = OK
Storage + Pretokenize = Crash and reboot. Trying to run code directly in storage?
RAM + Pretokenize = No crash but no rendering
RAM + No pretokenize = No crash but no rendering
Flash + No pretokenize = OK
I'll do some more testing later to figure out why it doesn't render anything when running from RAM.
-
There's no Synthwave in the main banglejs.com/apps page - so I assume it's in your repo?
The pull request for it (together with Warpdrive) is still open (https://github.com/espruino/BangleApps/pull/3156). My fork of BangleApps is only missing the last 3 commits, which don't seem to be related. Besides, I was testing on https://www.espruino.com/ide, not my own fork of the IDE.
-
While it sure isn't as trivial an optimization as it seemed to me at first, the speedup (and space savings?) from your benchmarks sound great!
I've updated to the latest firmware. When I try to reinstall all the apps I get the errors in the attachment. The first time I tried, it got stuck updating the Scheduler app. I manually uninstalled/installed it, then tried again. Now it doesn't get stuck but it outputs those errors.
In the WebIDE, if I enable pretokenization on Synthwave and try to upload to RAM I get the following error:
You have more open brackets than close brackets. Please see the hints in the Editor window.
Apparently it's trying to reformat the pretokenized code and breaks. If I don't enable pretokenization, Synthwave uploads to RAM successfully, but the C++ code no longer works; I haven't looked into what broke yet. If I enable pretokenization but upload to storage instead of RAM, it crashes. It only works if I disable pretokenization and upload to storage.
Enabling Esprima minification results in "Error: Unreachable point. logically broken.", but the code uploads anyway.
Edit:
The following works with pretokenization in Storage but not in RAM:

let c = E.compiledC(`
// int boop(int)
int boop(int d) { return d * 42; }
`);
print('boop:', c.boop(3));
-
I'm actually not trying to do anything specific at the moment, I'm just poking to see what works/doesn't and what's a good idea or not. From there I'll see what I do about this engine I've been making.
I like being able to share any resulting games or watchfaces I make on the repo in a way that people don't have to compile things on their own to see it. I didn't know that having inlinedC would result in extra load on the compiler server; I just assumed there was a cache somewhere, since there wouldn't be much point in compiling the same code again and again.
I had also assumed atob('string literal') was being pretokenized into a binary-safe format that doesn't need escaping, since that would be smaller/faster. It would allow displaying graphics without taking up RAM... but would it be just as fast, or is external flash slower to access?
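A minimal sketch of the kind of thing I mean (the 8x8 bitmap and its base64 string are made up purely for illustration); whether pretokenization stores the literal escape-free is exactly the part I'm unsure about:

// Image data kept as a string literal in the app source instead of a Uint8Array in RAM
var img = {
  width : 8, height : 8, bpp : 1,
  buffer : atob("/4GBgYGBgf8=") // 8x8 box outline, decoded when drawn
};
g.drawImage(img, 10, 10);
-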
If you would allocate writable data separately in JS and pass it as pointers to code stored to flash it needs no changes.
Something like this?
function malloc(size) { return E.getAddressOf(new Uint8Array(size), true); }
Can that be called from C? Also, does the GC know when it is safe to free the Uint8Array?
Would I need to keep references to the Uint8Array in JS to keep it from being collected?

const blocks = [];
function malloc(size) {
  let block = new Uint8Array(size);
  blocks.push(block);
  return E.getAddressOf(block, true);
}
function free(addr) {
  for (let i = 0; i < blocks.length; ++i) {
    if (addr == E.getAddressOf(blocks[i], true)) {
      let last = blocks.pop();
      if (blocks.length != i) blocks[i] = last;
      return;
    }
  }
}
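For what it's worth, this is roughly how I'd expect it to be used from the JS side (render is a hypothetical compiledC entry point taking a pointer and a length, just for illustration):

var addr = malloc(256);   // scratch buffer the C code can write into
c.render(addr, 256);      // hypothetical compiledC function taking (pointer, length)
free(addr);               // drop our reference so the Uint8Array can be collected again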
Edit: To call that from C it seems what I need is jspGetNamedVariable and jspeFunctionCall.
Edit2: It might make sense to have a libespruino that implements malloc/free/sin/cos/sinf/cosf/etc. by calling the JS implementations.
-
For some reason I don't need to add -lm for my local EspruinoCompiler to find sinf, just like I didn't need -lgcc before.
It would be nice if each compile/upload in the IDE would give some stats (amount of space taken up by strings, total space taken up in flash, free RAM on the connected device). As it is, I have no idea if 4KB is a lot or if that's acceptable.
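Until then, a rough workaround is to ask the device directly; a sketch using process.memory() and the Storage module (block counts aren't bytes, so this only approximates what the IDE could show):

var mem = process.memory();                     // JS variable blocks: free / usage / total
var storageFree = require("Storage").getFree(); // free bytes left in Storage
print("RAM blocks free:", mem.free, "of", mem.total);
print("Storage free:", storageFree, "bytes");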
Since the code needs to be copied to RAM, I wonder if it makes sense to use heatshrink or LZ4 on it.
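Something along these lines, assuming the compiled blob is available as a flat string or typed array (codeBlob is just a placeholder for whatever the upload step produces):

var hs = require("heatshrink");
var packed = hs.compress(codeBlob); // done once, when the app is built/uploaded
var ram = hs.decompress(packed);    // done at startup, before jumping into the code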
-
For some reason the following compiles fine on my local EspruinoCompiler, but fails on the online one:
let c = E.compiledC(`
// int boop(double)
int boop(double d) { return float(d) * 3; }
`);
print('boop:', c.boop(3.5));
Maybe it's a different GCC version? arm-none-eabi-gcc -v gives me:
gcc version 10.3.1 20210824 (release) (GNU Arm Embedded Toolchain 10.3-2021.10)
I think fixing it on the online one is just a matter of adding cflags += "-lgcc "; but I can't test that here. It doesn't increase the binary size unnecessarily and is much better than messing with inline assembly.
-
I don't know what the implications of supporting float directly are, but it does sound like a good alternative to doing the conversion in C. The conversion needs to happen somewhere. Is it better to have it in Espruino, where it will take up space even if you don't use compiledC, or is it better to have it in each compiledC block that uses doubles?
Personally, I don't really need to pass floats/doubles to/from C. Just being able to work with floats is enough for me. I merely replied with a working version of the code that was asked for.
-
why do you think so?
- a6:  f7ff ffad   bl    4 <__aeabi_d2fx>
- aa:  eef0 7a08   vmov.f32  s15, #8  ; 0x40400000  3.0
- ae:  ee20 0a27   vmul.f32  s0, s0, s15
- b2:  eefd 7ac0   vcvt.s32.f32  s15, s0
- b6:  ee17 0a90   vmov  r0, s15
- ba:  bd08        pop   {r3, pc}
+ a6:  f7ff ffad   bl    4 <__aeabi_d2f>
+ aa:  ee07 0a10   vmov  s14, r0
+ ae:  eef0 7a08   vmov.f32  s15, #8  ; 0x40400000  3.0
+ b2:  ee67 7a27   vmul.f32  s15, s14, s15
+ b6:  eefd 7ae7   vcvt.s32.f32  s15, s15
+ ba:  ee17 0a90   vmov  r0, s15
+ be:  bd08        pop   {r3, pc}
The top part of the diff is calling the function I pasted earlier (d2f), in the same compilation unit. It gets the return value directly from S0.
The bottom part is calling libgcc's __aeabi_d2f, in a library. The difference is the vmov s14, r0: it loads the float value from an integer register.
Both used the exact same compilation flags.
But anyway, why use doubles? The Cortex M4F can only do floats in hardware; isn't that enough for most of the stuff discussed here?
Because JavaScript uses doubles you need to be able to convert to float and back.
And btw, it is possible to set the calling convention per method via attributes, and calling one from the other works: https://gist.github.com/fanoush/85ebe50c5c4a54ca15bf2867e27f7cd3
Interesting, I didn't know about those. That's probably what's going on in the diff above. I'll do some more testing.
Edit: It seems d2f gets called with hardfp by default, and with __attribute__((pcs("aapcs"))) added it gets called with softfp (tested on the online EspruinoCompiler, not my own). Anyway, d2f can be called with either convention now since it returns the result in both R0 and S0.
-
The Pico code is for a Cortex M0+ and what you have in your snippet is incomplete.
The correct code for a Cortex M4 is this (from assembled+disassembled libgcc, corrected to return the result in S0 instead of R0, since the compiler ignores the softfp ABI for functions in the same compilation unit, apparently):

let c = E.compiledC(`
// int boop(double)
extern "C" float __attribute__((naked)) d2f(double) {__asm__ volatile(R"(
  mov.w r2, r1, lsl 1
  subs.w r3, r2, 0x70000000
  itt cs
  subscs.w ip, r3, 0x200000
  rsbscs ip, ip, 0x1fc00000
  bls.n 2f
1:
  and.w ip, r1, 0x80000000
  mov.w r2, r0, lsl 3
  orr.w r0, ip, r0, lsr 29
  cmp.w r2, 0x80000000
  adc.w r0, r0, r3, lsl 2
  it eq
  biceq.w r0, r0, 1
  b 6f
2:
  tst.w r1, 0x40000000
  bne.n 3f
  adds.w r2, r3, 0x2e00000
  itt lt
  andlt.w r0, r1, 0x80000000
  blt 6f
  orr.w r1, r1, 0x100000
  mov.w r2, r2, lsr 21
  rsb r2, r2, 24
  rsb ip, r2, 32
  lsls.w r3, r0, ip
  lsr.w r0, r0, r2
  it ne
  orrne.w r0, r0, 1
  mov.w r3, r1, lsl 11
  mov.w r3, r3, lsr 11
  lsl.w ip, r3, ip
  orr.w r0, r0, ip
  lsr.w r3, r3, r2
  mov.w r3, r3, lsl 1
  b.n 1b
3:
  mvns.w r3, r2, asr 21
  bne.n 5f
  orrs.w r3, r0, r1, lsl 12
  ittt ne
  movne.w r0, 0x7f000000
  orrne.w r0, r0, 0xc00000
  bne 6f
5:
  and.w r0, r1, 0x80000000
  orr.w r0, r0, 0x7f000000
  orr.w r0, r0, 0x800000
6:
  vmov s0, r0
  bx lr
  nop
)");}
int boop(double d) { return d2f(d) * 3; }
`);
print('boop:', c.boop(3.5)); // 3.5 * 3 = 10.5 cast to int = 10
Note that you have to call the function explicitly; I couldn't use a float cast and get it to link as __aeabi_d2f. It might work if we compile the assembly into a library and then link to that. If one were to go that far, might as well link to libgcc and let the linker throw away all the stuff that isn't used.
-
-
I sent you the PR that specifically allows #include <math.h> and enables the string literals.
Edit: It'd probably be good to have some other headers in there, such as cstdint.
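For reference, the sort of thing this should enable (a sketch; whether the float helpers it pulls in actually link on the online compiler is the same -lgcc/-lm question from earlier):

let c = E.compiledC(`
// int wave(int)
#include <math.h>
int wave(int x) { return (int)(100.0f * sinf(x * 0.1f)); }
`);
print('wave:', c.wave(10)); // ~84, i.e. 100*sin(1.0)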
-mfloat-abi=name
Specifies which floating-point ABI to use. Permissible values are: ‘soft’, ‘softfp’ and ‘hard’.
Specifying ‘soft’ causes GCC to generate output containing library calls for floating-point operations. ‘softfp’ allows the generation of code using hardware floating-point instructions, but still uses the soft-float calling conventions. ‘hard’ allows generation of floating-point instructions and uses FPU-specific calling conventions.

Looks like changing from soft to softfp is exactly what we want.
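Concretely, something like this in the compiler's flags (cflags is the variable from the -lgcc line earlier; the exact FPU name for the Cortex M4F is my assumption):

cflags += "-mfloat-abi=softfp -mfpu=fpv4-sp-d16 ";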
Edit2: I tested softfp and the compiler does output floating-point instructions. I have no code to test if the float argument passing works as intended, though.
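A minimal test sketch for this (illustrative; under softfp the cast and the multiply should compile to vcvt/vmul with no libgcc calls needed):

let c = E.compiledC(`
// int scale(double)
float half(float x) { return x * 0.5f; }                       // float passed/returned between C functions
int scale(double d) { return (int)(half((float)d) * 100.0f); } // double in, hardware float math inside
`);
print('scale:', c.scale(3.0)); // expect 150 if float argument passing works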
Just confirmed that Warpdrive now works with pretokenization enabled. :)