Graphics on BangleJS2

  • Since the BangleJS has a framebuffer, I expected its Graphics Object to extend the arraybuffer one, but that is apparently not the case as g.buffer is undefined and g.asImage("object") returns a copy.
    Is there any way to access the framebuffer directly?

    Edit: Couldn't find a good way to do it, did it anyway.

    Also, I think 'LCD' was meant to be 'g' in https://www.espruino.com/Graphics (or does that not apply to the BangleJS2?):

    On the few boards that do contain LCDs, there is a predefined variable
    called 'LCD' (which is an instance of the Graphics Object).

  • To better demonstrate what I wanted this for (from the PR linked above):


    Attachment: warpdrive.gif
  • Hi - the issue is that the framebuffer isn't entirely linear (and arraybuffer assumes it is) - because it's sent direct to the LCD via DMA it contains a few command bytes at the beginning of each line. Plus I guess as you found out, having 3 bit color isn't desperately easy to work with.

    The code in that app is very impressive, but I feel like trying to access the framebuffer directly is going to be a complete nightmare... Apart from the way it moves around every release, Bangle.js does background DMA after you flip - so you may well end up altering the buffer contents while it's being sent to the LCD, which will cause tearing.

    We could add a function to get the address of the buffer I guess that could wait for DMA to finish, but it's still a bit iffy - if you accidentally overwrite those command bytes the screen will no longer update correctly.
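
    To make the layout issue concrete, here is a minimal C sketch of addressing one pixel in a 3-bit framebuffer whose rows each start with command bytes. The 176x176 size, the two header bytes per row, the MSB-first bit packing and the names fb_set_pixel/fb_get_pixel are assumptions for illustration, not the actual firmware layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: each row = ROW_HEADER command bytes + packed 3-bit pixels. */
enum {
    FB_WIDTH   = 176,
    FB_HEIGHT  = 176,
    FB_BPP     = 3,
    ROW_HEADER = 2,                                        /* hypothetical */
    FB_STRIDE  = ROW_HEADER + (FB_WIDTH * FB_BPP + 7) / 8  /* bytes per row */
};

/* Write a 3-bit colour without touching the row's command bytes. */
static void fb_set_pixel(uint8_t *fb, int x, int y, unsigned color)
{
    assert(fb != NULL);
    if (x < 0 || x >= FB_WIDTH || y < 0 || y >= FB_HEIGHT)
        return;                                    /* clip off-screen writes */
    uint8_t *row = fb + (size_t)y * FB_STRIDE + ROW_HEADER;
    int bit = x * FB_BPP;                          /* first bit of this pixel */
    for (int i = FB_BPP - 1; i >= 0; i--, bit++) {
        uint8_t mask = (uint8_t)(0x80u >> (bit & 7));
        if ((color >> i) & 1u)
            row[bit >> 3] |= mask;
        else
            row[bit >> 3] &= (uint8_t)~mask;
    }
}

/* Read a pixel back; off-screen reads return 0. */
static unsigned fb_get_pixel(const uint8_t *fb, int x, int y)
{
    assert(fb != NULL);
    if (x < 0 || x >= FB_WIDTH || y < 0 || y >= FB_HEIGHT)
        return 0;
    const uint8_t *row = fb + (size_t)y * FB_STRIDE + ROW_HEADER;
    int bit = x * FB_BPP;
    unsigned color = 0;
    for (int i = 0; i < FB_BPP; i++, bit++)
        color = (color << 1) | ((row[bit >> 3] >> (7 - (bit & 7))) & 1u);
    return color;
}
```

    Because 3 does not divide 8, most pixels straddle a byte boundary, which is why the sketch works bit by bit; a faster version would handle whole spans at once.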

    Have you checked the speed difference from just having a pre-allocated ArrayBuffer Graphics object that you write to (in maybe a more sensible bit depth) and blitting that to the screen? I'd have thought that might be reasonably quick.

    Or even I wonder if your triangle draw implementation is actually that much faster than Bangle's own one, since it seems they do work in a pretty similar way... I guess it's just actually being able to call Bangle.js's one fast enough!

  • Indeed, working with that framebuffer isn't very easy, but that's part of the point for me. Having to deal with these weird details is fun.

    My first attempt at this did involve an ArrayBuffer Graphics object. I still don't have a proper grasp of the Bangle.js code, but I was under the impression that it doesn't do a quick blit from that to the framebuffer. There's also the RAM cost.
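
    For a rough idea of that RAM cost, the arithmetic can be sketched in C. This assumes a 176x176 screen and that each row is padded to a whole byte; buffer_bytes is a hypothetical helper, and the interpreter's own bookkeeping overhead is not counted.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a bit-packed off-screen buffer, rows padded to bytes. */
static size_t buffer_bytes(unsigned width, unsigned height, unsigned bpp)
{
    assert(bpp >= 1 && bpp <= 32);
    size_t row_bytes = ((size_t)width * bpp + 7) / 8;  /* round each row up */
    return row_bytes * height;
}
```

    With these assumptions, buffer_bytes(176, 176, 3) is 11616 bytes, 4 bpp needs 15488, 8 bpp needs 30976 and 16 bpp needs 61952.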

    Writing the code that finds the buffer in RAM was fun too, but of course it would be better to have a safe API that takes DMA into account. That said, I don't notice any tearing. Before writing to the framebuffer I call 'g.drawRect(-1, -1, 0, 177)' so that all of the rows get marked as dirty. I imagine this blocks until DMA is done?

    I haven't checked the speed yet, nor am I sure of the best way to do so. What is the highest resolution timer I can use? Date.now()? I noticed that adding more ships makes the framerate drop, not because of the graphics code, but because of the JavaScript that controls the movement. Adding the equivalent number of triangles as stars (controlled by C) has a much lower impact.

  • Having to deal with these weird details is fun

    can't argue with that! :)

    I was under the impression that it doesn't do a quick blit from that to the framebuffer

    In the case where it's a normal Graphics instance and it's not clipped/scaled/rotated there is a reasonably fast path for rendering - it's not quite as fast as it could be, but it's not too bad.

    Before writing to the framebuffer I call g.drawRect

    Ok, great! Yes, that would ensure it's waited.

    And yes, getTime/Date.now use the same underlying timer - they're only accurate to 1/32768 sec on the Bangle, but I imagine that's still enough for benchmarking frame draw times!

    And I can imagine the C/JS difference - the JS interpreter is going to be at least 100x slower than C.

  • I still haven't gotten around to benchmarking the drawing routines, but I've been experimenting and made a second watch face (also in the PR above). With JS being the bottleneck, benchmarking the triangle function wouldn't help. And maybe I'm going to need texture mapping for the next watchface. :)

    I haven't tried the ArrayBuffer Graphics yet. In part because of the RAM it'd take up, and even if it's fast enough, it's better to access the buffer directly and let the CPU sleep a bit more, right?

    Is adding a function to get the buffer an option? I'd sure prefer that instead of probing for it with peek8! :P

  • Is adding a function to get the buffer an option?

    Yes, I think that'd be fine. Just added - you can get it with Bangle.getOptions().lcdBufferPtr on the cutting edge firmware builds (or 2v21 when released)

  • Awesome! Thanks!

  • Trying not to nag, but a question that I got in mind since you started this thread: Why is it a clock?
    Why don't you make a game out of it using the gyro as a controller? Maybe something like FlappyBirds but just having the ship heading to the z-axis instead of a side scroller. The ship could rotate 360° then and try to avoid obstacles that may hit you.

  • It's a clock because that's simpler to start off with, and I wanted to make something animated to entertain my kids. They're two year olds, so they can't actually play games, but they do have fun watching and poking at the screen.

    I do plan on making games out of this, though. I already have a ship being controlled by the gyro, but I'm not sure if I'm going to turn that into a full game or just a slightly interactive clock.

  • Just in case anyone finds this interesting or would like to make suggestions:

    Before trying to implement texture mapping, I decided to try lighting support.
    For that I need to calculate each triangle's normal vector, to find out if it is facing towards the light.
    This requires a bit of math: inverseLength = 1 / sqrt(x * x + y * y + z * z)
    All of the code so far has been using 23.8 fixed-point math, and while the formula above can be implemented without floats, it'd be much slower.
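
    For comparison, the float-free route could look roughly like the C sketch below. It takes 23.8 fixed-point inputs and returns 1/length with 16 fractional bits; isqrt64 and v_invlength_fixed are hypothetical names, and returning 0 for a zero-length vector is an arbitrary choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Integer square root: largest r with r*r <= n (bit-by-bit method). */
static uint32_t isqrt64(uint64_t n)
{
    const uint64_t n0 = n;
    uint64_t r = 0;
    uint64_t bit = (uint64_t)1 << 62;       /* highest power of four */
    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= r + bit) {
            n -= r + bit;
            r = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    assert(r * r <= n0);                    /* postcondition */
    return (uint32_t)r;
}

/* x, y, z: 23.8 fixed point. Result: 1/length in 16.16 fixed point. */
static int32_t v_invlength_fixed(int32_t x, int32_t y, int32_t z)
{
    /* Each product has 16 fractional bits; the sum fits in 64 bits. */
    uint64_t sum = (uint64_t)((int64_t)x * x)
                 + (uint64_t)((int64_t)y * y)
                 + (uint64_t)((int64_t)z * z);
    uint32_t len = isqrt64(sum);            /* length with 8 fractional bits */
    if (len == 0)
        return 0;                           /* zero vector: sketch returns 0 */
    /* 1.0 in 16.16 is 2^16; dividing by a .8 length needs 2^24 on top. */
    return (int32_t)(((uint32_t)1 << 24) / len);
}
```

    For the vector (3.0, 4.0, 0.0), that is v_invlength_fixed(768, 1024, 0), the length is 5.0 and the result is 13107, which is 0.2 in 16.16. The square-root loop runs up to 32 iterations, which is where the extra cost comes from.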

    Of course, we can't use math.h in compiledC code, so this is where the fun begins:

    int __attribute__((naked)) v_invlength(int x, int y, int z) {__asm__ volatile(
            push {lr}
            VMOV.F32 S0, r0
            VCVT.F32.S32 S0, S0, 8
            VMUL.F32 S0, S0, S0
    
            VMOV.F32 S1, r1
            VCVT.F32.S32 S1, S1, 8
            VMLA.F32 S0, S1, S1
    
            VMOV.F32 S2, r2
            VCVT.F32.S32 S2, S2, 8
            VMLA.F32 S0, S2, S2
    
            VMOV.F32  s1, #1.0e+0
    
            VSQRT.F32 S0, S0
            VDIV.F32 S0, S1, S0
            VCVT.S32.F32 S0, S0, 16
            VMOV.F32 r0, S0
            pop {pc}
    );}
    

    (Quotes and newlines omitted for clarity. It would've been nice if we could use raw string literals for inline assembly.)

    Unfortunately that doesn't work either because gcc really doesn't want to work with floats. To get around that, I assembled that function locally and replaced each instruction with the result as data:

    int __attribute__((naked)) v_invlength(int x, int y, int z) {__asm__ volatile(
    ".short 0xb500 \n               \n"   // push	{lr}
    ".short 0xee00 \n .short 0x0a10 \n"   // vmov	s0, r0
    ".short 0xeeba \n .short 0x0acc \n"   // vcvt.f32.s32	s0, s0, #8
    ".short 0xee20 \n .short 0x0a00 \n"   // vmul.f32	s0, s0, s0
    ".short 0xee00 \n .short 0x1a90 \n"   // vmov	s1, r1
    ".short 0xeefa \n .short 0x0acc \n"   // vcvt.f32.s32	s1, s1, #8
    ".short 0xee00 \n .short 0x0aa0 \n"   // vmla.f32	s0, s1, s1
    ".short 0xee01 \n .short 0x2a10 \n"   // vmov	s2, r2
    ".short 0xeeba \n .short 0x1acc \n"   // vcvt.f32.s32	s2, s2, #8
    ".short 0xee01 \n .short 0x0a01 \n"   // vmla.f32	s0, s2, s2
    ".short 0xeef7 \n .short 0x0a00 \n"   // vmov.f32	s1, #112	; 0x3f800000  1.0
    ".short 0xeeb1 \n .short 0x0ac0 \n"   // vsqrt.f32	s0, s0
    ".short 0xee80 \n .short 0x0a80 \n"   // vdiv.f32	s0, s1, s0
    ".short 0xeebe \n .short 0x0ac8 \n"   // vcvt.s32.f32	s0, s0, #16
    ".short 0xee10 \n .short 0x0a10 \n"   // vmov	r0, s0
    ".short 0xbd00 \n               \n"   // pop	{pc}
    );}
    
    

    The result of this abomination is attached below.


    Attachment: hyperspace.png
  • Nice! I should add that the compiler itself is at https://github.com/gfwilliams/EspruinoCompiler if you decide you want to try fiddling around with that.

    If you were able to change the build such that math.h functions were pulled in as required I bet it'd make a lot of people very happy! Also you might be able to change the compiler flags to let it take raw string literals?

    Worth adding that right now we have to tell GCC to use softfp so that the argument passing of floats works when calling back to the Espruino interpreter. But you might be able to get it to use softfp for arg passing while still using hardfp instructions - I'm not sure!

  • I sent you the PR that allows specifically #include <math.h> and enables the string literals.

    Edit: It'd probably be good to have some other headers in there, such as cstdint.

    -mfloat-abi=name
    Specifies which floating-point ABI to use. Permissible values are: ‘soft’, ‘softfp’ and ‘hard’.
    Specifying ‘soft’ causes GCC to generate output containing library calls for floating-point operations. ‘softfp’ allows the generation of code using hardware floating-point instructions, but still uses the soft-float calling conventions. ‘hard’ allows generation of floating-point instructions and uses FPU-specific calling conventions.

    Looks like changing from soft to softfp is exactly what we want.

    Edit2: I tested softfp and the compiler does output floating-point instructions. I have no code to test if the float argument passing works as intended, though.

  • This is all very exciting. Years ago (for the Bangle 1) I wrote an STL viewer that does some of the heavy lifting in C (viewstl in the app store). Back then I did all the math library functions (mostly trig functions) in JS and passed the results into C (simple arithmetic fp ops worked fine, albeit not using the FPU).
    I have a half-finished Space Shuttle lander game that uses 3d graphics. My approach was to fill arrays with polygon screen coordinates in C and then call graphics routines in JS, but your approach of accessing the screen buffer directly from native code seems better performance wise, Felipe. I might adopt that approach, thanks for figuring all this out, very impressive!

  • It's good to know I'm not the only one trying to do 3D on a Bangle. :P
    I hope this stuff helps and I look forward to your completed lander game!

  • I've just updated with the changes (and softfp) and it would appear to be working ok.

    Now we have some hardware FP support, I wonder if it might be worth implementing at least __aeabi_d2f and __aeabi_f2d if only as static functions added to what's compiled (which should get compiled out if not used) so that at least it's possible to convert to/from doubles for Espruino?

    I can see the Raspberry Pi Pico seems to have optimised implementations?

    https://github.com/raspberrypi/pico-sdk/blob/6a7db34ff63345a7badec79ebea3aaef1712f374/src/rp2_common/pico_double/double_aeabi.S

  • I was hoping I could add the following to E.compiledC, but even after accounting for the double-escaping, I'm not having any success:

    float __attribute__((naked)) __aeabi_d2f(double d) {__asm__ volatile(
        "lsrs r0, #30\n\t"
        "lsls r2, r1, #12\n\t"
        "lsrs r2, #9\n\t"
        "asrs r1, #22\n\t"
        "lsls r1, #22\n\t"
        "orrs r0, r1\n\t"
        "orrs r0, r2\n\t"
        "bx lr"
    );}
    
    double __attribute__((naked)) __aeabi_f2d(float f) {__asm__ volatile(
        "asrs r1, r0, #3\n\t"
        "movs r2, #0xf\n\t"
        "lsls r2, #27\n\t"
        "orrs r1, r2\n\t"
        "lsls r0, #25\n\t"
        "bx lr"
    );}
    

    If you have any luck, please let me know!

  • Currently you can pass single-precision floats (and converted doubles) into inline C code and back via a Float32Array, or, for single values passed as parameters, via a buffer shared between a Float32Array and an Int32Array (more info: https://gist.github.com/fanoush/9227640a869d78d69a799276dff0fb71#file-espruino-fpu-softfp-inlinec-js-L24). Maybe it would be nice to support the float type directly when passing parameters to native code, with automatic conversion to/from the JavaScript double? Then d2f and f2d would not be needed.
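
    On the C side, the shared-buffer trick amounts to reinterpreting an integer argument's bits as a float. A minimal sketch, assuming the JS side has already obtained the bit pattern through an Int32Array view of a Float32Array; float_from_bits, bits_from_float and scale_by_three are hypothetical names, not part of the linked gist.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(int32_t), "float must be 32 bits");

/* Reinterpret the 32 bits of an integer argument as a float. */
static float float_from_bits(int32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);    /* well defined, unlike a pointer cast */
    return f;
}

/* The reverse: expose a float's bit pattern as an integer return value. */
static int32_t bits_from_float(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Hypothetical native function: argument and result are float bit patterns. */
int32_t scale_by_three(int32_t arg_bits)
{
    return bits_from_float(float_from_bits(arg_bits) * 3.0f);
}
```

    Passing 0x40600000 (the bits of 3.5f) returns 0x41280000 (the bits of 10.5f), so no double-to-float conversion has to happen inside the native code.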

  • The Pico code is for a Cortex M0+ and what you have in your snippet is incomplete.
    The correct code for a Cortex M4 is this (from assembled+disassembled libgcc, corrected to return the result in S0 instead of R0, since the compiler ignores the softfp ABI for functions in the same compilation unit, apparently):

    let c = E.compiledC(`
    // int boop(double)
    
    extern "C" float __attribute__((naked)) d2f(double) {__asm__ volatile(R"(
     	mov.w	r2, r1, lsl 1
     	subs.w	r3, r2, 0x70000000
     	itt	cs
     	subscs.w ip, r3, 0x200000
     	rsbscs	ip, ip, 0x1fc00000
     	bls.n	2f
    1:
     	and.w	ip, r1, 0x80000000
     	mov.w	r2, r0, lsl 3
     	orr.w	r0, ip, r0, lsr 29
     	cmp.w	r2, 0x80000000
     	adc.w	r0, r0, r3, lsl 2
     	it	eq
     	biceq.w	r0, r0, 1
     	b 6f
    2:
     	tst.w	r1, 0x40000000
     	bne.n	3f
     	adds.w	r2, r3, 0x2e00000
     	itt	lt
     	andlt.w	r0, r1, 0x80000000
     	blt	6f
     	orr.w	r1, r1, 0x100000
     	mov.w	r2, r2, lsr 21
     	rsb	r2, r2, 24
     	rsb	ip, r2, 32
     	lsls.w	r3, r0, ip
     	lsr.w	r0, r0, r2
     	it	ne
     	orrne.w	r0, r0, 1
     	mov.w	r3, r1, lsl 11
     	mov.w	r3, r3, lsr 11
     	lsl.w	ip, r3, ip
     	orr.w	r0, r0, ip
     	lsr.w	r3, r3, r2
     	mov.w	r3, r3, lsl 1
     	b.n	1b
    3:
     	mvns.w	r3, r2, asr 21
     	bne.n	5f
     	orrs.w	r3, r0, r1, lsl 12
     	ittt	ne
     	movne.w	r0, 0x7f000000
     	orrne.w	r0, r0, 0xc00000
     	bne	6f
    5:
     	and.w	r0, r1, 0x80000000
     	orr.w	r0, r0, 0x7f000000
     	orr.w	r0, r0, 0x800000
    6:
      vmov s0, r0
     	bx	lr
     	nop
    )");}
    
    
    int boop(double d) {
      return d2f(d) * 3;
    }
    `);
    
    print('boop:', c.boop(3.5)); // 3.5 * 3 = 10.5 cast to int = 10
    
    

    Note that you have to call the function explicitly, I couldn't use a float cast and get it to link as __aeabi_d2f. It might work if we compile the assembly into a library and then link to that. If one were to go that far, might as well link to libgcc and let the linker throw away all the stuff that isn't used.

  • I tested softfp and the compiler does output floating-point instructions. I have no code to test if the float argument passing works as intended, though.

    It already works like that, no changes in EspruinoCompiler needed (unless it broke recently)

    Also, using floating point instructions works inside inline C, see my tests linked in the previous post.

    Include files and macros are intentionally blocked https://github.com/gfwilliams/EspruinoCompiler/blob/master/src/compile.js#L466 so you can't include math.h

  • since the compiler ignores the softfp ABI for functions in the same compilation unit, apparently

    why do you think so?

    and btw it is possible to set calling convention per method via attributes and when calling one from another it works https://gist.github.com/fanoush/85ebe50c5c4a54ca15bf2867e27f7cd3 (EDIT: or for cortex m4 here https://gist.github.com/fanoush/0da3e47aee9e20fb11b010cc3aa4e16e )

    But anyway, why to use doubles? Cortex M4F can do only floats in hardware, isn't this enough for most stuff discussed here?

  • It already works like that, no changes in EspruinoCompiler needed (unless it broke recently)

    It was set to soft, not softfp. This was changed to softfp today.

    Include files and macros are intentionally blocked

    This is true...

    so you can't include math.h

    ... but this also changed today.

  • why do you think so?

    -  a6:	f7ff ffad 	bl	4 <__aeabi_d2fx>
    -  aa:	eef0 7a08 	vmov.f32	s15, #8	; 0x40400000  3.0
    -  ae:	ee20 0a27 	vmul.f32	s0, s0, s15
    -  b2:	eefd 7ac0 	vcvt.s32.f32	s15, s0
    -  b6:	ee17 0a90 	vmov	r0, s15
    -  ba:	bd08      	pop	{r3, pc}
    
    +  a6:	f7ff ffad 	bl	4 <__aeabi_d2f>
    +  aa:	ee07 0a10 	vmov	s14, r0
    +  ae:	eef0 7a08 	vmov.f32	s15, #8	; 0x40400000  3.0
    +  b2:	ee67 7a27 	vmul.f32	s15, s14, s15
    +  b6:	eefd 7ae7 	vcvt.s32.f32	s15, s15
    +  ba:	ee17 0a90 	vmov	r0, s15
    +  be:	bd08      	pop	{r3, pc}
    

    The top part of the diff is calling the function I pasted earlier (d2f), in the same compilation unit.
    It gets the return value directly from S0.

    The bottom part is calling libgcc's __aeabi_d2f, in a library. The difference is the vmov s14, r0, it loads a float value from an int register.

    Both used the exact same compilation flags.

    But anyway, why to use doubles? Cortex M4F can do only floats in hardware, isn't this enough for most stuff discussed here?

    Because JavaScript uses doubles you need to be able to convert to float and back.
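
    The "back" direction is the easier one, since every float is exactly representable as a double. Here is a portable C sketch of that widening on raw bit patterns, roughly the job a routine such as __aeabi_f2d has to do on a core without double-precision hardware. f2d_bits and widen are hypothetical names, and this is not libgcc's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Widen IEEE 754 single-precision bits to double-precision bits. */
static uint64_t f2d_bits(uint32_t f)
{
    uint64_t sign = (uint64_t)(f & 0x80000000u) << 32;
    uint32_t exp  = (f >> 23) & 0xFFu;
    uint64_t man  = f & 0x7FFFFFu;

    if (exp == 0xFFu)                       /* infinity or NaN */
        return sign | ((uint64_t)0x7FF << 52) | (man << 29);
    if (exp == 0) {
        if (man == 0)
            return sign;                    /* signed zero */
        /* Subnormal float: shift until the implicit bit appears. */
        int e = 1;
        while (!(man & 0x800000u)) {
            man <<= 1;
            e--;
        }
        man &= 0x7FFFFFu;
        return sign | ((uint64_t)(e + 896) << 52) | (man << 29);
    }
    /* Normal number: rebias the exponent (1023 - 127 = 896). */
    return sign | ((uint64_t)(exp + 896) << 52) | (man << 29);
}

/* Convenience wrapper working on actual float/double values. */
static double widen(float f)
{
    uint32_t in;
    uint64_t out;
    double d;
    memcpy(&in, &f, sizeof in);
    out = f2d_bits(in);
    memcpy(&d, &out, sizeof d);
    return d;
}
```

    The narrowing direction is harder because it has to round and handle overflow and underflow, which is why the routine posted earlier is so much longer.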

    and btw it is possible to set calling convention per method via attributes and when calling one from another it works https://gist.github.com/fanoush/85ebe50c5c4a54ca15bf2867e27f7cd3

    Interesting, I didn't know about those. That's probably what's going on in the diff above. I'll do some more testing.
    Edit: It seems d2f gets called with hardfp by default and adding __attribute__((pcs("aapcs"))) it gets called by softfp (tested on the online EspruinoCompiler, not my own). Anyway, d2f can be called by either mode now since it returns the result in both R0 and S0.

  • was set to soft, not softfp. This was changed to softfp today.

    Oh interesting, I was thinking it had already changed to softfp when we discussed it a long time ago here https://github.com/gfwilliams/EspruinoCompiler/issues/10 so I was using it with my local changes all the time (as I use many boards that are not in the whitelist)

    EDIT: softfp was already used for quite some time in main firmware https://github.com/espruino/Espruino/blob/255dbb036942c59c2e937d3c80c206d88586be79/ChangeLog#L282 then it briefly changed but was reverted https://github.com/espruino/Espruino/commit/42e336f4ac5613582ea582f15335527ea9dcd700

    I was somehow thinking the inline C compiler already moved to softfp too, well, it is great it was finally done now :-)


Posted by @FManga