  • The P8 smartwatch is a good one to hack and put Espruino onto. It can be updated without taking it apart, it is quite cheap, and it has a full touchscreen and a 240x240 ST7789 display.

    Here is a demo of an Espruino build with the display driver implemented as Inline C code.


    While there is already support for this display in Bangle.js, that driver is not SPI based. There is another one for the previous F5 watch here, but that one keeps its framebuffer in static variables. I tried to update it to the Bangle.js driver style, but the whole "1. build Espruino, 2. flash to watch, 3. see it crash, 4. repeat" cycle was too slow, so I first tried to prototype it in JavaScript and Inline C to see how far I could get with this.

    This uses DMA in a similar way to the spilcd driver, but was written from scratch without the Nordic SDK, to learn about the ST7789 and the nRF52 EasyDMA and SPI hardware (the first version just wrote the SPI RXD and TXD registers directly, without using DMA).

    BTW it is interesting how much slower the JS interpreter is: try g.jsflip and see the difference, the code is otherwise the same. While Espruino performance is otherwise OK, here it really struggles.

    SDK14 and SDK11 based builds are here with some instructions.

    Oh, and BTW thanks for the cube animation, it was stolen from the Pixl.js conference badge code.

  • Very impressive indeed. How does the speed of the SPI display compare with the 8-bit parallel Bangle version?

  • I don't know, as I have not seen both next to each other, but previously I thought 8Mbps SPI at 240x240 must be pathetic (see this forum post and the video there), and it turns out not to be so bad.

    There is no DMA for the 8-bit parallel mode, so the code must do both at once: palette conversion and/or scaling, and sending to the display. With DMA you can compute the first block of pixels in advance and then both can run in parallel. With blocks larger than just a few bytes the DMA can almost reach the maximum theoretical 8Mbps speed, e.g. filling 240x240 in 12 bits (1.5 bytes per pixel) takes 96ms when using DMA in 24-byte blocks.
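    A quick sanity check of these figures, as a sketch assuming an 8 MHz SPI clock and no per-transfer overhead at all: a 240x240 frame is 86400 bytes in 12-bit mode and 115200 bytes in 16-bit mode, which puts a lower bound of about 86 ms and 115 ms on a full-screen update. The 96 ms measured for 24-byte blocks is therefore close to the theoretical limit, and the roughly 29 ms gap between the two modes is in line with the 30 ms difference reported later in the thread.

```javascript
// Lower bound on full-frame update time over SPI, ignoring all per-transfer overhead.
// Assumptions: 240x240 panel, 8 MHz SPI clock (8 Mbit/s).
var SPI_BITS_PER_SECOND = 8000000;

// Bytes in one frame; in 12-bit mode 2 pixels share 3 bytes (1.5 bytes/pixel).
function frameBytes(width, height, bitsPerPixel) {
  return width * height * bitsPerPixel / 8;
}

// Milliseconds needed to clock out `bytes` at the given bit rate.
function transferMs(bytes, bitsPerSecond) {
  return bytes * 8 * 1000 / bitsPerSecond;
}

var ms12 = transferMs(frameBytes(240, 240, 12), SPI_BITS_PER_SECOND);
var ms16 = transferMs(frameBytes(240, 240, 16), SPI_BITS_PER_SECOND);
console.log("12-bit frame: " + ms12 + " ms");                   // 12-bit frame: 86.4 ms
console.log("16-bit frame: " + ms16 + " ms");                   // 16-bit frame: 115.2 ms
console.log("difference: " + (ms16 - ms12).toFixed(1) + " ms"); // difference: 28.8 ms
```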

    Even though there is a mode in which the DMA does not stop and can automatically send several buffers (or the same buffer) repeatedly, for some reason it makes a difference whether it loops over 6-byte or 24-byte buffers in that mode. With just the 6-byte buffer in the code above it was still around 125ms, about the same as sending without DMA, and the 3-byte buffer was even slower.
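    One way to read these numbers is a simple linear model in which every DMA transfer costs a fixed amount of time on top of the raw SPI time. This is an assumption, fitted only to the 96 ms / 24-byte figure above, not a measurement of where the time actually goes; still, it predicts about 124.8 ms for 6-byte blocks, close to the ~125 ms observed, and noticeably more (about 163 ms) for 3-byte blocks.

```javascript
// Rough linear model (an assumption, not a hardware measurement):
//   total time = raw SPI time + (number of DMA transfers) * fixed cost per transfer
var FRAME_BYTES = 240 * 240 * 3 / 2;             // 12-bit mode: 86400 bytes
var RAW_MS = FRAME_BYTES * 8 * 1000 / 8000000;  // 86.4 ms at 8 Mbit/s

// Number of DMA transfers needed to send one frame in blocks of blockBytes.
function transfers(blockBytes) {
  return Math.ceil(FRAME_BYTES / blockBytes);
}

// Per-transfer cost implied by the single measured point: 96 ms with 24-byte blocks.
var overheadMs = (96 - RAW_MS) / transfers(24);

// Predicted full-frame time for a given block size under this model.
function predictedMs(blockBytes) {
  return RAW_MS + transfers(blockBytes) * overheadMs;
}

[24, 6, 3].forEach(function (b) {
  console.log(b + "-byte blocks: " + predictedMs(b).toFixed(1) + " ms");
});
```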

  • Also, this was first meant as a quick prototype to try various methods of sending data quickly; I am a bit surprised it can probably be used as a standalone driver as is. But of course the proper way is to merge it with the driver code https://github.com/espruino/Espruino/blob/master/libs/graphics/lcd_st7789_8bit.c and possibly just change the macros/methods there to put the same data into a buffer and trigger DMA when the buffer is full, instead of using 8-bit mode.
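    The "put the data into a buffer and trigger DMA when it is full" idea can be sketched in plain JavaScript as below. `createBlockWriter` and the `flushFn` callback are hypothetical names; `flushFn` stands in for whatever starts the SPI DMA transfer, and each chunk is copied so the sketch stays correct even though a real transfer would still be reading from memory while the buffer is refilled (double buffering would avoid that copy).

```javascript
// Hypothetical sketch: accumulate bytes and hand over a chunk whenever the buffer fills.
// flushFn(chunk) stands in for starting an SPI DMA transfer; it is not an Espruino API.
function createBlockWriter(blockSize, flushFn) {
  var buf = new Uint8Array(blockSize);
  var used = 0;

  // Send whatever is buffered (full or partial) and start over.
  function flush() {
    if (used > 0) {
      // Copy, because a real DMA transfer would still be reading while we refill buf.
      flushFn(new Uint8Array(buf.subarray(0, used)));
      used = 0;
    }
  }

  return {
    write: function (byte) {
      buf[used++] = byte;
      if (used === blockSize) flush(); // buffer full: trigger the "DMA"
    },
    flush: flush // call at the end to send a final partial block
  };
}

// Usage: 10 bytes through a 4-byte buffer are sent as chunks of 4, 4 and 2 bytes.
var sent = [];
var writer = createBlockWriter(4, function (chunk) { sent.push(chunk.length); });
for (var i = 0; i < 10; i++) writer.write(i);
writer.flush();
```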

  • The rotating cube looks quite fast - fast enough to use, as you note. With hardware-supported SPI, I have previously managed to get quite good performance from a relatively simple Arduino-based ILI9341 driver - see.

  • > ILI9341 driver - see.

    Oh, nice, thank you for linking that. The canvas idea with smaller areas of different bit depths is interesting and could save memory. I think that is what @Gordon was adding to Espruino in recent days? Then you don't need a whole framebuffer (28K in my case for the 16-color 4-bit one) and could blit such areas directly. E.g. for fonts there is otherwise no way other than drawing each pixel separately, which is very slow, as seen in the simple driver example here; that one was very painful to watch on the P8.
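    The memory arithmetic behind this, as a small sketch assuming packed rows without padding: a full 240x240 framebuffer at 4 bpp is 28800 bytes, whereas a hypothetical 240x40 strip costs 4800 bytes at the same depth, or 1200 bytes at 1 bpp.

```javascript
// Bytes needed for a packed bitmap of the given size and bit depth (no row padding assumed).
function bitmapBytes(width, height, bpp) {
  return Math.ceil(width * height * bpp / 8);
}

var full4 = bitmapBytes(240, 240, 4);  // whole screen, 16 colors
var strip4 = bitmapBytes(240, 40, 4);  // hypothetical 240x40 strip at the same depth
var strip1 = bitmapBytes(240, 40, 1);  // same strip at 1 bpp, e.g. for text
console.log(full4, strip4, strip1);    // 28800 4800 1200
```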

    I am not sure what creating such (relatively large) memory blocks dynamically will do to memory fragmentation. With one framebuffer you create it once at the beginning. And with around 300 variables free I sometimes have trouble getting even small flat strings allocated, unless I put the code into flash.

    > The rotating cube looks quite fast

    Oh, that one is even drawn entirely in JavaScript running via setInterval(), so the device stays responsive.

  • Yes, Gordon’s createArrayBuffer and drawImage with palettes are exactly the same mechanism. Actually you can usually get away with one or two buffers and move them around to write at different times. My gpsnav app uses only one buffer, which is used every 200ms to draw the compass display and once a second for the numerical data display.

    Combining javascript and compiled C in your driver is really neat, I was going to ask to see it!

  • That looks really cool! Actually I believe there is already code to handle ST7789 via SPI (with DMA!) in Espruino, since it was used on the ID205 that I'd been considering for Bangle.js: https://github.com/espruino/Espruino/blob/master/libs/graphics/lcd_spilcd.c

    However, that uses an offscreen buffer - which worked on the nRF52840, but is probably not so great on the nRF52832 :)

    It'd be really cool to pull something like this into Espruino itself - actually sending individual pixels but via DMA looks really interesting performance-wise.

    There are some hacks you can steal from the existing 8 bit driver too - like detecting if you're sending to X,Y then X+1,Y and not sending new XY coordinates each time.
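    A sketch of that coordinate-skipping hack. It assumes that, once a column/row window has been set with the 0x2A/0x2B/0x2C (CASET/RASET/RAMWR) commands, the controller advances the column on every pixel written, so a pixel at X+1,Y only needs its color data. `createPixelSender` and `emit` are hypothetical names; `emit` just records what would go over SPI.

```javascript
// Hypothetical sketch: only resend the address window when the next pixel
// is not where the controller's write pointer already is.
function createPixelSender(width, height, emit) {
  var nextX = -1, nextY = -1; // expected write position; -1 means unknown

  return function setPixel(x, y, color) {
    if (x !== nextX || y !== nextY) {
      emit("CASET " + x + ".." + (width - 1));  // columns: x to end of row
      emit("RASET " + y + ".." + (height - 1)); // rows: y to bottom
      emit("RAMWR");                            // start writing pixel data
    }
    emit("DATA " + color);
    nextX = x + 1;
    nextY = y;
    // At the end of the row the pointer wraps back to the window start;
    // rather than track that, force a new window next time.
    if (nextX >= width) nextX = -1;
  };
}

// Usage: a horizontal run needs the window only once.
var log = [];
var setPixel = createPixelSender(240, 240, function (s) { log.push(s); });
setPixel(10, 5, 1);
setPixel(11, 5, 1); // continues the run: only DATA is sent
setPixel(12, 5, 1);
setPixel(0, 6, 1);  // jump to another position: a new window is sent
```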

  • Just a follow-up. I also got the 16-bit mode working and it is visibly slower: 30ms more is a really visible difference. I also rewrote the framebuffer/palette lookup code so that I can have a 3-bit, 8-color mode (pixels not aligned to byte boundaries). Unfortunately Espruino does not support that, as I learned after it was already done. After adding bpp=3 to isValidBPP here https://github.com/espruino/Espruino/blob/master/libs/graphics/jswrap_graphics.c#L181 it almost works; there are just some thin blank vertical stripes where a byte boundary is crossed. I will check how easily it can be fixed.
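    Reading a 3 bpp pixel is where byte boundaries bite: pixel x occupies bits 3x to 3x+2, so pixels 2 and 5 of every group of eight straddle two bytes. A sketch, assuming an MSB-first layout (as with `{msb:true}`) and no row padding, that always reads two consecutive bytes so straddling pixels come out right. `getPixel3` is an illustrative name, not an Espruino API.

```javascript
// Read a 3-bit pixel from an MSB-first packed buffer with no row padding.
function getPixel3(buf, width, x, y) {
  var bitPos = (y * width + x) * 3; // first bit of this pixel
  var byteIdx = bitPos >> 3;
  var bitInByte = bitPos & 7;
  // Combine two bytes into 16 bits so a pixel crossing a byte boundary is complete.
  var hi = buf[byteIdx];
  var lo = byteIdx + 1 < buf.length ? buf[byteIdx + 1] : 0;
  var word = (hi << 8) | lo;
  // The pixel's 3 bits start bitInByte bits from the top of the 16-bit word.
  return (word >> (16 - 3 - bitInByte)) & 7;
}

// Pixels 1,2,3,4,5,6,7,0 packed at 3 bpp are the bits 001 010 011 100 101 110 111 000.
var packed = new Uint8Array([0x29, 0xCB, 0xB8]);
```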

    Also, as for colors, I tried a dynamic palette for bpp=1 mode so I could change colors between flips, to have multiple colors with only a 1-bit framebuffer. It can work like that for many things without needing a full 2 or 4 bpp. With 1bpp the first screen (rotating cube) and the second one (shape fill) look almost the same as before. Only random lines show some nice ZX Spectrum style color artifacts :-)
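    The dynamic-palette trick in sketch form, assuming an MSB-first 1 bpp buffer and 16-bit RGB565 output colors: the framebuffer stays untouched and only the 2-entry palette changes between flips. `expand1bpp` is an illustrative name, not an Espruino API.

```javascript
// Expand an MSB-first 1 bpp bitmap to 16-bit colors through a 2-entry palette.
function expand1bpp(bits, pixelCount, palette) {
  var out = new Uint16Array(pixelCount);
  for (var i = 0; i < pixelCount; i++) {
    var bit = (bits[i >> 3] >> (7 - (i & 7))) & 1; // MSB is the leftmost pixel
    out[i] = palette[bit];
  }
  return out;
}

var fb = new Uint8Array([0xA0]); // first four pixels: 1, 0, 1, 0
// Same framebuffer, two palettes (RGB565): black/red, then blue/green.
var redFrame = expand1bpp(fb, 4, new Uint16Array([0x0000, 0xF800]));
var greenFrame = expand1bpp(fb, 4, new Uint16Array([0x001F, 0x07E0]));
```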

    The current code is still here, along with a brief description of how one could self-host the compiler if you want to modify it: https://github.com/fanoush/ds-d6/wiki/Espruino-Inline-C

    It is actually a pretty nice environment with Inline C: one can upload code quickly without even touching the flash memory and try anything freely. With the watchdog enabled, bugs are harmless. I am thinking about exporting unused interrupt vectors from Espruino, as this is currently a clear limitation of Inline C: I cannot hook into the SPI interrupt in the driver now. Something like E.set/getInterruptVector(intNo) would do. Any vector unused by the current Espruino build could be exported like that.

  • With reference to your recent post - congratulations on getting execution from SPI working.

    Does the new build include lcd_spi_unbuf? I would be interested in how it performs in comparison with the driver you describe here and I would like to try it now my P8 has arrived.

  • Yes, it is enabled. The board files used to build it are also in https://github.com/fanoush/ds-d6/tree/master/espruino/DFU/P8 and there is USE_LCD_SPI_UNBUF=1 inside. For the Inline C driver, try uploading the example from this gist (also linked in the readme). It should now also work with the build that has storage in SPI flash.

    I actually didn't try lcd_spi_unbuf with the storage in SPI flash; quite likely it may not work out of the box due to the shared SPI, so for comparison better take the version with storage in internal flash (the one without the _SPIFLASH suffix), which is still good enough unless your code is over 120KB.

    If the guide in the readme is confusing, also check https://github.com/enaon/ninebot-one-nRF52/tree/master/p8-nb

  • Thanks, I managed to get my P8 flashed with Espruino with no problems. I did the following measurements for the lcd_spi_unbuf driver.

    I compared the time taken to fill a 240 x 160 pixel rectangle with the time to draw a 240 x 160 1-bit image. I include the results for Bangle and ESP32 (T-watch) for comparison.


    Bangle: fillRect 11ms, drawImage 66ms.

    ESP32: fillRect 44ms, drawImage 82ms.

    P8: fillRect 256ms, drawImage 331ms.

    The speed is not great; however, it is worth noting that the issue is not rendering paletted images but simply getting pixels sent to the driver. There are at least two improvements I can think of:

    1) The buffer size of 128 pixels (256 bytes) is almost exactly the wrong size, as the implementation of spiSendMany uses EasyDMA with a maximum transfer size of 255 bytes, i.e. the current buffer size causes two transfers, the second of 1 byte. I would like to try a buffer size of 240 pixels (480 bytes).

    2) spiSendMany is currently called synchronously, so it would be interesting to try double buffering to speed things up.
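    The effect of the 255-byte limit (on the nRF52832 the SPIM EasyDMA MAXCNT registers are 8 bits wide) can be seen by splitting a buffer into transfers. With 2 bytes per pixel, 128 pixels give 255 + 1, 240 pixels give 255 + 225, and anything up to 127 pixels (254 bytes) fits in a single transfer. A minimal sketch, with an illustrative function name:

```javascript
// Split a buffer of totalBytes into DMA transfers of at most maxPerTransfer bytes.
function splitTransfers(totalBytes, maxPerTransfer) {
  var parts = [];
  while (totalBytes > 0) {
    var n = Math.min(totalBytes, maxPerTransfer);
    parts.push(n);
    totalBytes -= n;
  }
  return parts;
}

var current = splitTransfers(128 * 2, 255);  // 128 pixels at 16 bpp
var proposed = splitTransfers(240 * 2, 255); // 240 pixels at 16 bpp
var single = splitTransfers(127 * 2, 255);   // largest pixel count that fits in one transfer
```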

    I would really like to be able to build the firmware to test this and also to add a Bluetooth hack to support my ANCS widget. I see that you have made the board description public but I would guess that you also need the other files (bootloader etc) to build it?

    Thanks again for making the package available. The P8 has a bright display and it's good to experiment with the touchscreen. I found it really easy to transfer the apps I have been running on the T-watch to the P8. I will share them when I clean things up.

    function time_fill(){
        var time = Date.now();
        // assumed: fill the same 240x160 area (rows 40..199) that the image test draws to
        g.fillRect(0,40,239,199);
        time = Math.floor(Date.now()-time);
        console.log("Time to Draw Rectangle: "+time+"ms");
    }

    // 2-color palette and a 240x160 1-bit offscreen buffer for the image test
    var pal1color = new Uint16Array([0x0000,0xF100]);
    var buf = Graphics.createArrayBuffer(240,160,1,{msb:true});

    function time_image(){
        var time = Date.now();
        g.drawImage({width:240,height:160,bpp:1,buffer:buf.buffer, palette:pal1color},0,40);
        time = Math.floor(Date.now()-time);
        console.log("Time to Draw Image: "+time+"ms");
    }
  • > P8: fillRect 256ms, drawImage 331ms.

    Did you also test g.flip of my driver? It should take below 100ms for the full 240x240 screen.

    > I see that you have made the board description public but I would guess that you also need the other files (bootloader etc) to build it?

    It is built with Nordic SDK11, so you need just that, but I have it slightly patched: you can get targetlibs_nrf5x_11.tgz from https://github.com/fanoush/ds-d6/tree/master/espruino, extract it in the Espruino folder and build with

        make -j BOARD=P8-SDK11 RELEASE=1 DFU_UPDATE_BUILD=1

    No bootloader is needed, you already have it.
    Only recently there is new I2C slave code that modifies SDK12, so for now the easiest way to build with SDK11 is to edit the makefile and remove the line with nrf_drv_twis.c; otherwise the build breaks with an error that no slave devices are enabled.

    SDK11 because there is a bit more flash and more variables available, and also because the Arduino environment is SDK11 too, so it is easier to switch between them. However, with storage in SPI flash there is also a good reason to move to SDK12 like other Espruino devices, to simplify builds, so I'll make a softdevice+bootloader upgrade package to move it to SDK12 too.

  • > Did you also test g.flip of my driver? It should take below 100ms for the full 240x240 screen.

    I did not get around to testing it, as I realised it was faster, which makes me believe that it should be possible to speed up lcd_spi_unbuf, which has the advantage of flexibility in terms of screen buffering.

    Many thanks for the information on building - I will have a go and get back to you. I look forward to SDK12, as I think the Apple ANCS widget requires secure connections.

  • > it was faster, which makes me believe that it should be possible to speed up lcd_spi_unbuf

    One trick I use is the 12-bit color mode, which cannot be used for single pixels (2 pixels are stored in 3 bytes). It makes a 30ms difference in a fullscreen update. The number doesn't look high, but if you try both you'll clearly see the difference.
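    For reference, a sketch of how two pixels share 3 bytes in 12-bit mode, which is why single pixels cannot be sent. It assumes the ST7789 12-bit (RGB 4-4-4) write format of byte0 = R1G1, byte1 = B1R2, byte2 = G2B2; check the controller datasheet before relying on it. The function names are illustrative, and the RGB565 to RGB444 conversion simply keeps the top 4 bits of each channel.

```javascript
// Pack two RGB444 pixels (0xRGB each) into 3 bytes: R1G1, B1R2, G2B2.
function pack444Pair(p1, p2) {
  return [
    (p1 >> 4) & 0xFF,                          // R1 G1
    ((p1 & 0x0F) << 4) | ((p2 >> 8) & 0x0F),   // B1 R2
    p2 & 0xFF                                  // G2 B2
  ];
}

// Reduce an RGB565 color to RGB444 by keeping the top 4 bits of each channel.
function rgb565to444(c) {
  var r = (c >> 12) & 0x0F; // red bits 15..12
  var g = (c >> 7) & 0x0F;  // green bits 10..7
  var b = (c >> 1) & 0x0F;  // blue bits 4..1
  return (r << 8) | (g << 4) | b;
}

var bytes = pack444Pair(0xABC, 0xDEF); // two pixels -> three bytes
```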

    > which has the advantage of flexibility in terms of screen buffering

    Well, the flexibility could be there too with separate smaller Graphics ArrayBuffers. If you pre-render stuff into images for speed, you are mostly doing the same thing. The native inline code can simply be called with a colormap, a bitmap rectangle and bpp/stride, so it can also draw partial areas (that's what it is already doing - it redraws only the modified parts) without any fullscreen buffer, and possibly with a different bit depth for each.
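    Redrawing only the modified parts needs some notion of what changed. A simple sketch with hypothetical names that compares the previous and current frame row by row and returns the band of rows that actually needs sending; a real driver would more likely track the modified area while drawing rather than diffing whole frames.

```javascript
// Return the first and last rows that differ between two frames, or null if identical.
function changedRows(prev, cur, bytesPerRow) {
  var rows = cur.length / bytesPerRow;
  var first = -1, last = -1;
  for (var y = 0; y < rows; y++) {
    var base = y * bytesPerRow;
    for (var i = 0; i < bytesPerRow; i++) {
      if (prev[base + i] !== cur[base + i]) {
        if (first < 0) first = y;
        last = y;
        break; // this row differs; no need to check the rest of it
      }
    }
  }
  return first < 0 ? null : { first: first, last: last };
}

// Usage: 240 px wide at 1 bpp is 30 bytes per row; 10 rows for brevity.
var before = new Uint8Array(30 * 10);
var after = new Uint8Array(before); // copy of the previous frame
after[3 * 30 + 5] = 0xFF;           // change something in row 3
after[6 * 30] = 0x01;               // and in row 6
var band = changedRows(before, after, 30); // rows 3..6 need to be sent
```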

  • As for lcd_spi_unbuf, it is good when the SPI bus is fast enough that drawing separate pixels is bearable (lines, circles, fonts). That may be true for the ESP but IMO not for the nRF52. Also, the driver is small and yet it has a lot of stuff hardcoded - bpp16, ST7789 (the real name should perhaps be lcd_spi_unbuf_st7789_bpp16). It breaks with 12-bit mode, for the DK08 I even need 6-bit (RGB222) mode, and the 2a/2b/2c commands are different on the ST7301 too. If you removed all that too, I'm not sure what would remain - just async generic SPI DMA writing, perhaps.

  • Thanks to the clear instructions, I had no problem building the P8 package. I also tested your driver, which gives an impressive result of 73ms for the fill rectangle test - time_fill(). Admittedly, this is 12-bit, but it would still be less than 100ms for 16-bit, while the best that my lcd_spi_unbuf driver can do is 237ms.

    I accept your arguments about single pixel operations; however, this performance difference is not caused by pixel operations, since both drivers are sending large buffers via EasyDMA. The difference seems to be caused by the fact that you have used a direct implementation of EasyDMA, while the Espruino jshSPISendMany uses SoftDevice routines, including interrupt handling. Your implementation, similar to @atc1441's ATCWatch fastspi module, does not use either the SoftDevice or interrupts. I am surprised that the Nordic routines seem to incur such a large overhead.

    BTW. Both Espruino and the ATCwatch module have workarounds for the 1 byte EasyDMA transfer bug while your implementation does not - yet it seems to work OK?

  • > Both Espruino and the ATCwatch module have workarounds for the 1 byte EasyDMA transfer bug while your implementation does not - yet it seems to work OK?

    I am not reading anything back; that bug only happens when you read back 1 byte: [58] SPIM: An additional byte is clocked out when RXD.MAXCNT = 1

    > jshSPISendMany uses SoftDevice routines

    Well, it is not the SoftDevice; those are Nordic SDK drivers linked directly into your code. Their code mostly does the same things as my driver - setting the same registers. And from glancing over the code I wouldn't guess it is so bad; in general the compiler does a pretty good job with their code. There are some abstractions, but mostly the code is optimized away and/or inlined where possible.

    SoftDevice APIs are called via software interrupts (SVC calls), which may indeed be slower, but that is not the case here; only a few parts of the hardware are handled like that, see https://infocenter.nordicsemi.com/topic/com.nordic.infocenter.s132.api.v3.0.0/group__nrf__soc__api.html?cp=4_7_3_8_2_7

  • > I am surprised that the Nordic routines seem to incur such a large overhead.

    Oh, if you have the SDK11 build, check whether DMA is enabled; most likely it is not turned on by default. Try adding -DSPI0_USE_EASY_DMA=1 to the board file. See also targetlibs/nrf5x_11/nrf52_config/nrf_drv_config.h

    As for the drivers, the source of the SPI driver for SDK12 is here: https://github.com/espruino/Espruino/blob/master/targetlibs/nrf5x_12/components/drivers_nrf/spi_master/nrf_drv_spi.c ; it is very similar in SDK11.

  • Brilliant! Many thanks for your help. It's not enabled in targetlibs/nrf5x_11/nrf52_config/nrf_drv_config.h. I will enable it there, add it to the board file and rebuild.

    And thanks for clarifying SDK vs SoftDevice routines for me.

  • The performance when I rebuilt with DMA enabled was:

    P8: fillRect 81ms, drawImage 154ms.

    which is much more reasonable. However, I have now implemented double buffering, which with two 60-pixel (120-byte) buffers gives:

    P8: fillRect 85ms, drawImage 111ms.

    There is a tradeoff in that smaller buffers allow more overlap but increase the time to draw the rectangle, which is important for a fast g.clear().
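    The tradeoff can be illustrated with a rough timing model. All per-byte and per-transfer costs below are assumed numbers, not measurements: with double buffering, preparing chunk i+1 overlaps sending chunk i, which helps when there is real per-pixel work (drawImage), while for fillRect there is almost nothing to overlap and smaller chunks only add per-transfer overhead. The model also assumes the total size is a multiple of the chunk size, and the function names are hypothetical.

```javascript
// fillUsPerByte: CPU time to prepare one byte (palette lookup etc.), assumed
// sendUsPerByte: SPI time per byte (1 us at 8 Mbit/s)
// overheadUs:    fixed cost to set up and start each DMA transfer, assumed
function singleBufferedUs(totalBytes, chunkBytes, fillUsPerByte, sendUsPerByte, overheadUs) {
  var chunks = Math.ceil(totalBytes / chunkBytes);
  var fill = chunkBytes * fillUsPerByte;
  var send = chunkBytes * sendUsPerByte + overheadUs;
  return chunks * (fill + send); // prepare a chunk, wait for its transfer, repeat
}

function doubleBufferedUs(totalBytes, chunkBytes, fillUsPerByte, sendUsPerByte, overheadUs) {
  var chunks = Math.ceil(totalBytes / chunkBytes);
  var fill = chunkBytes * fillUsPerByte;
  var send = chunkBytes * sendUsPerByte + overheadUs;
  // The first chunk is prepared before anything is sent; after that, preparing
  // the next chunk overlaps sending the current one, so each step costs the
  // slower of the two; finally the last transfer has to complete on its own.
  return fill + (chunks - 1) * Math.max(fill, send) + send;
}

var total = 240 * 160 * 2; // the 240x160 test area at 16 bpp
// drawImage-like case: some CPU work per byte, so overlapping pays off
var imgSingle = singleBufferedUs(total, 120, 0.5, 1, 3);
var imgDouble = doubleBufferedUs(total, 120, 0.5, 1, 3);
// fillRect-like case: no CPU work, so only the number of transfers matters
var rect120 = doubleBufferedUs(total, 120, 0, 1, 3);
var rect480 = doubleBufferedUs(total, 480, 0, 1, 3);
```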

    I would now like to get the driver to work with SPI Flash which I guess requires some care with the chip select pins?

  • Not only the chip select pins, but the whole SPI interface. There is a new define constant merged recently here https://github.com/espruino/Espruino/pull/1943 so that you can pull the flash CS high and it will restart reading. However, you must also disable the hardware SPI to release the CLK and MOSI pins, as the SPI flash code uses software SPI and needs them as GPIOs. Alternatively, the SPI flash code would need to use the same hardware SPI too, but that may cause other issues.


Espruino on P8 smartwatch - ST7789 display driver in Inline C

Posted by @fanoush