-
• #2
hi, if you can build your own firmware you might include this
https://github.com/espruino/Espruino/blob/master/libs/graphics/lcd_spi_unbuf.c
-
• #3
Thanks, that is really neat. I will try to compare the performance with my driver which uses drawImage. I suspect the performance of drawImage implemented on top of single pixel operations will not be great.
I still think it would be generally useful to be able to write parts of flat buffers to SPI.
-
• #4
suspect the performance of drawImage implemented on top of single pixel operations will not be great.
Well, it's using C code and spi_send_many, which makes it extremely fast, as you can see in this video.
-
• #5
That's very impressive. I have just managed to build ESP32 firmware including it using your excellent tutorial. I think lcd_spi_unbuf should be added to the standard ESP32 board as most people will want to have a display.
-
• #7
This caused me a little grief with your custom board tutorial:
# Add SPI_LCD_UNBUF to handle 16bit color tft lcd displays 'SPI_LCD_UNBUF'
Should be LCD_SPI_UNBUF!
-
• #9
Hi, I have now managed to do a comparison. See video below:
The numbers in red are the lcd_spi_unbuf driver and the numbers in green are the version I wrote that directly implements drawImage. The application code for both is identical and tries to increment and display the number every 100ms:
    // display incrementing number
    var pal2color = new Uint16Array([0x0000,0xF100]);
    var buf2 = Graphics.createArrayBuffer(20,64,1,{msb:true});
    buf2.setRotation(3);
    buf2.setColor(1);
    buf2.setFont("Vector",20);
    var N = 0;
    function drawNumber() {
      buf2.drawString(N,0,0);
      lcd.drawImage({width:20,height:64,bpp:1,buffer:buf2.buffer, palette:pal2color},30,50);
      buf2.clear();
      ++N;
      if (N>999) N = 0;
    }
    setInterval(drawNumber,100);
The drawImage based driver (green numbers) appears to be over twice as fast as the lcd_spi_unbuf version. I have cheated a bit in that I have chosen an image size that does not create a fragment. My conclusion is that I will try to combine these two drivers to give the best of both, and I think that it is still important to get a solution to the original issue I mentioned of being able to write only part of a buffer to SPI.
You can see the full code here
-
• #10
Thanks for sharing.
Yep, of course drawImage() is always faster than drawString() with a vector font.
The video in post #4 uses setFont("6x8",4) because it is much faster than a vector font, and with the option true no clear is needed, so this is similar to drawImage().
- drawImage() can benefit from spi_send_many
- lcd.setFont("Vector",20).drawString() looks like it is drawing pixel by pixel.
My conclusion:
- use drawImage when using vector font
- split your screen in small sections and use drawImage to update them
-
• #11
Hmm, not sure any more.
Tested an SPI LCD (ILI9341, 320x240 pixels) with a 0.1 s interval, and it's not as slow as in the video in post #9.
    g.setColor(0xffff).setFont("Vector", 40).clearRect(120,100,200,140).drawString(times, 120, 100, true);
    g.setColor(0xffff).setFont("6x8", 4).drawString(times, 120, 160, true);
-
• #12
This does not work as b.buffer is in fact g.chunkbuf.buffer and consequently, the spi.write sends the whole chunk buffer rather than part of it.
Can't easily verify spi.write since it does not return anything, but with spi.send it works as expected, at least for nrf52. See this:
    >var fc=new SPI();fc.setup({sck:D29,miso:D31,mosi:D30,mode:0});
    =undefined
    >fid=Uint8Array([0x9f,0,0,0])
    =new Uint8Array([159, 0, 0, 0])
    >fid2=Uint8Array(fid.buffer,0,2)
    =new Uint8Array([159, 0])
    >fc.send(fid,D27);
    =new Uint8Array([0, 133, 96, 21])
    >fc.send(fid2,D27);
    =new Uint8Array([0, 133])
So it can be seen that only two bytes are sent in the second case. Maybe don't use spi.write(b.buffer); but spi.write(b); ?
-
• #13
Thanks, I did try that and I think you are correct in that now that I have redone the experiment it does seem to send the right amount of data. However, it seems to scramble the pixel data in some way that I have not yet worked out:-(
Will investigate further (tbc) ........
-
• #14
Finally got it to work - perhaps obvious in retrospect!
    E.mapInPlace(
      new Uint8Array(img.buffer, chunks*CHUNKSIZE*img.bpp/8, remnt),
      g.chunkbuf, img.palette, img.bpp
    );
    spi.write(new Int8Array(g.chunkbuf.buffer,0,remnt*2));
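The whole chunked write, including the final fragment, might look roughly like the sketch below. It is plain JavaScript so that it can run without a display: `spi` is a stand-in object that only records write sizes, `convert` is a simple loop used in place of `E.mapInPlace`, the source image is assumed to be 8bpp, and `CHUNKSIZE` is an arbitrary value. None of these are taken from the actual driver.

```javascript
// Hypothetical stand-in for an SPI port: records how many bytes each write carries.
const sent = [];
const spi = { write: function (data) { sent.push(data.byteLength); } };

const CHUNKSIZE = 8;                          // pixels per chunk (assumed value)
const chunkbuf = new Uint16Array(CHUNKSIZE);  // the single, permanently allocated buffer

// Look up each 8bpp source pixel in the palette and store the 16-bit colour.
// (Plain-JavaScript substitute for the E.mapInPlace call in the post.)
function convert(src, dest, palette) {
  for (let i = 0; i < src.length; i++) dest[i] = palette[src[i]];
}

function writeImage(img) {
  const total = img.buffer.byteLength;        // 8bpp: one byte per pixel
  const chunks = Math.floor(total / CHUNKSIZE);
  const remnt = total % CHUNKSIZE;            // pixels left over for the final fragment
  for (let c = 0; c < chunks; c++) {
    convert(new Uint8Array(img.buffer, c * CHUNKSIZE, CHUNKSIZE), chunkbuf, img.palette);
    spi.write(chunkbuf);                      // full chunk: CHUNKSIZE * 2 bytes
  }
  if (remnt > 0) {
    convert(new Uint8Array(img.buffer, chunks * CHUNKSIZE, remnt), chunkbuf, img.palette);
    // Pass a view of the right length, not chunkbuf.buffer, so only the fragment is sent.
    spi.write(new Uint8Array(chunkbuf.buffer, 0, remnt * 2));
  }
}

// 20 pixels of 8bpp data: two full chunks plus a 4-pixel fragment.
const img = { buffer: new Uint8Array(20).buffer, palette: new Uint16Array([0x0000, 0xF100]) };
writeImage(img);
console.log(sent); // two writes of 16 bytes, then one of 8 bytes
```

No buffer is allocated per fragment here; the last write reuses the same memory through a shorter view.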
-
• #15
Hey, thanks for your update, looks like a great improvement for flicker-free vector font drawing.
So why not add lcd_spi_unbuf_drawImage() to lcd_spi_unbuf.c to make it even faster?
-
• #16
I think that’s a great idea and I will certainly have a go at it when time permits.
-
• #17
So why not add lcd_spi_unbuf_drawImage
I probably wouldn't want to merge something in if it was a specific hack for drawImage just for lcd_spi_unbuf.c.
There are some really easy wins for lcd_spi_unbuf.c speed though. Something like:
    int lastx,lasty;
    void lcd_spi_unbuf_setPixel(JsGraphics *gfx, int x, int y, unsigned int col) {
      // swap the two colour bytes so the high byte goes out first
      uint16_t color = (col>>8) | (col<<8);
      jshPinSetValue(_pin_cs, 0);
      // only set the address window if this pixel does not directly
      // follow the previous one on the same line
      if (x!=lastx+1 || y!=lasty) {
        disp_spi_transfer_addrwin(x, y, LCD_WIDTH, y+1);
        lastx = x;
        lasty = y;
      } else lastx++;
      spi_data((uint8_t *)&color, 2);
      jshPinSetValue(_pin_cs, 1);
    }
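To get a feel for how much this saves, the idea can be modelled in plain JavaScript by counting address-window commands. This is only a model of the technique, not the driver: `makeDriver` and its counters are invented for illustration, and the start values of `lastx`/`lasty` are an assumption chosen so that the very first pixel always sets the window.

```javascript
// Counts commands in a simplified model of a pixel-by-pixel SPI display driver.
function makeDriver() {
  let lastx = -2, lasty = -1;   // assumed start state: no pixel counts as "next"
  const stats = { addrWindows: 0, pixels: 0 };
  return {
    stats: stats,
    setPixel: function (x, y) {
      if (x !== lastx + 1 || y !== lasty) {
        stats.addrWindows++;    // a real driver would send the address window here
        lasty = y;
      }
      lastx = x;                // remember the pixel just written
      stats.pixels++;           // a real driver would send the 2-byte colour here
    }
  };
}

const h = makeDriver();
for (let x = 0; x < 10; x++) h.setPixel(x, 5);   // horizontal run of 10 pixels
console.log(h.stats.addrWindows);                 // 1

const v = makeDriver();
for (let y = 0; y < 10; y++) v.setPixel(5, y);   // vertical run of 10 pixels
console.log(v.stats.addrWindows);                 // 10
```

Anything drawn as horizontal runs (filled shapes, bitmap fonts scanned line by line) benefits most; vertical lines gain nothing.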
-
• #18
Having looked at the Espruino graphics C code, I can see that there is no provision for an image bit blit style callback as there is for pixels and rectangles, so I completely agree that it’s best left as it is. I am impressed that Espruino supports reasonable performance for drawImage using only Javascript.
-
• #19
There are some really easy wins for lcd_spi_unbuf.c speed though.
Thanks, so let me run some tests and come up with a PR.
-
• #20
There are some really easy wins for lcd_spi_unbuf.c speed though.
Anything else you'd like to point out to improve performance?
-
• #21
That's the main one and you should see some decent speed improvements with that.
The other one would be to move to DMA. If you reserved 2 LCD_WIDTH length buffers then you could:
- SPI write immediately on the first pixel received (or if doing fillrect)
- If in progress, write subsequent pixels (on the same line) into the second buffer
- When the first DMA finishes, kick off DMA for buffer #2
I believe this is the sort of thing @fanoush already tried when he ported to some watches with SPI displays - you can effectively be doing the transmission at the same time as working out the next stuff to draw.
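The two-buffer scheme described above can be sketched as a small simulation. This is plain JavaScript with no real DMA: `makePingPong`, `pushPixel` and `dmaDone` are hypothetical names, `dmaDone` stands in for a DMA-complete interrupt, and the `LCD_WIDTH` value is assumed.

```javascript
const LCD_WIDTH = 240;          // assumed line length in pixels

function makePingPong() {
  const bufs = [new Uint16Array(LCD_WIDTH), new Uint16Array(LCD_WIDTH)];
  let fill = 0;                 // index of the buffer currently being filled
  let count = 0;                // pixels waiting in bufs[fill]
  let busy = false;             // true while a (simulated) DMA transfer is running
  const transfers = [];         // length of each transfer started, for inspection

  function start() {
    transfers.push(count);      // a real driver would start DMA on bufs[fill] here
    busy = true;
    fill = 1 - fill;            // subsequent pixels go into the other buffer
    count = 0;
  }
  return {
    transfers: transfers,
    pushPixel: function (col) {
      if (count >= LCD_WIDTH) throw new Error("buffer full; a real driver would wait for DMA");
      bufs[fill][count++] = col;
      if (!busy) start();       // nothing in flight: send immediately
    },
    dmaDone: function () {      // would be called from the DMA-complete interrupt
      busy = false;
      if (count > 0) start();   // kick off DMA for the buffer filled meanwhile
    }
  };
}

const pp = makePingPong();
pp.pushPixel(0xF800);           // first pixel: transfer starts at once
pp.pushPixel(0x07E0);           // DMA busy: these accumulate in the second buffer
pp.pushPixel(0x001F);
pp.dmaDone();                   // first transfer finished: second buffer goes out
console.log(pp.transfers);      // a 1-pixel transfer followed by a 2-pixel transfer
```

The point of the design is that pixel generation and transmission overlap: while one buffer is on the wire, the drawing code keeps filling the other.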
-
• #22
Yes, but it can be done because my driver does two things (in parallel) when updating some rectangular area
- palette conversion (any palette depth to 8, 12 or 16 bits RGB)
- sending pixel data
If you keep those steps separated (palette conversion done via E.mapInPlace or drawImage into a buffer, and then the lcd_spi_unbuf driver used for sending the area over SPI), you cannot do them in parallel.
So it comes back to
So why not add lcd_spi_unbuf_drawImage() to lcd_spi_unbuf.c to make it even faster?
and
I probably wouldn't want to merge something in if it was a specific hack for drawImage just for lcd_spi_unbuf.c
This is a specific optimization for a DMA based (in this case SPI) graphics driver: drawing an image in any bpp/palette source format into a destination rectangle in a different bit depth (mainly RGB). So adding this to the unbuffered driver makes sense to me. @Gordon what version of such code would you merge? If it should reuse some generic palette/bpp conversion graphics code, it would need to be callable in chunks, i.e. 'convert next x bytes from source into this buffer'.
Or is there some other way to look at it? Maybe it could be somehow hooked into the drawing algorithms directly and be somehow line based? That's not what I did; in my driver it first needs to be drawn into a memory buffer completely, and then conversion/blitting is done in g.flip for the modified rectangle. With this unbuffered driver it would be the same, just without a full buffer for the whole screen but only for a smaller rectangular image, so g.flip() would become g.drawImage().
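A conversion routine that is "callable in chunks" could have roughly the following shape. This is a sketch in plain JavaScript, not existing Espruino code: `convertChunk` and its parameters are invented for illustration, pixels are assumed to be packed most significant bits first, and the destination is assumed to be 16-bit.

```javascript
// Convert up to dest.length pixels, starting at pixel index `start`, from a packed
// source (bpp = 1, 2, 4 or 8, most significant bits first) into 16-bit colours.
// Returns the number of pixels written, so the caller can resume from start + n.
function convertChunk(src, bpp, palette, start, totalPixels, dest) {
  const mask = (1 << bpp) - 1;
  const perByte = 8 / bpp;
  const n = Math.min(dest.length, totalPixels - start);
  for (let i = 0; i < n; i++) {
    const p = start + i;
    const shift = 8 - bpp * ((p % perByte) + 1);   // MSB-first position inside the byte
    dest[i] = palette[(src[Math.floor(p / perByte)] >> shift) & mask];
  }
  return n;
}

// 1bpp example: 0xB0 holds pixels 1,0,1,1,0,0,0,0 and 0xC0 starts with 1,1.
const src = new Uint8Array([0xB0, 0xC0]);
const palette = new Uint16Array([0x0000, 0xF100]);
const dest = new Uint16Array(4);             // small destination chunk
const out = [];
for (let pos = 0; pos < 10; ) {              // 10 pixels in total
  const n = convertChunk(src, 1, palette, pos, 10, dest);
  out.push(Array.from(dest.subarray(0, n))); // a driver would start a transfer here
  pos += n;
}
console.log(out.length);                     // 3 (chunks of 4, 4 and 2 pixels)
```

Because the caller owns the position, a driver could convert the next chunk while the previous one is still being transmitted.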
@jeffmer not sure if you've seen it, the InlineC method of my driver (version for DK08 watch, the destination is 6bit RGB) is here and calling it in g.flip is here
For the P8 watch and its 240x240 ST7789, the destination is 12-bit RGB (a quarter less data than 16-bit RGB, so about 33% faster). 12-bit mode is also available on the ST7735, but maybe the gain is not so visible on smaller screens; with 240x240 the 33% speedup is visible.
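The 33% figure can be checked with a little arithmetic for a full 240x240 frame. The sketch below only counts pixel data and ignores command overhead.

```javascript
const pixels = 240 * 240;
const bytes16 = pixels * 16 / 8;     // bytes per full frame at 16 bits per pixel
const bytes12 = pixels * 12 / 8;     // bytes per full frame at 12 bits per pixel
console.log(bytes16, bytes12);       // 115200 86400
console.log(1 - bytes12 / bytes16);  // 0.25: a quarter less data to send
console.log(bytes16 / bytes12);      // about 1.33: a third more frames at the same SPI clock
```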
-
• #23
There are some very specific hacks for Bangle.js graphics: https://github.com/espruino/Espruino/blob/master/libs/graphics/lcd_st7789_8bit.h#L52
But I'd rather not have IFDEFs piled up on top of each other. I guess what we need is to have a more generic way of adding optimisations.
If you want to do optimisation on the Graphics library it'd be great - but ideally try and optimise it such that it benefits everyone, not just one specific use case that you have to compile custom firmware for.
-
• #24
I updated my copy of lcd_spi_unbuf with the optimisation suggested by Gordon. I did the comparison for drawImage again and the lcd_spi_unbuf version is now as fast, if not faster, than my original driver. It’s good as I can now save heap by dispensing with the chunk buffer.
@fanoush - yes, I had a look before - it’s really neat.
I am developing a module for the ST7735S display on the M5Stick-C and have managed to get fairly reasonable performance using only Javascript. The module supports the paletted graphics drawImage routine to avoid having to do complete screen updates - I will post how to access this soon.
To reduce the overall storage requirement, images are unpaletted and written to the device in chunks. I want to reduce the amount of buffer allocation and use only one permanently allocated buffer which holds a chunk of 16bit pixel data to write to SPI in one operation. The problem I have is that the last chunk will in general be a fragment and so I would like to only write a part of the chunk buffer. I tried using ArrayView as follows:
This does not work as b.buffer is in fact g.chunkbuf.buffer and consequently, the spi.write sends the whole chunk buffer rather than part of it. I have had to resort to allocating a new buffer for fragments as in var b = new Uint16Array(remnt). This allocates a new buffer each time which I am concerned may lead to problems with storage allocation due to fragmentation. Have I missed something obvious as to how to solve this? Ideally, I would like spi.write() to have a size parameter.
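The behaviour described here follows from how typed array views work, and it can be reproduced without any SPI hardware. The sketch below is standard JavaScript; `chunk` and `frag` are names made up for illustration, with `chunk` standing in for the chunk buffer.

```javascript
// A 16-byte chunk buffer standing in for g.chunkbuf.
const chunk = new Uint8Array(16);

// A view over the first 6 bytes of the same memory; no new storage is allocated.
const frag = new Uint8Array(chunk.buffer, 0, 6);

// The view has its own length...
console.log(frag.length);            // 6
// ...but .buffer always refers to the whole underlying ArrayBuffer.
console.log(frag.buffer.byteLength); // 16

// Writes through the view are visible in the original array, since memory is shared.
frag[0] = 0x9f;
console.log(chunk[0]);               // 159
```

So anything that is handed `frag.buffer` sees all 16 bytes, while code that is handed `frag` itself can honour the 6-byte length.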