-
I tracked this down to versionChecker.js in the EspruinoTools code. This checks the firmware version but also that env.CONSOLE is one of USB, Bluetooth or Telnet.
In my case env.CONSOLE is equal to Serial1. If I hack this in as an option I get fast writes -- whoop!
Just wondering if this is deliberate or a bug/missing feature of EspruinoTools. I saw a comment somewhere that comms over serial have no flow control, but this is virtual serial over USB to the ESP32 and I'm not sure if the same applies.
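A minimal sketch of the kind of check described above. It is not the actual versionChecker.js source: the names FAST_WRITE_CONSOLES and canUseFastWrites are hypothetical, and the firmware version test is left out.

```javascript
// Hypothetical sketch, NOT the real versionChecker.js code.
// Consoles on which fast writes are assumed to be safe. The first three are
// the values named in the post; "Serial1" is the locally hacked-in addition.
var FAST_WRITE_CONSOLES = ["USB", "Bluetooth", "Telnet", "Serial1"];

// env is assumed to be an object with a CONSOLE string property,
// similar to what the board reports as its environment.
function canUseFastWrites(env) {
  return !!env && FAST_WRITE_CONSOLES.indexOf(env.CONSOLE) >= 0;
}
```

With this list, an env whose CONSOLE is Serial1 qualifies for fast writes, while any console not listed still falls back to slow writes.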
I'm going to leave my hack in for a while and see if there are any issues. It makes a big difference to the speed of the dev cycle when using espruino --watch.
Thanks
-
Sorry to jump into this thread. I just ordered a small dev board: https://www.ebay.co.uk/itm/202221693890.
Are you seeing any issues with lack of metal shield / heat sink? It occurred to me that it might run hot.
-
Hi @Wilberforce I don't know to be honest, but I would assume it's not ok to allocate all memory blocks as DMA capable, ie. pvPortMallocCaps(size, MALLOC_CAP_DMA).
Thanks @allObjects, your post gave me a clue to look at how the graphics array is allocated and it is in a flat data area, so I now use a simple memcpy at the start of my native "flip" routine to copy it to the DMA-capable transfer buffer. Profligate use of precious memory I know... but it does mean I can use the existing Graphics and the only custom part is the flip.
I now have a little Javascript app drawing and moving 16 rects and total execution time is around 35ms, ie. 28fps. Even this was interesting as I found that using rects.forEach(r => {g.drawRect(...)}), rather than a simple for loop based on rects.length, added about 5ms per redraw.
Breadboard pic for interest...
-
I've experimented a little with the ESP32 using a custom Espruino class. I picked the ESP32 mainly because the DMA support is so much simpler. You just alloc some memory with the right flags and it handles the rest. I did look at doing this on STM32 but it's quite a bit more involved.
The main challenge was understanding the spi_device_queue_trans call and limitations. For the SH1106 we need to send 1024 bytes in 16 byte pages, ie. 64 sends. The ESP32 won't allow you to queue 64 transactions, so I split these into 4 sets of 16 transactions and, between each set of 16, we wait for the transactions to complete before proceeding with the next 16.
Cut a long story short, I managed to send 1024 bytes in just over 1ms... see pic. This is running the clock at 20MHz.
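The batching described above (64 sends split into 4 sets of 16) can be sketched roughly as follows. This is plain JavaScript for illustration rather than the native code: planBatches and sendFrame are hypothetical names, and the queue and waitAll callbacks stand in for the native queue and wait calls.

```javascript
// Split a frame into fixed-size chunks and group the chunks into batches
// no larger than the number of transactions that can be queued at once.
function planBatches(totalBytes, chunkSize, maxQueued) {
  if (chunkSize <= 0 || maxQueued <= 0) throw new Error("sizes must be positive");
  var batches = [];
  var current = [];
  for (var offset = 0; offset < totalBytes; offset += chunkSize) {
    current.push({ offset: offset, length: Math.min(chunkSize, totalBytes - offset) });
    if (current.length === maxQueued) {
      batches.push(current);
      current = [];
    }
  }
  if (current.length) batches.push(current); // final partial batch, if any
  return batches;
}

// Hypothetical driver loop: queue every chunk of a batch, then wait for the
// whole batch to complete before queueing the next one.
function sendFrame(buffer, chunkSize, maxQueued, queue, waitAll) {
  planBatches(buffer.length, chunkSize, maxQueued).forEach(function (batch) {
    batch.forEach(function (t) {
      queue(buffer.subarray(t.offset, t.offset + t.length));
    });
    waitAll(batch.length);
  });
}
```

For a 1024 byte buffer with 16 byte chunks and a queue depth of 16, planBatches yields 4 batches of 16 transactions each, which matches the split described in the post.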
I'm not sending any of the command data at the moment so that will add a bit more delay -- it will double the number of transactions.
Some of the remaining overhead might be due to the mutex locks in ESP-IDF mentioned in the linked post above. The only way to remove this completely would be to bypass this layer and go direct.
I notice that the SD1780 display (and possibly the SH1106 -- haven't checked) supports both horizontal and vertical addressing where the pages will automatically wrap, which should mean no control data between data transactions -- and possibly being able to send all the data in a single transaction, which would be super-quick. Will try that next.
Not quite sure why I'm so obsessed with maximum speed but it's a fun journey...
I have no idea how the existing layers in Espruino could be refactored to take advantage of bulk send/DMA. It's clear to me that on ESP32 at least the graphics buffer should be allocated in co-operation with the SPI sending code (ie. the display driver). I got a bit lost following the code under jswrap_arraybuffer_constructor (found via lcdInit_ArrayBuffer). This code seems to re-use allocation of strings in blocks which is clearly not very DMA-friendly. But I may have read it wrong.
Thanks
-
Hi @Gordon, a DMA option sounds good longer term. I read a little about it in the ESP32 docs. You're recommended to allocate the memory for the SPI transfers with pvPortMallocCaps(size, MALLOC_CAP_DMA). It should automatically use DMA if it can.
Where would I find equivalent info for the STM32 / Espruino? I'm keen to do a little hacking and would prefer to start with the Pico.
It occurs to me it would be dangerous to return from the flip method while the transfer is still happening, because then the app might start modifying the memory that's still being written.
Thanks
-
I hacked in spi_device_queue_trans instead of spi_device_transmit. I got some unpredictable results at slower speeds but at 3MHz and 4MHz it was stable, and improved speed by about 2x. There are still big gaps though.
Attached are pics of two captures. Both are with a 4MHz clock requested. The slower one is the current code and the faster one is using spi_device_queue_trans. Like I said I hacked it in so I don't think you could use this to send and receive data -- only send.
The faster capture shows roughly the same transfer time (40ms) as software SPI, so it should be possible to improve on it.
I think ultimately the Espruino calling code should be changed to support multi-byte transfer. Faster updates should then be possible, thus providing more time between updates for Espruino/game code. For some scenarios the fact that spi_device_queue_trans is non-blocking might be useful - not sure it is for regular apps though as you generally want to wait for flip() to complete.
Anyway hope this is useful / of interest!
-
@Wilberforce perhaps this gives a clue about the HW SPI performance issues on ESP32...?
-
Hi @Gordon, I tried your suggestion too and can get it up to around 700kHz on the clock. The bytes sent for each scan line are looking good but there are big gaps in between while the control bytes are sent... see pic (single complete page redraw, all 0xFF).
The good news is that a total screen redraw is around 40ms so framerate around 25. Not so bad.
-
Hi @Wilberforce I tried your suggestion. I set it to 1MHz after a clean boot and the clock frequency is now correct but there are HUGE gaps between bytes. See pics -- the first shows 1MHz for clock pulses and the second shows some bytes sent with the big gaps between.
-
I played around with the ESP32 for interest. One really obvious thing is that the baud setting doesn't seem to have any effect. It seems pinned at 100kHz.
The other is that there is a big gap between each byte even at this low frequency, which I guess relates to @Wilberforce's comment above.
See pic.
-
20ms is definitely ok provided the Javascript isn't adding too much. I'll do some more playing around.
At 4Mbps it seemed to me there were two issues:
- At a macro level, the time taken between each scan line of data, ie. between each spi.write() call, becomes significant compared to the time taken to send a Uint8Array (a single scan line)
- At a micro level, the time taken between each byte becomes noticeable
-
Spent some time this evening and there's no real issue with the Pico SPI performance. I can get screen repaint (at least the SPI portion) down to 20ms at 800k baud. Limit seems to be around 10ms. Attached is a quick write-up. Would be good to understand if/why Graphics class is holding things up.
I will look at reproducing with the ESP32!
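The "limit around 10ms" figure is consistent with a back-of-envelope calculation. The sketch below assumes a 1024 byte frame (a 128x64 mono display), 8 clock cycles per byte and no gaps between bytes; minTransferMs and overheadFraction are hypothetical helper names.

```javascript
// Minimum time in milliseconds to clock out `bytes` at `baud` bits per
// second, assuming 8 clock cycles per byte and no gaps between bytes.
function minTransferMs(bytes, baud) {
  return (bytes * 8 * 1000) / baud;
}

// Fraction of a measured transfer time that is not spent clocking out bits.
function overheadFraction(measuredMs, idealMs) {
  return (measuredMs - idealMs) / measuredMs;
}

var frameBytes = (128 * 64) / 8;                 // 1024 bytes for a mono frame
var idealMs = minTransferMs(frameBytes, 800000); // about 10.24 ms at 800k baud
var overhead = overheadFraction(20, idealMs);    // about 0.49 for a 20 ms repaint
```

Under these assumptions, roughly half of the measured 20ms at 800k baud is spent in gaps rather than in clocking out data.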
-
Am focused on the Pico, trying to see what I can achieve with this little OLED screen.
I updated the firmware - no noticeable difference. I already had the baud option set very high.
I forked and updated the driver and made your suggested code fix and... it's definitely better! Maybe 30% faster.
Can it go faster or is this the most we can expect? I assume the display is keeping up, otherwise the signal would be corrupted?
Is there a simple way to tell if graphics is the bottleneck?
Thanks!
-
Hi Gordon, thanks for the reply. I'll give those things a try and report back. Perhaps the ESP8266 issue is still present for ESP32, because I ran up the same code on my ESP32 dev board and it was noticeably slower than the Pico, but I haven't looked into whether this is a simple SPI clock frequency problem...
-
Hi, sorry if this has been covered before. I have a SH1106 OLED device attached to Espruino and it's doing buffered graphics just fine but the performance is not stellar.
Haven't timed it precisely but looks to take around 50ms to update the screen which gives 20fps.
I know a little bit about SPI and can see some discussion in issues and elsewhere about the fact that Espruino sends a byte at a time, but could be bulk sending data to avoid the handshake overhead. I also noted the comments at https://github.com/espruino/Espruino/issues/695 that memory is a constraint.
Is it the case that the Espruino can't spare 1k of data for the SPI bulk transfer (on a mono 64x128 screen)? Or would it help if the transfer was chunked into pieces, even if they were only eg. 128 bytes?
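To make the chunking idea concrete, here is a minimal sketch, assuming the frame lives in a Uint8Array and that a caller-supplied write function performs the actual transfer (on a board it might wrap spi.write). sendChunked is a hypothetical name, not an existing Espruino API.

```javascript
// Send `buffer` in pieces of at most `chunkSize` bytes.
// `write` is supplied by the caller and receives each piece in order.
// Returns the number of writes issued.
function sendChunked(buffer, chunkSize, write) {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  var writes = 0;
  for (var offset = 0; offset < buffer.length; offset += chunkSize) {
    // subarray gives a view onto the same memory, so no copy of the frame
    // is made; the end index is clamped to the buffer length.
    write(buffer.subarray(offset, offset + chunkSize));
    writes++;
  }
  return writes;
}
```

For a 1024 byte frame and 128 byte chunks this issues 8 writes instead of 1024 single-byte sends, without needing a second 1k buffer.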
Thanks for a great project.
Alfie.
-
Hmm well so far not so good...
Get this error on second transfer -- every time. Interestingly this doesn't happen with all code. If I create a simple hello world app I can change it multiple times no problem and the transfer is ok.
Could be something to do with code size or maybe something about my app itself (it has a heavy interval timer).
EDIT: I assume ESP32 serial over USB is less capable flow-control wise than the Espruino Pico and this is simply a buffer overflow issue. I played around with the slow serial block size in the code and found 128 bytes to be a happy medium between speed of transfer and stability. Default (slow send) is 19.