-
-
@fractalf The watch supports Bluetooth LE, so you can make a connection without going through the official app store.
I'm not sure exactly what you want to do, but all the code is public.
You can clone the app store on GitHub and use your own modified copy.
Or you could even run your own web server with a copy of the app store.
But if you want more direct communication between e.g. phone and watch, that is also possible.
In that case you might want to have a look at Gadgetbridge, but there are undoubtedly other apps that provide functionality like this. -
@Gordon Thanks for the patch to add wrapping.
Unfortunately there is one issue with it. You cannot select 31 as a date.
Here's a pull request to fix that: https://github.com/espruino/BangleApps/pull/490
Actually it is possible to create invalid dates like Feb 30 or Jun 31, but I feel it is not really worth the effort to "fix" that. Introducing the number of days per month is easy, but then come leap years, and someone with a date like Jan 30 moving Jan to Feb (so that 30 becomes invalid).
My suggestion is not to bother about that.
Edit: LOL you're quick, my pull request was already merged before I finished typing this message :-)
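For reference, a minimal sketch in C of what such a "fix" would involve: the days per month, the leap-year rule, and clamping the day when the month changes. The function names are hypothetical and this is not code from BangleApps (the settings app itself is JavaScript).

```c
#include <assert.h>
#include <stdbool.h>

/* Gregorian leap-year rule. */
static bool is_leap_year(int year) {
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/* Number of days in a month (1 = January .. 12 = December). */
static int days_in_month(int year, int month) {
    static const int days[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    assert(month >= 1 && month <= 12);
    if (month == 2 && is_leap_year(year)) return 29;
    return days[month - 1];
}

/* Clamp a day to the valid range of the given month, so that e.g.
   Jan 30 becomes Feb 28 (or Feb 29) when the month is changed. */
static int clamp_day(int year, int month, int day) {
    int max = days_in_month(year, month);
    if (day < 1) return 1;
    return day > max ? max : day;
}
```

Even in this small form it needs a year as input, which is part of why it may not be worth the effort for a settings menu.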
-
@Humpelstilzchen I think gps time does this?
@Gordon I can confirm that after a reboot I got the gps time again. Maybe it was due to the firmware upgrade that it lost the time.
I did not install default apps or so after the firmware update. Will do so later today and report back.
Meanwhile, I'm still curious whether my original suggestion is sound (and whether you want a pull request for it). -
@Humpelstilzchen I expect the configuration only to be done using min and max, as the code to set the minutes uses the same mechanism.
Wrt gps time: that is nice, but does not work that well when you are inside and need a forced reboot.
Wrt the app loader: great suggestion, I was unaware of that, thank you.
@NebbishHacker I tried my changed settings on the phone by updating settings.js but that did not change things, not even after a reboot. It did not get stuck.
I'm at the latest FW version (actually what was latest last Friday somewhere during the day).
I'm indeed using the IDE. I was unaware of how to do that with apploader.js, thanks for the reference (I did not really consider a local server, but that is also definitely a possible path to pursue). -
I was a bit annoyed that when setting the time the hours did not wrap.
So when going up it would get stuck at 23 instead of wrapping to 0, and when going down it would get stuck at 0.
I tried to change this by modifying the Hour section of showSetTimeMenu in settings.js to this:
    'Hour': {
      value: d.getHours(),
      min: -1,
      max: 24,
      step: 1,
      onchange: v => {
        d = new Date();
        d.setHours((v+24)%24);
        setTime(d.getTime() / 1000);
      }
    },
I thought that would do the trick but to no avail.
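As a sanity check, the wrap arithmetic from the onchange handler can be tested in isolation. The sketch below uses C; `wrap_hour` is a hypothetical name, and for the menu range -1..24 the expression behaves the same as the JavaScript `(v+24)%24`. It only shows that the arithmetic is sound; it says nothing about why the modified settings.js had no effect.

```c
#include <assert.h>

/* Map a menu value in the range -1..24 onto a valid hour 0..23.
   Adding 24 first keeps the left operand of % non-negative. */
static int wrap_hour(int v) {
    assert(v >= -24);
    return (v + 24) % 24;
}
```

With this, going down from 0 (menu value -1) gives 23 and going up from 23 (menu value 24) gives 0.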
I did overwrite settings.js on the bangle; that did not work, then I rebooted but still no luck.
I've read back settings.js to verify that it is actually the modified version and it is.
What am I doing wrong ??? -
Probably the function was inlined because I was using gcc 9.
Performance impact is indeed minimal.
If you want to get rid of spiFlashReadWrite you can merge this pull request:
https://github.com/espruino/Espruino/pull/1850
Edit: just force-pushed an update to the branch as I forgot to add an entry in Changelog
-
One last observation to report.
I noticed that after
    commit 49b4b523accc22539eb4156085cde748a208170a
    Author: Gordon Williams gw@pur3.co.uk
    Date: Fri Jun 5 09:52:56 2020 +0100
    Bangle.js: Improve SPI flash speed by with specific function for reading and keeping CS asserted (#1849)
there was only one caller left to spiFlashReadWrite, namely in spiFlashStatus.
I figured I could equally well eliminate that function and replace it with a call to spiFlashWrite followed by a call to spiFlashRead.
This works like a charm but degrades the performance a bit, as jshFlashRead does not inline spiFlashRead any more. Without my change jshFlashRead was the only caller of spiFlashRead, and the compiler or the post-link optimizer decided the function could be inlined.
Not any more after changing spiFlashStatus to use separate Read and Write calls.
It might be worthwhile to mark spiFlashRead and spiFlashWrite as inline functions. That will (marginally) speed up jshFlashRead as the call to spiFlashWrite will be gone.
-
Works for me. Ubuntu + Chrome too. Ctrl+V ?
Aaargh. Yes. I right-clicked and saw no paste option.
Triggered by the comments in your code snippet I decided to sync with the latest version and measure again.
Indeed, now unrolling also fails for me. Looking at the assembly, the whole function spiFlashRead is inlined in jshFlashRead.
In the non-unrolled version the loop reading the bits looks like:
    45782: 2108       movs  r1, #8
    45784: 2200       movs  r2, #0
    45786: f8c3 c508  str.w ip, [r3, #1288] ; 0x508
    4578a: f8d3 4510  ldr.w r4, [r3, #1296] ; 0x510
    4578e: f8c3 c50c  str.w ip, [r3, #1292] ; 0x50c
    45792: f3c4 5400  ubfx  r4, r4, #20, #1
    45796: 3901       subs  r1, #1
    45798: ea44 0242  orr.w r2, r4, r2, lsl #1
    4579c: d1f3       bne.n 45786 <jshFlashRead+0x72>
I stripped off the source lines as they were misleading.
The first two lines are not part of the loop; r1 is the loop counter, r2 is where the results go.
7 instructions in the loop (including the branch).
After unrolling, the following 5 instructions are executed to read a bit:
    p_reg->OUTSET = set_mask;
    457c4: f8c3 1508  str.w r1, [r3, #1288] ; 0x508
    457c8: ea42 0244  orr.w r2, r2, r4, lsl #1
    return p_reg->IN;
    457cc: f8d3 4510  ldr.w r4, [r3, #1296] ; 0x510
    p_reg->OUTCLR = clr_mask;
    457d0: f8c3 150c  str.w r1, [r3, #1292] ; 0x50c
    return ((nrf_gpio_port_in_read(reg) >> pin_number) & 1UL);
    457d4: f3c4 5400  ubfx  r4, r4, #20, #1
After that there is another write to set the clk high (a write to reg OUTSET (0x508)).
I suspect that things fail because the clock down time is too short.
The data sheet of the NRF specifies a value of 25 ns for a 50 pF load. The minimum clock down time needed by the flash chip we do not know.
Guess there is not much more to be gained here. -
-
I don't think I/O will be the bottleneck. This is bit-banged SPI so the clock is generated by software. Of course it could be that the chip introduces additional latency when writing to a GPIO pin, but generally that is pretty fast, just sending a bit to an output pin.
Wrt the test:
I'm using Chromium (version 83) on Ubuntu 18.04.04.
I cannot paste in the left pane of the IDE, only in the right pane.
Should I use a different browser?
For the test: as I only updated spiFlashWrite I don't expect any gains, as executing from flash only reads, but let me try.
Copying all but the last three lines to the right pane and selecting execute from RAM gives me 7.4944 seconds.
Running the last three lines gives me 8.9745, so no gain.
Let's try to also unroll the loop for spiFlashReadWrite.
Text size in the lst file before is 0x0005d1e4 bytes,
after unrolling it is 0x0005d2d4.
The difference is 0xF0 == 240 decimal.
Rerunning the exe now gives me 8.7609; a 2nd run gives 8.7582.
Observations/tentative conclusions
- My execution time when running from RAM is slightly longer (about 0.05 sec, so < 1%).
- When running from flash it is 8.97, substantially longer than your 8.79 (unless you accidentally swapped digits when typing the comment of course). A reason for this could be that I compiled with gcc 9 (as mentioned in another post).
- With the spiFlashReadWrite loop unrolled I get a gain of 0.21 seconds (8.97-8.76). That would mean 2.3%. However, the RAM run needed 7.49 seconds, so the original code used 8.97-7.49 = 1.48 seconds more when executing from flash than when executing from RAM, whereas the new code takes 8.76-7.49 = 1.27 seconds more. So the I/O-based speedup is about 14% ((0.21/1.48) * 100%).
While looking at the code I noticed some more optimization opportunities. I'll look into those as well and report back.
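The two percentages above can be reproduced with a few lines of C. This is only a sketch of the arithmetic; the function names are hypothetical and the inputs are the rounded timings quoted in this post.

```c
#include <assert.h>

/* Percentage of `baseline` saved when the time drops to `improved`. */
static double percent_saved(double baseline, double improved) {
    assert(baseline > 0.0);
    return (baseline - improved) / baseline * 100.0;
}

/* Same, but after subtracting a fixed cost that the change cannot affect
   (here: the time the benchmark needs when running from RAM). */
static double percent_saved_excluding(double baseline, double improved, double fixed) {
    return percent_saved(baseline - fixed, improved - fixed);
}
```

Calling `percent_saved(8.97, 8.76)` gives roughly 2.3, and `percent_saved_excluding(8.97, 8.76, 7.49)` gives roughly 14, matching the figures in the list.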
-
Optimisations that end up not calling the function at all are probably
going to be very significant :)
Lol yes, but that generally does not deliver the desired functionality.
I tried unrolling with gcc 9.3.1 by adding
    #pragma GCC unroll 8
before
    for (int bit=7;bit>=0;bit--) {
The generated code:
    for (unsigned int i=0;i<len;i++) {
    44260: f04f 43a0  mov.w   r3, #1342177280 ; 0x50000000
    static void spiFlashWrite(unsigned char *tx, unsigned int len) {
    44264: b570       push    {r4, r5, r6, lr}
    44266: 4401       add     r1, r0
    44268: f04f 6400  mov.w   r4, #134217728 ; 0x8000000
    4426c: f44f 2200  mov.w   r2, #524288 ; 0x80000
    44270: 461d       mov     r5, r3
    int data = tx[i];
    44272: f810 6b01  ldrb.w  r6, [r0], #1
    if (value == 0)
    44276: ea5f 1cd6  movs.w  ip, r6, lsr #7
    p_reg->OUTSET = set_mask;
    4427a: bf14       ite     ne
    4427c: f8c3 4508  strne.w r4, [r3, #1288] ; 0x508
    p_reg->OUTCLR = clr_mask;
    44280: f8c3 450c  streq.w r4, [r3, #1292] ; 0x50c
    if (value == 0)
    44284: f016 0f40  tst.w   r6, #64 ; 0x40
    p_reg->OUTSET = set_mask;
    44288: f8c3 2508  str.w   r2, [r3, #1288] ; 0x508
    p_reg->OUTCLR = clr_mask;
    4428c: f8c3 250c  str.w   r2, [r3, #1292] ; 0x50c
    p_reg->OUTSET = set_mask;
    44290: bf14       ite     ne
    44292: f8c3 4508  strne.w r4, [r3, #1288] ; 0x508
    p_reg->OUTCLR = clr_mask;
    44296: f8c3 450c  streq.w r4, [r3, #1292] ; 0x50c
    repeat last 6 lines 6 more times (with different constant in the tst) ...
As can be seen, each iteration now saves 3 instructions.
If I counted correctly the unrolled function (from entry to exit) takes 202 bytes. The non-unrolled version takes 84 bytes, so the cost is 118 bytes.
The benefit is that it saves about 5 clock cycles per bit (this also depends a bit on the cycles needed for the branch). So for a byte this is 40 clock cycles or 2.5 µs. So if you write a 10k image it is 25 ms. Nice, but not more than that.
And of course this is not specific to blitting. Actually I prototyped this on flash write, which is not relevant for blitting at all.
I assume that this applies to some other places as well (e.g. reading from flash, writing to display, ...).
(Oh, and I can fully understand if you do not want to take this, but it was fun researching this.)
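To make the pattern concrete, here is a host-runnable sketch of a bit-banged SPI write with the unroll pragma on the inner loop. It is not the Espruino source: `mock_gpio_t`, `gpio_set`, `gpio_clear` and `spi_write` are hypothetical stand-ins for the nRF OUTSET/OUTCLR register writes, and the pin numbers 27 and 19 are simply read off the masks 0x8000000 and 0x80000 in the listing above. `#pragma GCC unroll` needs GCC 8 or later; other compilers may ignore it or warn about it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the GPIO port so the sketch runs on a host.
   On the real chip OUTSET/OUTCLR are memory-mapped registers. */
typedef struct {
    uint32_t out;          /* current pin levels */
    uint8_t  received[16]; /* bytes sampled on rising clock edges */
    size_t   nbits;        /* number of bits sampled so far */
} mock_gpio_t;

enum { PIN_MOSI = 27, PIN_SCK = 19 }; /* assumed from the masks in the listing */

static void gpio_set(mock_gpio_t *g, uint32_t mask) {
    /* Rising clock edge: this is where the slave samples MOSI. */
    if ((mask & (1u << PIN_SCK)) && !(g->out & (1u << PIN_SCK))) {
        size_t byte = g->nbits / 8;
        assert(byte < sizeof g->received);
        g->received[byte] = (uint8_t)((g->received[byte] << 1) |
                                      ((g->out >> PIN_MOSI) & 1u));
        g->nbits++;
    }
    g->out |= mask;
}

static void gpio_clear(mock_gpio_t *g, uint32_t mask) {
    g->out &= ~mask;
}

/* Bit-banged SPI write, MSB first, clock idle low. */
static void spi_write(mock_gpio_t *g, const uint8_t *tx, size_t len) {
    for (size_t i = 0; i < len; i++) {
        int data = tx[i];
#pragma GCC unroll 8
        for (int bit = 7; bit >= 0; bit--) {
            if ((data >> bit) & 1) gpio_set(g, 1u << PIN_MOSI);
            else                   gpio_clear(g, 1u << PIN_MOSI);
            gpio_set(g, 1u << PIN_SCK);   /* clock high */
            gpio_clear(g, 1u << PIN_SCK); /* clock low  */
        }
    }
}
```

The mock only checks that the bits go out in the right order; whether the unrolled loop is actually faster has to be judged from the disassembly or a benchmark on the target.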
-
-
I want to report my findings on the cross compiler.
The Espruino README_Building.md suggests using gcc-arm-none-eabi-5_4-2016q3.
As I was interested in the pragma for loop unrolling I decided to download gcc-arm-none-eabi-9-2020-q2-update-x86_64-linux.tar.bz2 from https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm/downloads
The good news: I managed to compile (for Bangle.js) and the .text section decreases from 0x4e6dc to 0x4e154 bytes, so a code size reduction of 1416 bytes (0x588).
The bad news: I did not manage to upload my firmware.
I use Chromium, connect to my Bangle, go to Settings/Flasher, select Flash from File, and select the zip file. I get the firmware update dialog (bangle.js) and select Next. The dialog disappears but nothing happens.
What is going wrong here?
(And the other bad news, much less important of course: according to the .lst file the loop unrolling pragma that I added did not have the desired effect; no unrolling was done.)
-
Eliminating the branch will save one clock cycle and up to 3 cycles to refill the pipeline.
The inner loop, which is executed 8 times, consists of 9 instructions including the branch:
    44f9e: fa46 f702  asr.w   r7, r6, r2
    44fa2: 07ff       lsls    r7, r7, #31
    44fa4: bf54       ite     pl
    44fa6: f8c3 450c  strpl.w r4, [r3, #1292] ; 0x50c
    44faa: f8c3 4508  strmi.w r4, [r3, #1288] ; 0x508
    44fae: f112 32ff  adds.w  r2, r2, #4294967295 ; 0xffffffff
    44fb2: f8c3 5508  str.w   r5, [r3, #1288] ; 0x508
    44fb6: f8c3 550c  str.w   r5, [r3, #1292] ; 0x50c
    44fba: d2f0       bcs.n   44f9e <spiFlashWrite+0x16>
I did not count all the instruction cycles, but the bcs instruction might well account for 10% of the time in the loop
(the str.w instructions are one cycle so it is probably even more; but of course the only real way to find out is by benchmarking; I haven't figured out the easiest way to do this yet).
The other suggestions will also help, but gains in the inner loop are 8 times as effective as an optimisation in the outer loop.
Afterthought: if we unroll the loop then the loading of r2 and the adds.w are also not needed.
That might well increase the gain to 20-25%.
It is too late now, but I'll try to do the counting later this week (and provide a better patch).
Edit: when unrolling it might also be that the asr.w can be simplified (and if not, the decrement of r7 will also be kept).
I'll try to make a version with an unrolled loop with #pragma and see how that disassembles.
Wrt measuring:
What I am missing is that I do not know yet how to realise the JS to C bindings.
For measuring maybe the DWT CYCCNT register can be used.
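A minimal sketch of how the DWT cycle counter could be used for such a measurement on a Cortex-M4. The register addresses and bit positions are the architectural ones for ARMv7-M; the `cycle_counter_t` type and the function names are hypothetical, and the registers are passed in as pointers only so that the logic can also be exercised on a host. This is untested on a Bangle.js, and whether the firmware or a debugger already uses the counter is not checked here.

```c
#include <assert.h>
#include <stdint.h>

/* Register addresses on ARMv7-M (Cortex-M3/M4). */
#define DEMCR_ADDR      0xE000EDFCu /* Debug Exception and Monitor Control */
#define DWT_CTRL_ADDR   0xE0001000u
#define DWT_CYCCNT_ADDR 0xE0001004u
#define DEMCR_TRCENA    (1u << 24)  /* enables the DWT unit */
#define DWT_CYCCNTENA   (1u << 0)   /* enables the cycle counter */

typedef struct {
    volatile uint32_t *demcr;
    volatile uint32_t *ctrl;
    volatile uint32_t *cyccnt;
} cycle_counter_t;

/* On the target one would initialise it as:
   cycle_counter_t c = { (volatile uint32_t *)DEMCR_ADDR,
                         (volatile uint32_t *)DWT_CTRL_ADDR,
                         (volatile uint32_t *)DWT_CYCCNT_ADDR }; */

/* Enable the counter and reset it to zero. */
static void cycle_counter_start(const cycle_counter_t *c) {
    assert(c && c->demcr && c->ctrl && c->cyccnt);
    *c->demcr |= DEMCR_TRCENA;
    *c->cyccnt = 0;
    *c->ctrl |= DWT_CYCCNTENA;
}

/* Cycles elapsed since `start`; the unsigned subtraction stays correct
   across a single wrap-around of the 32-bit counter. */
static uint32_t cycles_since(const cycle_counter_t *c, uint32_t start) {
    return *c->cyccnt - start;
}
```

Reading `*c.cyccnt` before a call to the function under test and passing that value to `cycles_since` afterwards gives the elapsed cycles, including the small overhead of the reads themselves.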
-
Hi Gordon,
Thanks for the extensive reply. I was unaware of make lst, I used cc commands that I found by running make under sh -x
Looking at the assembly unrolling the for loop would eliminate the branch at line 26. M4 has no instruction cache and no speculative execution or so, so that would help a bit.
Doing this with a gcc pragma is definitely better. I was unaware of that possibility.
DMA might not be faster. I've seen situations where the setup time exceeded the time needed for bitbanging.
And unfortunately I have no debugger. I think I didn't see that option when ordering my bangle (or should I have gotten it from somewhere else?)
Note also that I am still learning about the ecosystem and software structure.
(And the reason I asked about the flash chip is that I hoped its datasheet would also give info on how to drive it.) -
Let me try to clarify.
What I want to achieve is a 16 bit full-screen clockface.
My understanding is that if the hand moves I need to restore (at minimum) the area that was covered by the hand at the old position and that will not be overwritten by drawing the new hand.
So this is not about drawing the hand, it is about restoring the background.
Of course one can rewrite the old background in full: expensive.
Or you can rewrite only sections (as suggested by Abhigkar earlier in this thread) (bounding box based, but you would need images to draw).
Or you can store the background of every hand position and redraw that one (that is the 60 background images I was suggesting before; note that the background might be different when the hand is at a different position, as I just want to have an image as watchface).
Or you can restore the background based upon the "old hand": that is, instead of rendering the hand, restore/re-render the background pixels that were covered when the hand was drawn at the previous position.
Or, rephrasing the problem:
Suppose as watchface I want to have a high-res image of my kid. When the minute hand moves from say 0 to 1, I am looking for an efficient way (both CPU and RAM efficient) to restore the area that has been overwritten by the hand while at minute 0, after which I can write the hand for minute 1. To do this as efficiently as possible, I would like to redraw only the pixels that are actually overwritten.
Did I now express my problem better?
-
Rotate is quite ok for drawing the hand.
What I wanted to say is that you might need 60 images, each containing the background for one minute.
I still feel it might be doable to restore background images based on the clock hand, if you have reasonably fast flash access and know where the picture data is (it does help if the data is stored contiguously, so not scattered over the flash in different sectors; I haven't studied the file system to see how the FS works).
How much time do you think is needed to read a byte from flash at a given position?
How much time to write a byte to LCD?
And of course this does require low level access to make it performant.
I'll see if I can come up with some pseudocode or C code to illustrate what I want, but it may be Friday or Saturday before I get to that.
-
Hi,
I authored a version of spiFlashWriteByte that is a bit more efficient (at the expense of using a bit more ROM due to loop unrolling). I'd appreciate feedback on this.
I'm also interested in how best to test this (I want to avoid bricking my Bangle.js).
Code is at:
https://github.com/FransM/Espruino/tree/spiFlashWriteByte
Something similar can be done for spiFlashReadWriteByte
(oh, and the rationale for this is that these two functions are very low level functions to access the SPI flash, so every clock cycle saved here pays off).
Edit: what flash chip is actually used in Bangle.js?
-
I fear the amount of available memory will become the limiting factor.
drawImages might be useful, but I am worried about the complexity.
As @allObjects pointed out you may want to rewrite multiple hands (at least the minute and hour one).
Also, with a minute hand I might need 60 different images (unless there is rotational symmetry in the background, in which case multiple hand backgrounds could use the same image to restore).
-
@allObjects updating 60% is of course not desirable.
@Gordon
The idea I was thinking of is a bit different.
What I would like to do is to use something like drawImage, but instead of drawing the actual image I would like to select the pixels from another image, while using the transparency and position information of the actual image.
To phrase it differently:
I would like to erase the clock hand by redrawing those pixels from the background image that were written when the clock hand was drawn.
Or, if I simplify the drawing of the clock hand to:
for all clock hand pixels that are not transparent write the pixel to LCD
Then the restore could be
for all clock hand pixels that are not transparent write the pixel from the background image to LCD.
Only pixels that are overwritten by the hand need to be restored.
So basically instead of writing a clock hand pixel we write a background image pixel to restore the background.
And after restoring the background of course a new clock hand can be drawn.
Does this sound feasible?
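The two pseudocode lines above could look roughly like this in C. It is a sketch under simplifying assumptions: no rotation or scaling, 16-bit pixels, the background held in memory at the same size as the frame buffer, and a colour key (`TRANSPARENT`, hypothetical) marking the transparent hand pixels. `restore_under_hand` is a made-up name, not an existing Espruino function.

```c
#include <assert.h>
#include <stdint.h>

#define TRANSPARENT 0x0000u /* hypothetical colour key: "no hand pixel here" */

/* Copy background pixels to the frame buffer wherever the hand image is
   opaque. `hand` is w*h pixels placed at (x0, y0); `bg` and `fb` are both
   fb_w pixels wide and fb_h pixels high. Returns the number of pixels
   restored. */
static int restore_under_hand(uint16_t *fb, const uint16_t *bg, int fb_w, int fb_h,
                              const uint16_t *hand, int w, int h, int x0, int y0) {
    int restored = 0;
    assert(fb && bg && hand);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int fx = x0 + x, fy = y0 + y;
            if (hand[y * w + x] == TRANSPARENT) continue;               /* never drawn */
            if (fx < 0 || fx >= fb_w || fy < 0 || fy >= fb_h) continue; /* clipped */
            fb[fy * fb_w + fx] = bg[fy * fb_w + fx];
            restored++;
        }
    }
    return restored;
}
```

The cost is proportional to the bounding box of the hand for the scan, but only the opaque pixels are actually written, which is the point of the approach.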
edit: a possible call could be something like
g.restoreImage({image:fgimg,x:120,y:120,center:true,scale:0.7,rotate:1.1, pixelimg: bgimg}) -
Got redirected here by Abhigkar
I've been pondering this as well.
With respect to the 2.2 seconds to draw a full hi-res bitmap:
Where does the time go? Reading the file from flash? Processing it? Writing the data to the LCD controller?
I suspect the last.
The solution I was thinking of is to restore only the part of the display used by the hand (or by the graphic representing a digit). The latter is easier, as the position of the hand is moving.
The idea is that when the hand moves the pixels of the old hand are restored from the background image, after which the new hand is written.
And to make it even faster, only those pixels that are not overwritten by the new hand need to be restored.
I think something like rasterop/bitblt could help here (or some low-level helper functions).
Key to this is easy and efficient access to the bits of the image on the flash.
Of course it is a bit less trivial than I sketched, since we have two hands and they can be overlapping, but I hope you get the idea.
And it is probably easier with a digital watch where the digits are actual bitmaps.
Basically it would require drawing a section from a background image.
How does this sound?
-
Hi,
I was looking for a way to render part of an image file.
Idea is to draw a background image as watchface and restore only damaged sections when the clockhand moves.
I've seen the mechanism that is used in imgclock, but that one requires an intermediate file, which works if the area is always the same but is not very convenient for clock hands.
Ideally I would like to have some rasterop/bitblt-like functionality that could render from file or memory to screen, but I am not sure how easy it is to realise that (in an efficient way).
Then again, if the flash is memory mapped it should be feasible.
Anyway: any suggestions on how to achieve selective redraws?
Frans.
PS: I understood from the ST7789VW datasheet that it actually has an on-chip display data RAM of 240 x 320 x 18 bits. Not sure if our Bangle uses the VW version, but if so, would it make sense to make the off-screen data RAM available? (I didn't go through all of the datasheet, but maybe having data on the LCD controller chip could help to implement things in a more efficient way.)
-
Cool, I was trying to make something like this myself, but due to lack of time didn't complete it yet.
However, whenever I want to upload the clock I select a background, click on upload and get
"Customise failed, TypeError: Espruino.transform is not a function"
This is with 2v05 firmware on the watch and the Chromium browser under Ubuntu 18.04.
I can't help you with the inline C but if your string is null terminated you do not need the len variable at all. Just use
and the if statement could be avoided by writing
This exploits that the boolean expression returns 0 or 1.
Whether this is faster is to be benchmarked as it will depend on compiler and how the code is actually compiled.
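The snippets that went with "Just use" and "by writing" are not preserved here, so the following is not a reconstruction of them. It is only a sketch of the two ideas on a hypothetical function, `count_char`: looping until the terminating null instead of passing a length, and adding the 0-or-1 result of a comparison instead of using an if statement.

```c
#include <assert.h>
#include <stddef.h>

/* Count occurrences of `c` in a null-terminated string.
   No length argument: the loop stops at the terminating '\0'.
   No if statement: in C a comparison evaluates to 0 or 1,
   so its result can be added directly. */
static size_t count_char(const char *s, char c) {
    size_t n = 0;
    assert(s != NULL);
    while (*s) {
        n += (*s++ == c);
    }
    return n;
}
```

As said above, whether the branch-free form is actually faster depends on the compiler; many compilers generate the same code for both variants.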