[Bangle.js 2] Seemingly random freezes / unresponsive?

  • Is anybody else currently dealing with seemingly random crashes and freezes?

    Occasionally, multiple times a week (most days, rarely more than once a day), my Bjs2 will, without warning, become unresponsive. This almost exclusively happens on the clock face, if only because I'm usually in the habit of closing apps whenever they're open. I haven't been able to figure out how to reproduce this behavior consistently, but I'm trying to troubleshoot regardless.

    Current details:

    • I have a decent number of apps installed. I suspect it could be a memory leak?
    • The clock face in question is the LCARS one from the app loader - specifically the one with brighter colors, not Stardate Clock. It's currently in full screen mode, no widgets.
    • Said clock face intermittently gives me a "DISK" warning. Going to App Manager and running "Compact" seems to alleviate this issue. I have not yet measured the effect of doing this regularly w/r/t preventing these freeze incidents.
    • I'm connected to the BJS fork of Gadgetbridge. No weather or other online apps set up so far.
    • No suspicious battery drains.
    • I'm living in the east coast United States and the past few weeks it's been 90F / 32.2C out consistently. I'll take note of the core temp reading (if I can find a widget for it) to see if it could be an overheating issue. The most recent two freezes occurred outside and in direct sunlight; and right next to the fan on my laptop. I'm considering this a strong possibility.

    So... I'm not totally sure what to do to figure it out but I guess I have a few ideas.

    In addition to logging temp readings, I'm going to switch to a different clock face for the next few days and see if it happens less frequently - perhaps it's actually just a problem with the LCARS app... which would be disappointing seeing as I plan on creating a similar feature-rich clock face soon. But I guess we'll see.

    Anyway, does anybody else have similar experiences?

  • I sometimes see similar behavior. Mostly when connected to Gadgetbridge and listening to music/podcasts for at least half an hour. Disconnecting and reconnecting the bangle from Gadgetbridge "revives" it. Seems to have no bad side effects on further use. Auto-open for music is switched off.

  • I'd say definitely try a different clock face, but it could also be related to apps with boot code/widgets.

    intermittently gives me a "DISK" warning

    How often does that happen? That would imply that some app is writing to Storage very frequently (it'd have to in order to use up 8MB of memory so quick). And that could well be an issue.

    When storage gets too low, a 'compact' pass is run automatically. As I'm sure you know when you've run it manually, it can take quite a while - could it be that's what is happening in your case and the Bangle is just sitting there unresponsive for ~90 seconds or so while it is compacting itself?
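One way to check whether something is eating Storage quickly is to sample the free space every so often and look at how fast it drops. A rough sketch, assuming `require("Storage").getFree()` is available on your firmware; the helper names `takeSample` and `bytesUsedPerHour` and the 10-minute interval are made up for illustration:

```javascript
// Estimate how fast Storage is being consumed from timestamped samples.
// Each sample is {t: milliseconds, free: bytes}, oldest first.
// Returns bytes consumed per hour (negative if free space went up,
// e.g. because a compact ran between the first and last sample).
function bytesUsedPerHour(samples) {
  if (samples.length < 2) return 0;
  var first = samples[0];
  var last = samples[samples.length - 1];
  var hours = (last.t - first.t) / 3600000;
  if (hours <= 0) return 0;
  return (first.free - last.free) / hours;
}

// Take one sample. `storage` is assumed to be require("Storage") on the watch.
function takeSample(storage, now) {
  return { t: now, free: storage.getFree() };
}

// On the watch (not run here), sampling every 10 minutes might look like:
//   var samples = [];
//   setInterval(function () {
//     samples.push(takeSample(require("Storage"), Date.now()));
//     print(bytesUsedPerHour(samples));
//   }, 600000);
```

The samples are kept in RAM on purpose, so that the measurement does not itself write to Storage.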

    I'm connected to the BJS fork of Gadgetbridge. No weather or other online apps set up so far.

    When it happens, please could you go to Debug, then Fetch Debug Logs. It'll write a file into Internal Storage/Android/data/com.espruino.gadgetbridge.banglejs/files (which you could easily see if you plug your phone into a computer) - and maybe you could see if there were any interesting-looking messages shown in that? I wouldn't post the complete thing up as it might include the content of your notifications though.

    I sometimes see similar behavior. Mostly when connected to Gadgetbridge and listening to music/podcasts for at least half an hour.

    That is interesting - I'll give it a try here and see if I can reproduce. Next time it happens, it'd be great if you could check the log to see if there's anything in it.

  • I have seen this as well in the past few weeks. I've also noted that the Gadgetbridge app (Bangle.js version) sometimes also has crashed when the watch has locked up. Haven't had time to debug yet, and I cannot get the app to save a debug log (nothing happens when I press Fetch Debug Logs)...

    I'm using "A Configurable Analog Clock" as my watch face.

  • nothing happens when I press Fetch Debug Logs

    Ahh, it might be you're on an old version of Gadgetbridge then?

  • Ahh, it might be you're on an old version of Gadgetbridge then?

    I don't think so...

    Version 0.68a-banglejs
    Commit 14db0d7df

  • To get the debug logs from Gadgetbridge you can (edit: apparently not the same, see Gordon's response below) first activate logging by:

    Open the sidebar -> 'Settings' -> Check 'Write log files'.

    Then share the log:

    Open the sidebar -> 'Debug' -> 'SHARE LOG' -> Choose where to send the
    log. (I send it to my laptop with KDE Connect, but you can of course
    use an app on the phone)

  • Hm... I have been trying that too, but no dice. I'll do some more testing.

  • nothing happens when I press Fetch Debug Logs

    Actually, it doesn't pop up a window or anything (maybe just a quick 'toast' message) - it just writes the file.

    To get the debug logs from Gadgetbridge you can first activate logging by ...

    Actually those are different logs. The ones I'm suggesting show just the communications between the watch and Gadgetbridge

  • Actually, it doesn't pop up a window or anything (maybe just a quick 'toast' message) - it just writes the file.

    In this case, when I say "nothing happens" I mean that no file is written (I check the folder).

    But I'll try to take a closer look at a logcat, maybe try to reset things, etc...

  • Actually those are different logs.

    Ok, good to know!

  • I had the same problem. I am not sure what triggered it. I suspected a Bluetooth disconnect during app upload, but that was just a guess.

    I have performed a factory reset which solved the problem for several weeks. It occurred again today. I have not uploaded any apps today, but I have compacted storage today. This may have been a coincidence though.

    I also use Gadgetbridge (latest version) and I use the apps from https://espruino.github.io/BangleApps/

    I have attached the device info from https://espruino.github.io/BangleApps/?c= (maybe we can find a pattern)


    1 Attachment

    • IMG_20220727_203135.jpg
  • After I did the factory reset a few weeks ago everything worked fine for a few days, but then my watch would seemingly freeze again. Disconnecting it from the phone in Gadgetbridge reliably cured the condition.

    Then I did not use the watch for a week and did not connect it to my phone either. The watch did not freeze a single time.

    When I connected the watch to my phone, I experienced occasional freezes again.

    Two weeks ago I changed my phone from a Fairphone 2 (nearly 7-year-old hardware) to a Fairphone 4 (recent hardware) and I have not experienced any freezes since then.

    This led me to the conclusion that the cause of the freezes may be connected to the Bluetooth connection. I think the connection was quite unreliable with my old phone.

    Is it possible that an unreliable connection causes a deadlock on the watch when the connection is lost during data transfer between Gadgetbridge and the watch?

  • Sorry for the delay replying here... That's really interesting! It might explain why it's been very hard to pin down.

    Is it possible that an unreliable connection causes a deadlock on the watch when the connection is lost during data transfer between Gadgetbridge and the watch?

    It shouldn't happen, but I guess there is some small chance. I don't think there is anything in Espruino itself that would suffer; however, there is a chance that something at a lower level in the Bluetooth stack is causing problems.

    If I had a way to reliably reproduce this here then I could get one of the watches set up on a hardware debugger and might have more of an idea what's going wrong.

  • It happens to me randomly every few days or weeks too. The watch face is frozen with some old time still shown. The button does not work except holding it for the watchdog reboot - that 'fixes' it. I do use Gadgetbridge too. I had not thought about a relation to the connection, as it happened when the phone was near (overnight, both on a table next to each other) and the watch worked fine many times when the connection was lost during the day, but it could still be related.

    I was thinking about debugging it over SWD but didn't get to it yet. For that I think I'd need my own firmware build installed, with matching debug symbols, to see the exact source line where it hangs, including variable values and call stack, and that is what I have not done yet. I will try to make such a build, install it and wait.

  • Was thinking about debugging it over SWD

    It'd be awesome if you could do that.

    No offence but I'm quite glad it's happening to you - since I think by now you know the Nordic bluetooth stack better than I do!

  • No solutions yet, but I did narrow in on what might be causing it - it's definitely a gadgetbridge issue.

    I had my watch disconnected for over a week, and I'm 95% sure it didn't freeze that entire time. I reconnected it this morning and already had to reboot it once. Temperature doesn't seem to be a factor nor does disk space - compacting and removing apps has no apparent effect on the phenomenon.

    I forgot to enable debug logs previously but I just did, and I'll respond with info if I find anything promising.

    My current hypothesis is that it happens whenever too many things happen at once - specifically, whenever it's called upon to load something when a "loading" box is already on the screen. A "loading" box seems to appear about two-thirds of the time it crashes. My thinking: is it possible that a notification coming in at the same time as a scheduled timer is what's causing the crash?

    Like I said... I'm 90% sure it exclusively happens whenever notifications are active.

    My branch is about 600 commits behind, though, so perhaps I should update before doing any more digging? Either way, it's a busy week for me. Not sure when I'll get around to looking into this more.

  • I have had a watch in this state for many days but have not yet figured out what exactly is wrong with it. It is not frozen; it just does not react to anything, but it still works and saves power, so I think the idle JavaScript loop still runs fine. It looks like it is in some high-level logical 'lock' or bad state. When it 'froze', Gadgetbridge was still connected over BLE and stayed connected without reporting any error. I disconnected it about an hour after it froze and it disconnected just fine, again without reporting that anything was wrong with the connection; however, it could not reconnect later. After one day the screen went blank (maybe some screen-blanking code?).

    The trouble with using an SWD debugger is that with the Bluetooth stack enabled you cannot stop the CPU and then resume, because the BLE stack will immediately crash due to the strict timing it needs. So I have one shot and then need to reboot the watch. Also, when Espruino works and is in its idle loop/sleep and I halt the CPU, it is in some interrupt context and I don't see a regular stack trace where I could see how C function calls are nested and could examine local C variables on different levels. When I keep it running I can just read memory like global C variables but don't see what it is doing. I have a few ideas - I need some live profiler that could sample running code (CPU registers and stack/RAM) without stopping it; it should be doable somehow, but I have not figured out the details of how to do it with openocd/gdb. Also, I can make custom firmware with some extra logging and extra debug code and wait with this for the next freeze. I am planning to implement a JavaScript console over SWD (another object of the JavaScript Serial type, like Bluetooth) and then I hope I can switch console input/output at runtime in this 'frozen' state. Maybe I could still run some JavaScript in this state.

  • Also, since Gadgetbridge did not report any issue, maybe the connection still works. If there were a way to inject some simple JS code over the existing connection in this frozen state (like running 1+1 or process.memory()), seeing whether it still responds would be interesting.
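As a sketch of what such a probe could report if the interpreter still answers: Espruino's `process.memory()` returns an object that includes `usage` and `total` (counted in JS variable blocks), which is enough to tell whether the watch is simply out of memory. The helper name `memorySummary` is hypothetical, and how the code would actually be sent over the existing connection is left open here:

```javascript
// Summarise the object returned by Espruino's process.memory().
// Only the `usage` and `total` fields are used; other fields are ignored.
function memorySummary(mem) {
  var pct = Math.round((mem.usage / mem.total) * 100);
  return "used " + mem.usage + "/" + mem.total + " vars (" + pct + "%)";
}

// On the watch (not run here):
//   print(memorySummary(process.memory()));
```

Any reply at all, even to `1+1`, would already show that the interpreter is alive and that only the UI side is stuck.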

    It is not frozen; it just does not react to anything, but it still works and saves power, so I think the idle JavaScript loop still runs fine.

    So does it update the time in that state? It's that it doesn't respond to the button presses?

    I am planning to implement javascript console over SWD

    This would be awesome! I wasn't sure how to get started with this, but if you have a proof of concept for what to do on the nRF52 I could probably integrate it pretty quick.

    if there was a way to inject some simple js code over existing connection in this frozen state (Like running 1+1 or process.memory()) and see whether it still responds would be interesting.

    I've been meaning to implement something like https://www.espruino.com/ide/relay/ in the App Loader (so it could be used from within Gadgetbridge) - that might really help here - although if debugging Gadgetbridge in Android Studio I believe you can get it to send data.

  • So does it update the time in that state?

    No, the time is frozen on screen and there is no reaction to anything, except that for me the screen was blanked the next day and the battery lasts very long (like normal). Holding the button triggers the watchdog reboot, so without holding the button the watchdog is obviously still being pinged, so it is still alive.

    This would be awesome! I wasn't sure how to get started with this, but if you have a proof of concept for what to do on the nRF52 I could probably integrate it pretty quick.

    I think I already mentioned it somewhere, but there is Segger RTT technology - basically a small piece of C code, linkable into Espruino, that uses a memory buffer for I/O. More info: https://wiki.segger.com/RTT and the code itself is e.g. here: https://github.com/adfernandes/segger-rtt

    So from JS/Espruino it could be standard console object writing to circular memory buffer. From the other side it can be used from openocd with some swd debugger but I am thinking about some SWD library as Inline C + maybe some JS around it so you could use another espruino with 2 gpios to connect to it. Basically the device would then act as usb to segger rtt i/o 'serial' adapter so you would connect IDE or putty to it and you would get bangle console. Specific espruino device I have for this is the ~$2 STlink V2 clone dongle (aliexpress) https://github.com/fanoush/EspruinoBoards/tree/master/STLINKV2
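To illustrate the "console writing to a circular memory buffer" idea, here is a minimal ring buffer sketch. It is only a model of the concept: the names (`RingBuffer`, `wr`, `rd`) are made up, it is not Segger's actual RTT control block layout, and the real thing would be C in firmware with the host reading the memory over SWD.

```javascript
// A fixed-size circular byte buffer with one writer and one reader.
// The writer (the watch's console) only advances `wr`;
// the reader (the debugger side) only advances `rd`.
function RingBuffer(size) {
  this.buf = new Uint8Array(size);
  this.wr = 0; // index of the next byte to write
  this.rd = 0; // index of the next byte to read
}

// Write as much of `str` as fits and return the number of bytes written.
// One slot is always left empty so that wr === rd means "empty".
RingBuffer.prototype.write = function (str) {
  var written = 0;
  for (var i = 0; i < str.length; i++) {
    var next = (this.wr + 1) % this.buf.length;
    if (next === this.rd) break; // buffer full, drop the rest
    this.buf[this.wr] = str.charCodeAt(i) & 0xff;
    this.wr = next;
    written++;
  }
  return written;
};

// Read everything currently available and return it as a string.
RingBuffer.prototype.read = function () {
  var out = "";
  while (this.rd !== this.wr) {
    out += String.fromCharCode(this.buf[this.rd]);
    this.rd = (this.rd + 1) % this.buf.length;
  }
  return out;
};
```

Because each side only ever changes its own index, the writer never has to wait for the reader, which is what makes this kind of buffer attractive when the CPU cannot be halted.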

    Someone even had basically same idea with Forth and com over swd here https://mecrisp-stellaris-folkdoc.sourceforge.io/swdcom.html but they did not use segger rtt and they used that dongle 'as is' with original stlink firmware and some custom code talking to it https://github.com/Crest/swdcom . In theory same could be also done with CMSIS-DAP dongles (same ~$2 thing on aliexpress in many variants) which uses USB HID protocol so no need for custom OS driver with that one. But I think something acting like USB to serial would be easier to use with IDE.

    EDIT: started separate conversation for this https://forum.espruino.com/conversations/380926/

  • so it is still alive

    also the frozen state showed blue icon so gadgetbridge was still connected and as mentioned gadgetbridge on the phone did not complain about connection being broken in any way, I had to disconnect manually. So maybe the watch could be just out of memory or maybe it could have entered javascript debug> prompt

  • Gadgetbridge nightly should have a fetch logs button under debug that you could run, to see the transcript of data to/from the Bangle and see if anything had been written (like debug>) that might give a clue...

  • Gadgetbridge nightly should have a fetch logs button under debug that you could run,

    is this in the playstore version now? I have just installed your playstore version and there is "fetch device debug logs" button in Debug menu but clicking it has no feedback and then I see there is Android/data/com.espruino.gadgetbridge.banglejs/files/gadgetbridge.log with lot of android logging but I don't see anything new there when clicking the fetch button.

    However I do see UART RX and UART TX lines when the device does communicate - like when clicking other test buttons to simulate calls and messages.

  • I'm afraid this did go in the Play Store version, and then someone pushed a change to Gadgetbridge which broke it! It should now be fixed, but I don't think that fix has made it into Play Store yet (On Nov 1st Google introduced some new restrictions which I have to work around before I can update it!).

    However I do see UART RX and UART TX lines when the device does communicate - like when clicking other test buttons to simulate calls and messages.

    This should still be good for debugging I'd hope - but it's not as clear as the log that would have been produced.
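For digging through gadgetbridge.log, a small filter that keeps only the UART RX / UART TX lines and flags any that contain a `debug>` prompt can save some scrolling. This is a sketch that assumes those markers appear literally in each relevant line; the exact log format is not known here, and the function names are made up:

```javascript
// Keep only the lines that mention the UART traffic markers.
function uartLines(logText) {
  return logText.split(/\r?\n/).filter(function (line) {
    return line.indexOf("UART RX") >= 0 || line.indexOf("UART TX") >= 0;
  });
}

// Of those, keep the lines that show the Espruino debugger prompt.
function debugPromptLines(logText) {
  return uartLines(logText).filter(function (line) {
    return line.indexOf("debug>") >= 0;
  });
}

// With Node on a computer (not run here), after copying the log off the phone:
//   var text = require("fs").readFileSync("gadgetbridge.log", "utf8");
//   console.log(uartLines(text).join("\n"));
//   console.log(debugPromptLines(text).length + " line(s) with debug>");
```

Keep in mind the earlier warning: the log may contain notification content, so filter it locally rather than posting the whole file.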


Posted by @morganwable
