-
-
-
I think I have made an interesting observation:
I'm now grepping for XON, XOFF, UART and 'characteristic write' in adb logcat.
When everything is working fine, I can see many lines of 'characteristic write' writing to the UART. In between that, I see some flow control messages sending XOFF and then XON quite quickly after that.
So far, the timeout issue has reproduced itself only once while adb was connected. In that case, I saw XOFF being sent as one of the last messages with no XON following. The lines 'characteristic write' stop after that, the Bangle goes to timeout, but the http requests are still handled and logged by gadgetbridge.
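To make this check less manual, here is a rough sketch of how I could scan captured log lines for an XOFF that is never followed by an XON. The helper name `findUnmatchedXoff` and the sample lines are made up for illustration, and matching on the bare substrings "XOFF" and "XON" is an assumption, not the confirmed Gadgetbridge log format.

```javascript
// Hypothetical helper: returns the index of the last XOFF line that has no
// later XON line, or -1 if every XOFF was answered.
// Assumption: flow control log lines contain the substring "XOFF" or "XON".
function findUnmatchedXoff(lines) {
  let pending = -1; // index of the most recent XOFF not yet followed by XON
  lines.forEach((line, i) => {
    if (line.includes("XOFF")) pending = i;
    else if (line.includes("XON")) pending = -1;
  });
  return pending;
}

// Invented sample lines, not real logcat output.
const sampleLines = [
  "characteristic write",
  "flow control: XOFF",
  "flow control: XON",
  "characteristic write",
  "flow control: XOFF",
];
console.log(findUnmatchedXoff(sampleLines)); // index of the dangling XOFF
```

A result other than -1 at the moment the timeouts start would support the theory that the XON got lost.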
I'm wondering whether the issue is this: when a Bluetooth connection loss falls exactly between XOFF and XON, the XON gets lost, and this is what freezes the communication gadgetbridge -> bangle. The XON is possibly never repeated. The larger the data volume, the more often flow control kicks in and the more likely it is to run into this issue. Maybe a possible solution would be to send XON automatically on every fresh Bluetooth (re-)connect?
Another observation: I'm printing the exception message of Bangle.http(...) to the display, and when the Bluetooth goes out of range, I always get a "Timeout" exception first, which changes after a few seconds to "Bluetooth not connected". So it seems that the Bangle needs a few seconds to notice that the Bluetooth connection has been fully lost; it doesn't realize it instantaneously. I'm wondering whether this could be the moment during which the XON gets lost.
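To illustrate what I mean by sending XON on every reconnect, here is a minimal simulation in plain JavaScript. It is not Gadgetbridge or Espruino code: the class `FlowControlledSender` and its method names are hypothetical, and only the XON/XOFF byte values (0x11 and 0x13) are the standard ASCII ones.

```javascript
const XON = 0x11;  // ASCII DC1: resume transmission
const XOFF = 0x13; // ASCII DC3: pause transmission

// Hypothetical model of the sending side (the phone, in my case).
class FlowControlledSender {
  constructor(write) {
    this.write = write; // function that actually transmits a chunk
    this.paused = false;
    this.queue = [];
  }
  send(chunk) {
    this.queue.push(chunk);
    this.flush();
  }
  onControlByte(byte) {
    if (byte === XOFF) this.paused = true;
    if (byte === XON) {
      this.paused = false;
      this.flush();
    }
  }
  onReconnect() {
    // Assumed fix: treat a fresh connection as an implicit XON,
    // so a lost XON cannot leave the sender paused forever.
    this.onControlByte(XON);
  }
  flush() {
    while (!this.paused && this.queue.length) this.write(this.queue.shift());
  }
}

const sent = [];
const sender = new FlowControlledSender((chunk) => sent.push(chunk));
sender.send("a");
sender.onControlByte(XOFF); // receiver asks for a pause
sender.send("b");           // stays queued; the XON is lost with the link
sender.onReconnect();       // implicit XON releases the queue
console.log(sent);
```

Without the `onReconnect()` call, "b" would stay queued forever, which is roughly what I think I am seeing. Whether queued data should be flushed or discarded after a reconnect is a separate design question that this sketch does not settle.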
-
-
Hi Gordon,
thanks for your quick reply!
I will connect it again to adb and watch out for XON/XOFF or something about the write characteristic. I'll let you know.
Yes, after about 20% of automatic reconnects it doesn't work at all and can only be restored by a manual reconnect. Sometimes it is necessary to wait for 30s in the disconnected state before reconnecting and sometimes gadgetbridge even needs to be restarted. If the issue appears after a reconnect, the timeouts start to happen immediately after the reconnect, there is no time in between, where it works.
-
Thanks for the link! That is interesting information.
However, I have indeed always been using the official Android Integration, which should handle this automatically. Here are some new insights, which I have learned in the meantime:
- The strange characters are definitely gone after removing Console.log(). The stability improved, but the issue definitely still occurs.
- The timeout issue can also be reproduced with just the Hello World server, calling it once per second.
- I have been calling E.getErrorFlags() regularly to see if an error gets set at the time when the issue occurs. This was not the case. Is it correct that in this way I can exclude that I get issues due to too much data?
- When the timeout issue on the Bangle is present, I can still see in adb that all requests are properly executed by gadgetbridge, and the responses are in the adb logs too.
- While the timeout issue is present, I have made an interesting reproducible observation: When I press the "Find Bangle" button in gadgetbridge, nothing happens. On the contrary, when I activate "Find phone" on the Bangle, the phone rings. Under normal conditions, both ways work, but with the issue being present, the direction gadgetbridge->Bangle is not working.
The last observation makes me wonder whether, in my case, the UART is sometimes completely down, but only in one direction. Does this make any sense? The Bangle runs into the timeout because it never gets the response, although gadgetbridge has the response and thinks that it is sending it. But I'm not sure how to track this down further.
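For reference, this is roughly how I poll the error flags, written so that it also runs outside the watch. The helper `makeErrorFlagLogger` is a made-up name, and I'm assuming that E.getErrorFlags() returns an array of flag names that is empty when nothing went wrong. As far as I understand, reading the flags also resets them, which is why the sketch keeps its own history instead of relying on a later read.

```javascript
// `getFlags` stands in for E.getErrorFlags on the watch (assumption: it
// returns an array of flag-name strings, empty when no error occurred).
// `now` is injected so the sketch can be tried off-device.
function makeErrorFlagLogger(getFlags, now) {
  const history = [];
  return {
    check() {
      const flags = getFlags();
      // Record only non-empty readings, together with the time they were seen.
      if (flags.length) history.push({ time: now(), flags: flags });
      return flags;
    },
    history: history,
  };
}

// Demo with a fake flag source. On the Bangle one would pass
// () => E.getErrorFlags() and call check() from setInterval.
const fakeReadings = [[], ["FIFO_FULL"], []];
const flagLogger = makeErrorFlagLogger(() => fakeReadings.shift() || [], Date.now);
flagLogger.check();
flagLogger.check();
flagLogger.check();
console.log(flagLogger.history);
```

An empty history at the time the timeouts start is what I observed, but I'm not sure whether that alone rules out a data overflow.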
-
Just a short update:
Within the error handler, I had been calling Console.log(... and very likely, the ignored characters were coming from that and your suspicion that they are IDE characters makes a lot of sense. The characters actually have no direct relation to the issue of running into the timeout error with no https server timeout. I have removed the Console.log now.
After removing that, it's not only that the IDE characters are gone, I surprisingly also observe that the overall stability improved a lot and the timeout issue happens more rarely now. It still does happen though, but it is much harder to reproduce now.
Does it make sense that printing debug output may somehow confuse the communication with gadgetbridge and one should just avoid printing any human readable stuff there?
-
Thanks - that is really interesting and a very important insight that these chars actually have a meaning.
The webpage is a json string from my internal home automation system and contains <2000 ascii chars.
For better reproducibility and to check whether the size plays a role, I'm now running a test with the "Hello World!" test server https://pur3.co.uk/hello.txt and I'll report whether the issue also occurs with this very short string.
I have already realized that it's a battery issue, my battery life is around 2 days with 1 request/sec. I'll leave it this way for stress testing, but you are right and I should decrease the rate in favor of battery life at some point...
-
-
Hello,
I'm calling Bangle.http(...) from the Bangle every second. Since the memory leak on Android was resolved, it has been much more stable. However, I regularly run into another issue:
The Bangle sometimes gets into a strange state in which Bangle.http(...) systematically raises the "Timeout" exception, although the https server definitely never times out and returns immediately. The Bangle seems to get into this state especially when the Bangle and the Android phone automatically reconnect after a connectivity loss. It is a sporadic problem, and in most automatic reconnects all is fine. Whenever the system is in the erroneous state, a manual disconnect and then connect from within gadgetbridge (global Android Bluetooth can remain enabled during the reconnect) always fixes the issue for some time.
My suspicion is that after the automatic reconnect, some scrambling on the Bluetooth UART link occasionally occurs. Maybe there is something that is reinitialized when manually reconnecting, but not when automatically reconnecting?
I have looked at:
adb logcat|grep gadgetbridge|grep UART
Whenever all is fine, I see repeating (personal data removed from the request and the response):
06-10 01:59:31.378 28485 28503 I nodomain.freeyourgadget.gadgetbridge.service.devices.banglejs.BangleJSDeviceSupport: UART RX LINE: {"t":"http","url":"https://...","id":"79638536497"}
06-10 01:59:31.386 28485 28503 I nodomain.freeyourgadget.gadgetbridge.service.devices.banglejs.BangleJSDeviceSupport: UART RX JSON parsed successfully
06-10 01:59:31.508 28485 28485 I nodomain.freeyourgadget.gadgetbridge.service.devices.banglejs.BangleJSDeviceSupport: UART TX: GB({t:"http",id:"79638536497",resp:"{...)
06-10 01:59:32.351 28485 28502 I nodomain.freeyourgadget.gadgetbridge.service.devices.banglejs.BangleJSDeviceSupport: UART RX LINE:
06-10 01:59:32.382 28485 31310 I
When it is broken, I see this kind of line regularly:
05-25 22:09:43.726 11962 12934 I nodomain.freeyourgadget.gadgetbridge.service.devices.banglejs.BangleJSDeviceSupport: UART RX line started with 13 - ignoring
The 13 seems to be random; it changes from line to line. The personal data reported in the RX/TX debug lines is still there, but it looks incomplete, with random parts missing.
Thank you in advance for letting me know what you think.
-
-
I ran a test, and calling queue.stop(); queue=null; before creating a new request queue resolved the memory leak.
Many thanks again for finding this.
My test was quite dirty, however. I think the memory should be released in the response and error handler functions to allow for interleaved requests. I'm not confident enough with Android development, so maybe the Android developer could add this and commit it to one of the upcoming releases?
-
-
Hello,
I am regularly calling Bangle.http("https://...") from my Bangle.js 2. For testing, I do it at a rate of 1 request/sec, while making sure that a new request is not started before the previous one returns.
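For clarity, this is roughly the pattern I use to keep requests strictly sequential. It is a simplified sketch: `pollSequentially` is a made-up helper name, and `request` stands in for Bangle.http, which I assume returns a Promise. The next request is only scheduled after the previous one has either resolved or failed.

```javascript
// `request` stands in for Bangle.http (assumption: it returns a Promise).
// Returns a function that stops the polling loop.
function pollSequentially(request, url, intervalMs, onResult, onError) {
  let stopped = false;
  function next() {
    if (stopped) return;
    request(url)
      .then(onResult)
      .catch(onError)
      // Schedule the next request only once this one has finished.
      .then(() => {
        if (!stopped) setTimeout(next, intervalMs);
      });
  }
  next();
  return function stop() {
    stopped = true;
  };
}

// Demo with a fake request that resolves immediately. On the watch one
// would pass (url) => Bangle.http(url) and an interval of 1000 ms.
let received = 0;
const stopDemo = pollSequentially(
  () => Promise.resolve({ resp: "Hello World!" }),
  "https://pur3.co.uk/hello.txt",
  10,
  (data) => {
    received++;
    if (received === 3) stopDemo(); // end the demo after three responses
  },
  (err) => console.log("request failed:", err)
);
```

Because the timer is started only after a request has finished, the effective rate is slightly below 1 request/sec, but two requests can never overlap.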
After establishing the Bluetooth connection to gadgetbridge, it works fine for some time, but after some minutes, issues occur. These are:
- random bluetooth disconnects
- unresponsive hanging of Gadgetbridge
- bluetooth reconnect feature not working reliably
I have enabled USB-Debugging and tried to track the memory usage of gadgetbridge:
watch -n1 adb shell dumpsys meminfo com.espruino.gadgetbridge.banglejs
and I see that the "TOTAL" memory is increasing constantly at a rate of about 1MB/s.
Is it possible that there is a memory leak in Bangle.js gadgetbridge and has somebody else seen this behaviour already? How can I track this down further?
I would greatly appreciate any help.
Many thanks!
Thanks for the offer, yes, then let's wait 1 or 2 weeks until you can send it with the backlight assembly.