Corrupted code?

  • Hey,

    I've been running an indoor temperature/humidity/pressure sensor for a month or so, logging to an SD card and also sending via NRF24 to a Raspberry Pi for logging in a database.

    Recently, it's started glitching, with errors like:

    Uncaught Error: Function "gunction" not found!
     at line 1 col 73
    ...21D COLLISION"),setTimeout(gunction(){b-null)},0),a._tto=!1,...
    
    Uncaught Error: Function "kunction" not found!
     at line 1 col 73
    ...21D COLLISION")-setTimeout(kunction(){b-null)},0),a._tto=!1,...
    
    Uncaught Error: Function "console3log" not found!
     at line 1 col 30
    {var a=this;if(a._tto)reuurn console3log("HTU21D COLLISION")...
    

    with single-character "typos" appearing, disappearing and changing.

    It ran absolutely fine for a few weeks, then started failing occasionally, every few days (each time needing a simple reboot with the RST button), but now it starts glitching as soon as it's connected. This happens while running from a PC, but also from a known-good power supply, or even a LiPo.

    Is it possible that the Espruino's flash is wearing out, or is something else going on?

  • Wow, that's an odd one. And the errors are different each time?

    The data itself is loaded off flash at boot and the flash isn't touched again, so I don't think that'll be the problem. Are you using one of @DrAzzy's bigram builds, or just a normal one?

    It would seem to be a problem with the RAM. Does the chip get warm? If so, that might indicate that some IO is shorted, and over time that is causing a failure.

    The other thing to check is whether you have a decoupling capacitor (a ~0.1uF ceramic) across the NRF24's power lines. Someone suggested recently that the NRF24 generates quite a lot of noise on the power supply lines, and that this was causing the NRF24s to latch up occasionally. It wouldn't surprise me if all that noise could cause problems for the STM32 as well, so it might be worth adding a capacitor and seeing if that helps.

    I'm not sure why it'd be getting worse, but it could be that the lower ambient temperature is raising the LiPo battery's internal resistance, making the voltage fluctuate more.

  • Yeah, the errors are different each time and they don't happen on every tick. This script runs on a 1-second setInterval. It's difficult to judge the frequency of the errors precisely because it's uneven, but it fails more often than not. Sometimes it's enough to stop the script running altogether, requiring a RST.

    This is on one of the original Kickstarter boards. I haven't noticed it getting warm, but I'll keep an eye on that.

    I've been running this from an external USB power supply primarily (Anker 40W 5-port) powered from a UPS (just because it's there). Right now, however, it's running from a powered hub (D-Link) so I can keep an eye on what the console's saying.

    I'll hunt down a capacitor for the NRF24, but in the meantime I've set it to run without the NRF24 to see if it still happens.

    Thanks @Gordon!

  • Oh, and while on the powered hub, Vbat is about 4.5V. I haven't checked it when running from LiPo or from the Anker yet.

  • I noticed a similar thing when connected to a GPS, an LCD, or both. It feels like someone is poking around in memory, and it's certainly not my code, because I don't use those instructions (yet) and I'm aware of the risk they pose. Reloading the code helped. Since I was still in development I didn't pay much attention and attributed it to incorrect or runaway code.

    Since I wanted to test-run my GPS connection and line processing for days, I tried to catch any irregularities in try-catch blocks, but somehow I could not get it working. The errors (a GPS sentence field that was unexpectedly empty) just let the thread die, and luckily the next on('data') callback gets the processing going again.
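
    For the empty-field case described above, one common pattern is to parse each sentence inside its own try/catch, so a bad line is reported and skipped rather than ending the handler. This is a sketch under assumptions: it expects plain NMEA `$GPGGA` sentences arriving in arbitrary chunks, and `parseGGA` and `makeLineHandler` are hypothetical names, not part of any Espruino GPS module.

```javascript
// Hypothetical NMEA line handler: buffers incoming chunks, splits them into
// lines, and parses each line in its own try/catch so one bad sentence is
// reported and skipped instead of stopping the whole data handler.
function parseGGA(line) {
  var f = line.split(',');
  if (f[0] !== '$GPGGA') return null; // ignore other sentence types
  // GGA fields: 1 = time, 2/3 = latitude + N/S, 4/5 = longitude + E/W
  if (!f[2] || !f[4]) throw new Error('empty latitude/longitude field');
  return { time: f[1], lat: f[2] + f[3], lon: f[4] + f[5] };
}

function makeLineHandler(onFix, onError) {
  var buffer = '';
  return function (data) {
    buffer += data;
    var idx;
    while ((idx = buffer.indexOf('\n')) >= 0) {
      var line = buffer.slice(0, idx).trim(); // trim() also drops the '\r'
      buffer = buffer.slice(idx + 1);
      try {
        var fix = parseGGA(line);
        if (fix) onFix(fix);
      } catch (e) {
        onError(e, line); // report, then carry on with the next line
      }
    }
  };
}
```

    On the board this would be attached with something like `Serial1.on('data', makeLineHandler(onFix, onError))`, where the choice of `Serial1` is only an assumption about which port the GPS is wired to.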

    @tom.gidden, I'm glad you brought this up, so I feel less like I'm 'being nuts...' or 'seeing ghosts...' ;-)

  • I'm not sure what to suggest - if we can get a piece of code that exhibits the problem without extra hardware then I can debug it here, but otherwise I'm not sure what the issue might be...
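
    Since the aim is code that shows the problem without extra hardware, it may help to detect the corruption the moment it happens rather than waiting for a syntax error. The error messages in this thread print function bodies as source text, which suggests Espruino keeps functions as source; assuming that, comparing a function's `toString()` with a copy taken at startup should expose a changed byte. `snapshot`, `findChanged` and `watchSource` are hypothetical helpers, not Espruino APIs.

```javascript
// Hypothetical corruption detector: record each function's source text once,
// then periodically compare it with the current text and report any changes.
function snapshot(fns) {
  var saved = {};
  Object.keys(fns).forEach(function (name) {
    saved[name] = fns[name].toString();
  });
  return saved;
}

function findChanged(fns, saved) {
  return Object.keys(saved).filter(function (name) {
    return fns[name].toString() !== saved[name];
  });
}

function watchSource(fns, ms, onCorrupt) {
  var saved = snapshot(fns);
  return setInterval(function () {
    var changed = findChanged(fns, saved);
    if (changed.length) onCorrupt(changed); // names whose text differs
  }, ms);
}
```

    For example, `watchSource({ loop: loop }, 1000, function (names) { console.log('changed:', names); })`, where `loop` stands for whichever function the 1-second interval calls. The saved copies live in the same RAM, so they could in principle be hit as well, but a mismatch either way points at the problem.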

    I've had an Espruino board running in the self-opening bin for what must be 8 months non-stop now, and it's never had issues. That isn't with the latest firmware, though.

  • I didn't see a response to this... Are the Espruinos running into these problems using the bigram builds, or stock firmware?

  • Mine is a 1.3 board with stock 1v70 firmware. I noticed it when something was going on on the board (a serial on('data') handler) and other setTimeouts were active, and I had just sent new code to the board. That was what made me always put a disconnect or disable option in place 'to quiet the board' before sending updated software. In this case my experience of code corruption differs from what @tom.gidden experienced.
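
    A minimal sketch of such a 'quiet the board' switch, assuming all periodic work is started through one wrapper so it can be stopped with a single call. `every`, `quietBoard` and the `timers` list are hypothetical names; the sketch relies on `removeAllListeners`, which exists both on Espruino objects such as `Serial1` and on Node's `EventEmitter`.

```javascript
// Hypothetical "quiet the board" helper: every interval is started through
// every(), so quietBoard() can stop them all and detach serial handlers
// before new code is sent.
var timers = [];

function every(ms, fn) {
  var id = setInterval(fn, ms);
  timers.push(id);
  return id;
}

function quietBoard(ports) {
  // Stop periodic work so nothing fires while the upload is in progress
  while (timers.length) clearInterval(timers.pop());
  // Detach 'data' handlers so incoming serial data is no longer processed
  ports.forEach(function (port) {
    port.removeAllListeners('data');
  });
}
```

    You would type `quietBoard([Serial1])` in the console before uploading, assuming the device is on `Serial1`. Detaching the handler only stops the JavaScript side from processing data; whether that is enough to keep the board quiet during an upload is an assumption.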

  • I haven't found an appropriate capacitor for the NRF24 yet, but since disconnecting it and commenting out the init code two days ago, errors have ceased. Now, that might mean that the NRF24 noise hypothesis is correct, but it could be down to a few other reasons too, involving hardware and/or software. I'll update when I manage to sort out the capacitor thing.

  • I found a 0.1µF ceramic and put it in place, but it didn't help. I also tried a 1µF, with no luck. However, I did notice something interesting...

    Uncaught SyntaxError: Got ?[219] expected EOF
     at line 1 col 13
    {var a=this;Ûf(a._tto)return console.log("HTU21D COLLISION")...
                 ^
    in function "getTemperature" called from line 2 col 60
    ...re(function (x) { T2 = x; });
                                   ^
    Uncaught SyntaxError: Got ?[224] expected EOF
     at line 1 col 13
    {var a=this;àf(a._tto)return console.log("HTU21D COLLISION")...
                 ^
    in function "getTemperature" called from line 2 col 60
    ...re(function (x) { T2 = x; });
                                   ^
    Uncaught SyntaxError: Got ?[229] expected EOF
     at line 1 col 13
    {var a=this;åf(a._tto)return console.log("HTU21D COLLISION")...
                 ^
    in function "getTemperature" called from line 2 col 60
    ...re(function (x) { T2 = x; });
                                   ^
    Uncaught SyntaxError: Got ?[234] expected EOF
     at line 1 col 13
    {var a=this;êf(a._tto)return console.log("HTU21D COLLISION")...
                 ^
    in function "getTemperature" called from line 2 col 60
    ...re(function (x) { T2 = x; });
                                   ^
    Uncaught SyntaxError: Got ?[239] expected EOF
     at line 1 col 13
    {var a=this;ïf(a._tto)return console.log("HTU21D COLLISION")...
                 ^
    in function "getTemperature" called from line 2 col 60
    ...re(function (x) { T2 = x; });
                                   ^
    Uncaught SyntaxError: Got ?[244] expected EOF
     at line 1 col 13
    {var a=this;ôf(a._tto)return console.log("HTU21D COLLISION")...
                 ^
    in function "getTemperature" called from line 2 col 60
    ...re(function (x) { T2 = x; });
                                   ^
    Uncaught SyntaxError: Got ?[249] expected EOF
     at line 1 col 13
    {var a=this;ùf(a._tto)return console.log("HTU21D COLLISION")...
                 ^
    in function "getTemperature" called from line 2 col 60
    ...re(function (x) { T2 = x; });
                                   ^
    Uncaught SyntaxError: Got ?[254] expected EOF
     at line 1 col 13
    {var a=this;þf(a._tto)return console.log("HTU21D COLLISION")...
                 ^
    in function "getTemperature" called from line 2 col 60
    ...re(function (x) { T2 = x; });
                                   ^
    Uncaught SyntaxError: Got ?[3] expected EOF
     at line 1 col 13
    {var a=this;g(a._tto)return console.log("HTU21D COLLISION")...
                 ^
    in function "getTemperature" called from line 2 col 60
    ...re(function (x) { T2 = x; });
                                   ^
    

    Notice that the bad character code increases by 5 on each error. Once it passes 255, it wraps around (254 + 5 gives 3), the next character is incremented (f to g), and it carries on. Weird. This is with stock 1v70.
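
    The pattern can be checked with a little arithmetic. If the corrupted character at col 13 and the character after it are treated as the low and high bytes of one 16-bit little-endian value, adding 5 per error reproduces the logged sequence, including the wrap from 254 to 3 and the carry that turns `f` into `g`. Two of the earlier errors show the same step size: `console3log` (`.` is 46, `3` is 51) and `kunction` (`f` is 102, `k` is 107). This only models the numbers in the log and says nothing definite about what Espruino is doing internally; `simulate` is a hypothetical helper.

```javascript
// Model of the log: treat two adjacent bytes of the function source as one
// 16-bit little-endian value (low byte first) and add `delta` per error.
function simulate(low, high, steps, delta) {
  var out = [];
  var word = low | (high << 8);
  for (var i = 0; i < steps; i++) {
    out.push([word & 0xff, (word >> 8) & 0xff]);
    word = (word + delta) & 0xffff; // wrap like a 16-bit counter
  }
  return out;
}

// Start from the first logged value (219) with 'f' as the following byte.
var seq = simulate(219, 'f'.charCodeAt(0), 9, 5);
seq.forEach(function (pair) {
  console.log(pair[0], String.fromCharCode(pair[1]));
});
```

    This prints 219, 224, 229, 234, 239, 244, 249 and 254 paired with `f`, then `3 g`, matching the nine errors above.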

  • There are ancient techniques where, on execution, (simple) executors/interpreters write data into the code itself at intended places, such as markers or return/forward addresses (for returns, if-then-elses and so on). If the addresses for these intended places are off, this data mutilates the code. These techniques also make code non-reentrant (for the same or other threads); see http://en.wikipedia.org/wiki/RCA_1802. The CDP1802 had no stack, so the return address was written at the beginning of the subroutine, and the return had to pick up this address to continue just where the call/invocation jumped off. By the way, the CDP1802 also had a BASIC interpreter. (I'm amazed that the data sheet http://www.intersil.com/content/dam/Intersil/documents/cdp1/cdp1802ac-3.pdf is from 2008... my experience with the chip and its programming is from the '70s...)

  • @allObjects luckily Espruino doesn't do anything quite as 'interesting' as storing data in the code that it is executing :) We'd talked about your problem before, and it's just missed characters rather than corrupted ones. Now that you mention on('data',...), I'd assume that the data coming in on the Serial port is filling up the input buffer. Maybe talk about that in another thread? It may be that reset() isn't disabling Serial ports correctly and they are still reading data.

    @tom.gidden sure sounds like a software fault then. My guess would be that there's some bug caused by the recent changes to use smaller blocks for variables, and something is accessing past the bounds that it should be. It's just strange that it only seems to happen occasionally...

    Do you think that it's possible that it is happening all the time a bit of code is being executed, but that the bit of code isn't being executed very often? Do you think you could send me your code? If I can find a way to reproduce it, it'd be a lot easier to track down :)

  • Thanks, will do. First I'll try to isolate it down to a simple test case.

  • Hi Tom,

    I just discovered that I merged a pull request back in March that disabled all the sanity checks in the debug version of the code (that I use to run all the tests on). After turning them back on it seems there are a bunch of failures now, so that would more than explain the problems you might be having.

    I'll try and sort this out next week, but I wouldn't be surprised if it fixed the corruption you're having.

    Sorry about that - I should have checked stuff more closely.
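
    To illustrate why disabled sanity checks can let this kind of bug through unnoticed: below, fixed-size blocks share one byte array, and a caller writes one byte past the end of its block. With the bounds check on, the bad write throws immediately; with it off, the write silently lands in the neighbouring block, which is the kind of out-of-bounds access guessed at earlier in the thread. The block size, names and layout are all made up for illustration and say nothing about Espruino's real variable storage.

```javascript
// Toy model only: fixed-size "variable blocks" packed into one RAM array.
// BLOCK_SIZE, makeStore and buggyUpdate are hypothetical.
var BLOCK_SIZE = 4;

function makeStore(blockCount, checksEnabled) {
  var ram = new Uint8Array(blockCount * BLOCK_SIZE);
  return {
    write: function (block, offset, value) {
      // Sanity check, only performed when checksEnabled is true
      if (checksEnabled && (offset < 0 || offset >= BLOCK_SIZE)) {
        throw new RangeError('block ' + block + ': offset ' + offset + ' out of bounds');
      }
      ram[block * BLOCK_SIZE + offset] = value;
    },
    read: function (block, offset) {
      return ram[block * BLOCK_SIZE + offset];
    }
  };
}

// Off-by-one bug: offset BLOCK_SIZE is one byte past the end of block 0
function buggyUpdate(store) {
  store.write(0, BLOCK_SIZE, 0xff);
}

var release = makeStore(2, false);
release.write(1, 0, 'i'.charCodeAt(0)); // block 1 holds a byte of source text
buggyUpdate(release);                   // silently overwrites that byte
console.log(release.read(1, 0));        // 255, not 105 ('i')

var debug = makeStore(2, true);
try {
  buggyUpdate(debug);
} catch (e) {
  console.log(e.message); // "block 0: offset 4 out of bounds"
}
```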

  • I've just updated the code in GitHub - it now runs all the tests without assertion failures.

    There was one issue in particular that could have caused this instability, so when 1v71 comes out (or now, with a pre-release build) could you try this again and see if it works any better?

Posted by @tom.gidden
