How to troubleshoot an assertion failure

Posted on
  • I'm getting an assertion failure after running some MQTT stuff for a while on an esp8266. The failure is: ASSERT FAILED AT src/jsvar.c:556, which is assert(jsvGetLocks(var) < JSV_LOCK_MAX); in jsvLock. Is there a simple way I can add a line of code to print the name of the variable?

  • That's a good idea - I guess you could add a new macro - ASSERT_AND_PRINT(jsvGetLocks(var) < JSV_LOCK_MAX, jsvGetRef(var)) or something, that would print the second arg if it failed.

    jsvGetRef is really what you need. Then you can look it up in the memory trace that should be printed after the assertion... Trying to work out the variable name is possible by scanning all variables, but a huge faff and I don't think there is any built-in function for it.

    Otherwise the easiest way is if you have a debugger, or can reproduce it on the PC.

    (Just going to move this to ESP8266 - I don't think it's really JS related... not in the way most folks will be expecting :) )

  • I added a call to _jsvTrace(var,0,var,0) and I get:

    ASSERT(jsvGetLocks(var) < 15) FAILED AT src/jsvar.c:572
      #1[r1,l2] Object {
        #2[r1,l2] #601[r5,l15] Object {
      #602[r1,l2] #601[r5,l15] Object {
      #602[r1,l4] ...
    
      #605[r1,l2] ...
    
      #604[r1,l3] ...
    
      #608[r1,l2] ...
    
      #78[r1,l2] ...
    
      #464[r1,l2] ...
    
      #480[r1,l2] ...
    
      #473[r1,l3] ...
    
      #493[r1,l2] ...
    
      #784[r1,l2] ...
    
    }
    

    Not sure what to make of this...

  • I added a stack backtrace. It's not a real interpretation of the stack, it really just looks for addresses in the stack that are within code and then looks up the symbols, so there are spurious entries. The result is:

    0x40209361: jsAssertFail at ??:?
    0x4020d21f: jspeFactor at ??:?
    0x40202e2c: jsvLockAgain at ??:?
    0x40202e56: jsvLockAgainSafe at ??:?
    0x4020c598: jspeFunctionCall at ??:?
    0x40201190: jslSingleChar at jslex.c:?
    0x40201b44: jslGetNextToken at ??:?
    0x4020b6c1: jspeBinaryExpression at ??:?
    0x4020d427: jspePostfixExpression at ??:?
    0x40202024: jslMatch at ??:?
    0x4020b6ec: jspeAssignmentExpression at ??:?
    0x4020bcd4: jspeExpression at ??:?
    0x4020d389: jspeFactorFunctionCall at ??:?
    0x4020e355: jspExecuteFunction at ??:?
    0x402055ea: jsvIsStringEqualOrStartsWith at ??:?
    0x402044f9: jsvLock at ??:?
    0x4021045c: jsiExecuteEventCallback at ??:?
    ...
    

    The fact that this happens in an event callback makes sense to me. Dunno where to go from here though.

  • Usually when there's an assertion it does a full trace, starting from global (execInfo.root)... Not sure why it isn't doing it on ESP8266? It might help you track down what the var actually was.

    I'm not sure why the trace you have isn't giving any names though - it seems pretty messed up.

    Since it's called from jspeFactor it's presumably in JS code, so you could find out where you are by typing jslPrintTokenLineMarker(jsiConsolePrintS­tring, 0, lex, lex->tokenStart, 0)

    By guess is that somewhere, some code is locking a variable and then not unlocking it.

    The assertion could be anywhere that the variable is used (not the code actually causing the problem), because obviously the number of locks just ticks up over time, and then it just gets to the point where any subsequent lock of that var will fail.

    Still, tracking down which variable it was will mean you can start to look through the bits of code that access that variable, and hopefully at some point you'll find out what isn't unlocking.

    How did you get the stack trace? Is that something you did in code on the ESP8266 itself? Seems like it'd be a really handy debug tool

  • So I tried v1.85 at 66fde09 and that doesn't crash. I will try the official v1.86 next.
    The stack trace of assert fails for multiple reasons on the esp8266. First of all, it writes to the JS console. That's over telnet for me, so will never get there. Even when switched to serial it doesn't work because it jams the characters into the device, which then needs the idle loop to get drained, which also doesn't happen 'cause the assertion causes a reboot before that. The reason it's all messed-up is that I hacked the reboot code to first drain the console but I think some limited buffer gets overrun. So multiple things to fix.

    The exception stack dump is esp8266 specific. The commit is https://github.com/espruino/Espruino/com­mit/30aae54c80b6958d22447fc76f6d15d4c9da­81c0 (probably some follow-up changes too).

  • Well, looks like 1v86.3 works fine. So either it's something on master since then or something I messed with that causes the issue. All this is using the exact identical JS code.

  • The latest commit built by travis on my fork works too, so must be one of the small local changes I have or my local build env. Grrrr. I'll try the jslPrintTokenLineMarker anyway, it'll come in handy in the future...

  • I'm continuing to have issues with assertions. I now isolated it to a small fragment of code. It doesn't fail on pico. Any thoughts what may be causing it? The code is:

    
    function MCP23008(i2c,rst, i2ca) {
      if (rst) {
        rst.write(0);
      }
      this.i2c = i2c;
      this.i2ca = (i2ca===undefined) ? 32:32+i2ca;
      this.wt = i2c.writeTo.bind(i2c, this.i2ca);
      if (rst) {
          this.rst=rst;
          this.rst.write(1);
      }
    }
    
    
    function foo1(a) { console.log("foo", a); }
    dofoo1 = foo1.bind(null, 55);
    dofoo1.call();
    

    The assertion failure is: ASSERT FAILED AT src/jsvar.c:624

    The stack trace is:

    0x40282e59: jsvRef at /big/home/src/esp8266/Espruino/src/jsvar­.c:624
    0x402867ed: jsvCopy at /big/home/src/esp8266/Espruino/src/jsvar­.c:2053
    0x4028c733: jspeFunctionCall at /big/home/src/esp8266/Espruino/src/jspar­se.c:618
    0x402a566b: jswrap_function_apply_or_call at /big/home/src/esp8266/Espruino/src/jswra­p_object.c:866
    0x4028a182: jsnCallFunction at /big/home/src/esp8266/Espruino/src/jsnat­ive.c:183
    0x4028c6d4: jspeFunctionCall at /big/home/src/esp8266/Espruino/src/jspar­se.c:576
    0x4028d27a: jspeFactorFunctionCall at /big/home/src/esp8266/Espruino/src/jspar­se.c:1155
    0x4028d340: jspePostfixExpression at /big/home/src/esp8266/Espruino/src/jspar­se.c:1422
    0x4028d420: jspeUnaryExpression at /big/home/src/esp8266/Espruino/src/jspar­se.c:1449
    0x4028b600: jspeBinaryExpression at /big/home/src/esp8266/Espruino/src/jspar­se.c:1576
    0x4028b615: jspeConditionalExpression at /big/home/src/esp8266/Espruino/src/jspar­se.c:1612
    0x4028b62d: jspeAssignmentExpression at /big/home/src/esp8266/Espruino/src/jspar­se.c:1676
    0x4028bc04: jspeExpression at /big/home/src/esp8266/Espruino/src/jspar­se.c:1683
    0x4028c1b0: jspeStatement at /big/home/src/esp8266/Espruino/src/jspar­se.c:2307
    0x4028d6e8: jspeBlockOrStatement at /big/home/src/esp8266/Espruino/src/jspar­se.c:1736
    0x4028dcfa: jspParse at /big/home/src/esp8266/Espruino/src/jspar­se.c:1745 (discriminator 1)
    0x4028e154: jspEvaluateVar at /big/home/src/esp8266/Espruino/src/jspar­se.c:2494
    0x40291277: jsiHandleNewLine at /big/home/src/esp8266/Espruino/src/jsint­eractive.c:1284
    0x4029163b: jsiHandleChar at /big/home/src/esp8266/Espruino/src/jsint­eractive.c:1447
    0x402916a2: jsiHandleIOEventForConsole at /big/home/src/esp8266/Espruino/src/jsint­eractive.c:1653 (discriminator 2)
    0x4029173c: jsiIdle at /big/home/src/esp8266/Espruino/src/jsint­eractive.c:1677
    

    "Help"...

  • Tweaking the code some more, the doone call works fine, the dofoo call crashes:

    function x(a) {
      this.a = a;
    }
    var goo = new x("hello");
    
    x.prototype.foo = function(a, b) { console.log("foo", this.a, a, b); };
    
    setTimeout(function() {
      doone = goo.foo.bind(goo);
      doone(60, 12);
    }, 1000);
    
    setTimeout(function() {
      dofoo = goo.foo.bind(goo, 55);
      dofoo(10);
    }, 2000);
    
  • Wow, thanks for narrowing that down... And this works on the Pico with latest firmwares? It doesn't seem to have anything platform specific in there at all.

    It's definitely something to do with the function parameters based on where it crashed... Are you using <1024 variables? If so, it would go to 12 byte mode and I guess there could be some kind of bug in that.

  • This uses 1600 jsvars. It crashes with a version pretty close to master as well as an older v1.85. I'll try pico again now that I have less code. I'm wondering whether jsCopy & native code & flash somehow interact differently on the esp8266.
    (Just tried pico again and it works fine there, grr.)

  • The latest build off your v1.86 works on esp8266, so it must be something in my workspace that is messing things up. Go figure... Sorry for wasting your time.

  • Aha, if I build locally with RELEASE=1 it works, if I build without it if crashes. Gonna try a pico build without RELEASE=1 next...

    Update: the pico build I had travis make doesn't boot on pico. Ugh. I'll have to pursue this later.

  • I remembered I could try linux. Works there with and without RELEASE=1. On esp8266 it doesn't really work with RELEASE=1, that just removes the asserts so it doesn't crash on assert. I just doesn't work in a different way. Sigh.

  • I'm continuing to have assertion failures. A different JS program now hits assert(jsvGetLocks(var) < JSV_LOCK_MAX) in jsvLockAgain. I added code to call jsvGetPathTo before the assertion hits and in that function it hits the same assertion in jsvLock. Most likely jsvGetPathTo tries to lock the same var as it traverses the hierarchy. Is there any other way to get information about which variable this is and/or where in the JS code this is being encountered? Is there a way to print the JS callstack?

    Update: I added a decrement of the lock before calling jsvGetPathTo so it doesn't hit the assertion failure. Now it prints: ▒.HttpCC["0"]. After some poking around, I conclude that that's my MQTT client connection.

  • Well, a bunch of hours of tracing the call chain and determining where the lock on the connection/socket gets incremented this is what I've found out:

    • socketClientConnectionsIdle line 484 calls socketClientPushReceiveData
    • socketClientPushReceiveData 440 calls jswrap_stream_pushData
    • jswrap_stream_pushData 80 calls jsiExecuteEventCallback
      the number of locks is incremented by one when jsiExecuteEventCallback returns
    • jsiExecuteEventCallback 1558 calls jspExecuteFunction
    • jspExecuteFunction 2542 calls jspeFunctionCall
      ...

    I can see clearly that every time I receive an MQTT message my on("data") event handler gets called and when socketClientConnectionsIdle is abouit to return the lock count on the socket is bumped up by one. So after about 15 messages that's it, the assertion fires.

  • Wow, thanks - that's a huge help... I'll try and take a look at this.

    The Pico non-release build doesn't fit into available flash - but you can disable some of the extra features like AES to get it to fit in...

  • Also, jsvGetPathTo could probably just use jsvGetAddressOf and not lock at all?

  • Fixed! https://github.com/espruino/Espruino/com­mit/4fba95ec89c36b5e337ccbe1c0973b405f81­b24a

    If it helps, when debugging I did:

    • Ran under linux with gdb (makes life loads easier!)
    • p jsvGetRef(var) to get the reference of the variable that had a problem
    • break jsLock if ref==that_reference
    • break jsLockAgain if var==the_vars_pointer
    • break jsUnLock if var==the_vars_pointer
    • break where it's happening and after, and step through finding where the locks and unlocks happen
    • When I found out there wasn't an unlock in the function that locked it, the only way it could have happened is if thisVar got overwritten, so I did watch thisVar and continued until I found where it was getting overwritten

    But none of that would have happened without a reproducible test that ran on Linux and the hints about which variable had too many locks - so thanks! The hard part was definitely already done :)

  • Awesome. Yeah the hard part was to find out whereabouts it was it was failing so I could construct a test case to reproduce it in linux. Thanks for the prompt fix!!

  • Fix is looking good, thanks again for the promptness!

  • Post a reply
    • Bold
    • Italics
    • Link
    • Image
    • List
    • Quote
    • code
    • Preview
About

How to troubleshoot an assertion failure

Posted by Avatar for tve @tve

Actions