v1.97 - bug in net module or am I doing something wrong?

  • I am trying to connect an MQTT client to my clock project. I found that the Espruino MQTT module does not work, because its onConnect() function is called twice when it connects to the server. I added a workaround to that function and now everything works.
    Is this a bug in the net module in this version, or something else?

    I can add my code here, of course, but I think it is trivial - I just call require() and then connect().
    The workaround, in the first line of onConnect(), is to check whether #onend is already registered on the socket and to return if it is.
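
    A minimal sketch of the workaround described above, written so that it runs off-device. FakeSocket and setupCount are hypothetical names used only for illustration, and the sketch assumes that a listener added with on('end', ...) shows up under the "#onend" property of the socket, as the post implies.

```javascript
// Hypothetical stand-in for an Espruino socket. Assumption: a listener
// added with on('end', ...) is stored under the "#onend" property.
function FakeSocket() {}
FakeSocket.prototype.on = function (name, fn) {
  this["#on" + name] = fn;
};

var setupCount = 0; // how many times the real setup work ran

// Sketch of the workaround: bail out if this socket was already set up.
function onConnect(socket) {
  if (socket["#onend"]) return; // duplicate 'connect' event: do nothing
  socket.on("end", function () { /* handle the closed socket here */ });
  setupCount++;
}

var sock = new FakeSocket();
onConnect(sock); // first event does the setup
onConnect(sock); // duplicate event is ignored
console.log(setupCount); // 1
```

    With this guard the second onConnect() call returns before it registers anything again.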

  • Yes, there was a bug caused by two onconnect events. However, 1v97 is over 6 months old.

    Just move to the newest version of Espruino and you'll be fine.

  • @Gordon Thank you, now I am sure I was not imagining it. I have not checked Neopixel on ESP32 v2.00. I have read it was broken in v1.98 and is not working now.
    In any case, the bug in the old version is not a problem for my clock.

    By the way, I've found one minor bug in the MQTT module,
    in MQTT.prototype._scktClosed:

    if (this.ctimo) clearInterval(this.ctimo);
    

    clearInterval should be clearTimeout here.

  • Thanks - I'll make sure that gets in the next update

  • Tue 2018.11.20

    May I ask @SergeP at what link you read this:

    re: neopixel: 'I have read it was broken in v1.98 and is not working now.'

    To my knowledge, this hasn't been an issue. The change log doesn't indicate a fix for something that apparently wasn't broken:

    https://www.espruino.com/ChangeLog

    and I only found a 1v98 forum reference at
    'Graphics.drawString not working on 1v98?'
    which isn't directly related to Neopixels.

    I can speak from experience that I've had no issue on several versions, the latest 1v99 on authentic Espruino boards, and have built quite sizable modules pushing the limits of both code and hardware.

    Would love to know where this bad press started.

  • Neopixels are still broken on the esp32 with version 2.00.

    Since the SDK was updated to v3.0 (and also 3.1) it has not worked. There were no changes around the RMT libs used to output the waveforms, so it's a bit of a mystery. @jumjum is going to connect a digital oscilloscope at some stage.

    The outstanding issue is here:

    https://github.com/espruino/Espruino/issues/1484

    We have bigger issues with wifi not connecting at present with the latest SDK, so we will be reverting back to 3.0.x

    The ESP-idf is a moving target - we update to get new features, and then other things break. It's very frustrating!

  • @Gordon I've found something to improve in the MQTT module.
    In some cases the underlying require("net").connect(...) call throws an exception. mqo.emit('disconnected') is not called in that case, so the usual reconnection approach (reconnect on 'disconnected') does not work.
    So maybe it is a good idea to catch the exception in the MQTT module and emit the disconnected event:

          try {
            client = require("net").connect({host: mqo.server, port: mqo.port}, onConnect);
          } catch (e) {
            // connect() threw, so there is no socket; report it as a disconnect
            client = false;
            mqo.emit('disconnected');
          }
    

    It would simplify usage of the MQTT module.
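
    To illustrate why the extra event helps, here is a rough sketch of reconnect-on-disconnected that runs off-device. FakeMqtt and flakyConnect are hypothetical stand-ins, not the module's real code: flakyConnect plays the role of require("net").connect and throws on the first two attempts.

```javascript
// Hypothetical stand-in for the MQTT object; this is not the module's code.
function FakeMqtt(connectFn) {
  this.handlers = {};
  this.connectFn = connectFn; // plays the role of require("net").connect
  this.client = false;
}
FakeMqtt.prototype.on = function (name, fn) {
  (this.handlers[name] = this.handlers[name] || []).push(fn);
};
FakeMqtt.prototype.emit = function (name, arg) {
  (this.handlers[name] || []).forEach(function (fn) { fn(arg); });
};
FakeMqtt.prototype.connect = function () {
  var mqo = this;
  try {
    mqo.client = mqo.connectFn(function onConnect() { mqo.emit('connected'); });
  } catch (e) {
    // Without this catch the caller would never see a 'disconnected' event.
    mqo.client = false;
    mqo.emit('disconnected');
  }
};

var attempts = 0;
var connected = false;

// Simulated network: the first two attempts throw, the third succeeds.
function flakyConnect(onConnect) {
  attempts++;
  if (attempts < 3) throw new Error("no route to host");
  onConnect();
  return {}; // stand-in socket
}

var mqtt = new FakeMqtt(flakyConnect);
mqtt.on('connected', function () { connected = true; });
// On a device, retry after a delay with setTimeout rather than immediately.
mqtt.on('disconnected', function () { mqtt.connect(); });
mqtt.connect();
console.log(attempts, connected); // 3 true
```

    The application only has to listen for 'disconnected'; it does not need its own try/catch around connect().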

  • ref #6

    Thank you @Wilberforce for the detail outlining that this is specific to ESP32 and not authentic Espruino devices. That should resolve the confusion here.

  • @SergeP thanks - I'll get that added.

  • @SergeP

    The latest Travis build for esp32 has better wifi setup - new wifi functions added to bring in line with esp8266 and @JumJum has got the neopixels to work again.

  • @Wilberforce, @JumJum, thanks a lot. I have just tried the new Travis build and it works well. Moreover, it looks to be more stable than 1v97, both Wifi and Neopixel.

  • Thanks for reporting back.

  • @Wilberforce, I have some strange problems with this firmware version. For example, a few times my clock has started changing screens very fast. I checked that the function passed to setInterval is called many times per second (instead of once per second), and at very irregular intervals. I tried to check whether setInterval had been called twice (although it is called only once in my code, of course): I called clearInterval(1) from the console (I checked that 1 is the correct number) and the screen changes stopped, so it was a single interval behaving strangely. A few times I saw other problems, which I could not identify any better than the previous one. What these problems have in common is that they begin after a few minutes or an hour of device operation. I checked memory, and more than half is free. Maybe setInterval interferes with Neopixel or with other intervals?
    I have another board connected to a Neopixel LED string, running very simple code, and it has had no problems so far (the last 4 hours). So tomorrow I will try swapping the boards to check whether this is a hardware problem. But the clock worked well with the old FW version on the same board.
    For now I have no idea how to locate the bug. I have not yet tried connecting JTAG to the ESP32 (or even building anything for ESP32), but I do not think it would be too hard for me. So I may try to look into it at the weekend.

  • @SergeP, declare var h; and later use it as in h = setInterval(...);. The call returns a handle that you can pass to clearInterval(h). Use the handle rather than a fixed number, because this handle - a number - keeps moving... (yes, it is 1 when it is the very first setInterval/setTimeout... but you do not know what other parts of the software are using setInterval/setTimeout, so your next one is something totally different).

    It is good practice to check for multiples of the same setInterval, because that can really create issues... You can prevent that by checking against the handle you got for that particular interval: if (!h) { h = setInterval(...); }. But to be clear: the variable where you store the handle is not cleared when the interval/timeout is cleared or the timeout happens. Therefore, a good practice is to have the handle cleared like this: h = clearInterval(h); or respectively h = clearTimeout(h);, and in the timeout function as the first thing h = undefined;, if it matters.

    The other thing I could think of is that some numeric / precision thing (number of significant / stored digits) and the way numbers are stored can play games with you... especially when it comes to fractions of time. Very defensive, tight and robust programming is the only way to detect that, and the way out. I'm not paranoid, but sometimes the absence of 'x' does not mean the presence of '!x'... ;-)
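
    The handle pattern from this post can be sketched as follows. The names start, stop and ticks are hypothetical; only setInterval and clearInterval are real calls, and the sketch relies on clearInterval returning undefined.

```javascript
var h;         // interval handle; undefined while nothing is scheduled
var ticks = 0; // hypothetical work done by the interval

function start() {
  // Only schedule when no interval is currently registered under h.
  if (!h) {
    h = setInterval(function () { ticks++; }, 1000);
  }
  return h;
}

function stop() {
  // clearInterval() returns undefined, so this also resets the handle.
  if (h) h = clearInterval(h);
}

var first = start();
var second = start(); // no second interval: the existing handle is returned
console.log(first === second); // true
stop();
console.log(h); // undefined
```

    Because stop() resets h, a later start() schedules a fresh interval instead of silently doing nothing.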

  • Further to the above - what mode are you saving in? Saving will also store the setTimeouts, and your code could re-add them, causing chaos!

    Do a reset(1); to clear all the code, and possibly use the IDE's save on send feature. This saves to flash and restarts the code fresh each time.

  • Usually I save my code using 'Direct to Flash' mode in the WebIDE. I do reset(1) before.
    I call setInterval(func,1000) at global scope in my code, and I never call clearInterval. The setInterval returns 1.
    The code works with the previous FW (v1.97).
    I see some strange behavior of my code with the Travis build I got a week ago. Most often I see fast screen changes (screens are switched in the setInterval callback) and a strange state of the Wifi connection after about an hour of work. I think I can recall the git hash of the build if you need it.
    Today I tried calling changeInterval(1,1000) from the console and it helped - the clock returned to normal and worked for 2 more hours.
    So it looks to me like setInterval-related memory corruption or something similar (maybe corruption of data in a hardware timer, or something else). And it looks to be related to the FW version.

  • Saving the status of a processor has its caveats... even with Espruino boards you can run into trouble - even though they do a much better job, because it is @Gordon's way to make things easy... for small, simple apps... If it gets a bit more complicated and you have connected devices that need initialization work, saving only saves the Espruino status at best, but not the status of your peripherals.

    Therefore, save the code only while it is not running, with an onInit() function set up that kicks everything off (unless you know the impact of what you are doing...). With that, you make sure that nothing gets missed and nothing gets doubled.

    In case you worry about the require()s... their sequence does not really matter: before the code is uploaded by the IDE, the IDE parses your code for require("<moduleName>") and uploads the related modules, even nested / recursive ones, into the Modules Cache. Only after that is your main code / application uploaded.
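
    A rough sketch of the onInit() approach described above. showNextScreen, screenTimer and activeTimers are hypothetical names; activeTimers exists only so the sketch can show that no interval is doubled.

```javascript
var activeTimers = 0; // bookkeeping for this sketch only
var screenTimer;      // handle of the screen-change interval

function showNextScreen() { /* hypothetical: draw the next clock screen */ }

// On Espruino, onInit() is called at boot when it exists in the saved code.
function onInit() {
  // If onInit() somehow runs twice, drop the old interval first.
  if (screenTimer) {
    screenTimer = clearInterval(screenTimer);
    activeTimers--;
  }
  // Peripheral setup (display, Wifi, MQTT) would also go here.
  screenTimer = setInterval(showNextScreen, 1000);
  activeTimers++;
}
```

    Nothing is scheduled at upload time in this layout; the interval only exists once onInit() has run, so a saved state cannot carry a stale timer along.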

  • @allObjects, do you mean that the 'Direct to flash' saving method with device initialization at global scope has disadvantages? I think both methods lead to close results...

  • ...close is good, but not exact and possibly (obviously?) not close enough. I'm not the guru on the ESP32 implementation; I try to be safe, defensive, robust in my SW. I know that you have it working... and I'm sure the kinks have their explanations. I try to avoid such things in the first place. The development process does not need to include the final deploy. I have no issue marking a milestone that I pick consciously: now is the time when I want to power it up (IDE) unconnected.

  • I've located the conditions under which the bug appears.
    It happens after a setSNTP call. After the call, some intervals start to act as if their interval were zero, while dump() shows that the stored interval data is still correct.
    The set of broken intervals differs from time to time. Sometimes all intervals are affected, sometimes only one or two, sometimes none. But it happens often, on roughly every second call. A changeInterval() call can fix a broken interval.
    I've gone back to a JS implementation of SNTP in my code, and now everything works.

    Now I'm sure there is a bug in Espruino for ESP32. Unfortunately, I did not find the SNTP implementation in the git repository (I do not think I would find the bug, but I'd like to try). In any case I will be happy to help somebody locate and kill the bug.
    I've tried to write a small test program with only one interval and setSNTP, but with no success - the bug does not appear in that environment, so it is not so easy to catch.
    Could anybody also explain to me the correct way to report the bug? It seems it may be a well-known Github feature, but I have not used Github so I do not know how it is usually done.

    By the way, I see signs that there may be one more bug - my code hangs after a short time (usually a few hours) since I added many console.log() calls (so console.log() is called about 2 times per second now). There are no special console messages before the hang. It seems that the hang appears much faster if the ESP32 board is connected via USB (so the ESP32 COM port is used for logging). But I have not tested it enough.
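
    For the intervals that start firing too fast after setSNTP, one possible way to catch the moment is a small guard inside the interval callback. This is only a sketch: makeTickGuard is a hypothetical helper, the half-period threshold is an arbitrary assumption, and the timestamps below are synthetic so that it runs off-device.

```javascript
// Builds a checker that reports ticks arriving much sooner than expected.
// expectedMs: the period that was passed to setInterval
// onTooFast:  called with the measured gap when it is under half a period
function makeTickGuard(expectedMs, onTooFast) {
  var last; // timestamp of the previous tick; undefined before the first
  return function (now) {
    var delta = last === undefined ? expectedMs : now - last;
    last = now;
    if (delta < expectedMs / 2) {
      onTooFast(delta);
      return true;
    }
    return false;
  };
}

var repairs = 0;
var guard = makeTickGuard(1000, function (delta) {
  repairs++;
  // On the device one could log delta here and try changeInterval(id, 1000).
});

// Synthetic timestamps in ms: the third tick arrives only 10 ms after the second.
var results = [0, 1000, 1010, 2010].map(guard);
console.log(results.join(",")); // false,false,true,false
```

    On the device the callback would call guard(Date.now()) as its first statement, which gives a log entry close to the point where the interval breaks.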

Posted by @SergeP