-
@Gordon, same fix as me, thanks for the continued support. This has been a massive step forward and the board has been running all night with one socket broadcasting UDP data from the serial port and another one receiving an incoming TCP stream. I think this will help a number of other libraries (MQTT etc) to appear way more stable. Nice work !
The only other small one I had to catch was quite subtle so I will report it. Occasionally the AT module only receives single bytes at a time and at the end of one of these periods there is sometimes a character missing. A typical sequence looks like this
"\r\nRecv 31 bytes\r\n\r\nSEND OK\r\n"
"\r"
"\n"
"O"
"K"
"\r"
"\n"
">"
" "
"\r\nRecv 52 bytes\r\n\r\nSEND OK\r\n"
and everything is fine, but sometimes the last \n is missing and the sequence looks like
"\r\nRecv 31 bytes\r\n\r\nSEND OK\r\n"
"\r"
"\n"
"O"
"K"
"\r" <-- missing \n
"OK\r\n> "
The logic of the parser then breaks as everything is split on \r\n. My (hacky) solution was to check for the situation within the AT module as follows, but it seems to be hiding an underlying issue.
ser.on("data", function cb(d) {
  //console.log(JSON.stringify(d));
  if (line=='\r' && d.charAt(0) != '\n') line = "\r\n";
I have seen other situations with a UART that will miss just the last character when receiving data. I have tried running the Wifi serial at everything from 115200 all the way through to 921600 with a similar result.
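To make the workaround easier to reason about, here is a minimal sketch of a line splitter that tolerates the dropped \n. It is not the AT module's code: makeLineSplitter, feed and pending are hypothetical names, and treating a lone \r as a line end is an assumption that would be unsafe for binary payloads that contain \r.

```javascript
// Hypothetical helper (not part of AT.js): splits a byte stream into lines,
// accepting "\r\n" and also a lone "\r" followed by some other character.
function makeLineSplitter(onLine) {
  var buf = "";
  function feed(chunk) {
    buf += chunk;
    for (;;) {
      var i = buf.indexOf("\r");
      // No CR yet, or CR is the last byte so far: wait for more data
      if (i < 0 || i === buf.length - 1) break;
      // Normal case consumes "\r\n"; repaired case consumes only "\r"
      var skip = (buf.charAt(i + 1) === "\n") ? 2 : 1;
      onLine(buf.substr(0, i));
      buf = buf.substr(i + skip);
    }
  }
  // Expose whatever has not been terminated yet (e.g. a "> " prompt)
  feed.pending = function () { return buf; };
  return feed;
}
```

Fed with the broken sequence above, this still yields the two "OK" lines and leaves "> " pending, but it only papers over the lost byte in the same way my hack does.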
Posting more for your information than anything else as I can fix it in my implementation.
Regards,
Steve
-
Awesome ! The simple fixes are the best and I really appreciate your support.
Not wanting to drag this out, but I am sure you want to make this 'networking' as robust as possible. I see at least one 'edge' case that would result in a tight infinite loop right now (which I assume locks pretty hard on this platform). It's an easy fix I think, and I am happy to do it my end, but it's probably best you maintain your own code base.
The handlers loop in the ser.on("data" function is now a while loop which checks against the string returned from the handler. There could be cases (like the current IPDHandler in the wifi code if it's not complete) where the same line is returned back to the function, which I think would loop indefinitely. Probably just a simple precondition vs result check with a break would solve it, and if the handler didn't consume anything then it must be OK to continue?
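A rough sketch of the kind of guard I mean is below. It assumes handlers are stored as a map from prefix to function and that each handler returns the text it did not consume; runHandlers and that layout are my own illustration, not the actual AT.js internals.

```javascript
// Hypothetical handler loop: stops as soon as a handler makes no progress,
// so an incomplete line returned unchanged cannot spin forever.
function runHandlers(line, handlers) {
  var progressed = true;
  while (line.length && progressed) {
    progressed = false;
    for (var key in handlers) {
      if (line.substr(0, key.length) === key) {
        var rest = handlers[key](line);
        if (rest !== line) {   // precondition vs result check
          line = rest;
          progressed = true;
        }
        break;                 // only the first matching handler runs per pass
      }
    }
  }
  return line;                 // whatever is left waits for more data
}
```

If the handler hands back exactly what it was given, the loop exits and the leftover stays in the buffer until the next "data" event.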
Thanks again, Steve
-
Hi @Gordon,
Sorry that this is dragging on. For me, adding the new files (AT.js and EspruinoWifi.js) results in not being able to do simple sends. No stack trace, obviously, but it looks like the AT+CIPSEND gets to the "OK" clause in EspruinoWifi.js and the callback is returned and assigned to the lineCallback in AT.js.
At this stage (after the lineCallback and before
if (handled&&dataCount) return cb("");
) the variable line equates to "OK\r\n> ". This is then substring'd with
line = line.substr(i+2);
to "> ", and the final statement of the loop,
i = line.indexOf("\r\n");
returns -1 and the loop exits.
I think this leaves the ESP8266 module waiting for data and therefore not sending any more, and we have effectively discarded the incoming "> ". If I add a check at the end of the loop as follows then it works, but obviously this has no place in the AT module:
if (line=="> " && lineCallback){
  //debugger;
  lineCallback(line);
}
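The same idea written as a standalone sketch, in case it helps to see the two cases side by side. drain and the prompts list are hypothetical and only mimic the split-on-\r\n loop described above; the real module registers '> ' through at.register instead.

```javascript
// Hypothetical sketch: deliver every complete line, then deliver the leftover
// as well if it is exactly a known prompt (which never ends in "\r\n").
function drain(buf, onLine, prompts) {
  var i = buf.indexOf("\r\n");
  while (i >= 0) {
    onLine(buf.substr(0, i));
    buf = buf.substr(i + 2);
    i = buf.indexOf("\r\n");
  }
  if (prompts.indexOf(buf) >= 0) {
    onLine(buf);
    buf = "";
  }
  return buf; // unterminated remainder, kept for the next "data" event
}
```

This handles "OK\r\n> " arriving in one chunk as well as "> " arriving on its own in the following chunk.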
I am trying to work out what I have missed, but the files are straight cut and paste from your recent commits. Thanks for your ongoing help; this module is a very important part of a project I am working on and we are very keen to get this working.
Steve
-
Hi,
Looking at this fix (haven't tested it yet), it looks very simple but doesn't feel quite right. Are you sure it will handle the more standard situation with the "> " arriving in the next data after the "OK"?
Maybe there is something in the underlying native code that needs updating as well ? Do I need a new firmware to go along with the changes in AT.js and EspruinoWifi.js ?
Thanks, Steve
-
Hi,
I have spent more time on it and there are definitely some issues around the parsing inside the ser.on("data" function in the AT module. You will no doubt come up with a better fix than me; my brief attempts felt kludgy at best.
It does feel like the parsing is slightly fragile, although the constraints of the environment and the language don't make it easy. Thanks for looking at it. I think it might be the cause of a few issues for others at times, so it is well worthwhile.
Thanks, Steve
-
Hi,
I should have consulted my notes from a year ago: this problem is linked with an issue I raised here previously (http://forum.espruino.com/conversations/304601/#comment13692804 ) about sockets being closed erroneously during a 'send'. I did some work on this and had comms with @Gordon but then had to move on to other projects.
The problem will occur very quickly if you open one socket and start sending data in a loop whilst you receive data on another socket.
The EspruinoWifi code for sending via the AT commands of the ESP8266 has a final else statement that catches anything other than "OK", "SEND OK" , "Recv" or "Busy s..." and in that event nulls that socket.
if (d=="OK") {
  at.register('> ', function() {
    at.unregister('> ');
    at.write(data);
    return "";
  });
  return cb;
} else if (d=="Recv "+data.length+" bytes" || d=="busy s...") {
  // all good, we expect this
  // Not sure why we get "busy s..." in this case (2 sends one after the other) but it all seems ok.
  return cb;
} else if (d=="SEND OK") {
  // we're ready for more data now
  if (socks[sckt]=="WaitClose") netCallbacks.close(sckt);
  socks[sckt]=true;
} else { // <- this clause here shouldn't fire but it does !!
  socks[sckt]=undefined; // uh-oh. Error.
  at.unregister('> ');
}
It all looks OK as the IPDHandler should deal with anything else, BUT the reality is that 'd' (the data provided in the at.cmd callback) often contains incoming data meant for the other socket. It looks like data is being 'lost' within the Espruino stack rather than being lost packets from the ESP8266.
In my case the incoming and outgoing data will use checksums so some corruption is OK and to simply keep the link up and running I have amended the code as follows
} else {
  console.log("Send error, received :" + d);
  if (d) {
    var ok = d.indexOf("SEND OK");
    if (ok !== -1) {
      //ipdHandler(d.substring(0, ok));
      if (socks[sckt]=="WaitClose") netCallbacks.close(sckt);
      socks[sckt]=true;
      return;
    }
    if (d.indexOf("> ") !== -1) {
      at.unregister('> ');
    }
  }
  return cb;
  //socks[sckt]=undefined; // uh-oh. Error.
  //at.unregister('> ');
}
This is a pretty ugly hack but it proves the issue and I have had the link running all night when it previously wouldn't last a few minutes. I tried passing the data back to the ipdHandler (commented line above) but it doesn't 'repair' the incoming data.
One important observation is that the erroneous buffer always contains the 'end' of an incoming 'line' meant for the other socket. It looks very much like we are missing the start of the IPD sentence coming from the ESP8266. I will try to debug at the AT module level, but hopefully it gives @gordon something to go on.
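For reference while debugging, here is a small sketch of how I tell a complete IPD sentence from one of these tail fragments. It assumes the multi-connection format +IPD,<socket>,<length>:<payload>; parseIpd is a hypothetical debugging helper, not the ipdHandler from the module.

```javascript
// Hypothetical debug helper: returns null when the "+IPD," header is missing
// or malformed, which is what the stray fragments look like.
function parseIpd(s) {
  if (s.substr(0, 5) !== "+IPD,") return null;  // start of the sentence is missing
  var colon = s.indexOf(":");
  if (colon < 0) return null;                   // header not finished yet
  var parts = s.substring(5, colon).split(",");
  var len = parseInt(parts[1], 10);
  if (parts.length < 2 || isNaN(len)) return null;
  var payload = s.substr(colon + 1);
  return {
    socket: parseInt(parts[0], 10),
    length: len,
    payload: payload.substr(0, len),
    complete: payload.length >= len             // false while bytes are outstanding
  };
}
```

Logging parseIpd(d) inside the final else clause shows null every time for me, which fits with the header having been lost before the callback saw the data.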
-
Hi,
I will extend the timeout (I am fairly sure I have tried this already). I have also checked the socket number and it's the same on both occasions (when initially opening the connection and subsequently on the reconnect). I would have thought the library would have returned an error in the event of being starved of socket resources ?
My feeling is that the socket 'end' event is erroneous in some way;
My next attempt will be to build a small desktop 'proxy' so that I have more control over the other end of the link. I am also trying to capture some Wireshark logs but they are never easy to debug when the event is intermittent and hard to force.
Thanks for your thoughts
-
Thanks @allObjects. I have rewritten the socket code many times to try to make this work, but the result is always similar. I have now refactored things to be more in line with the guide (although the API reference shows a similar pattern to the one above). Here is the exact code of my running application:
function connectNmea() {
  var client = require("net").connect(NMEA_OPTIONS, function(socket, err) {
    if (err) {
      console.log("Connection error: "+err);
      setTimeout(connectNmea, 5000);
      return;
    }
    client.on('end', function() {
      console.log("Nmea socket closed");
      EventBus.clear('gpsData');
      setTimeout(connectNmea, 5000);
    });
    EventBus.subscribe('gpsData', function(data) {
      client.write(data);
    });
  });
}
The EventBus is a simple implementation that queues messages in a simple pub / sub pattern. The EventBus.subscribe('gpsData',... code is simply pulling data out of a queue from the Serial Port. The server is a separate application that I cannot alter but I have stress tested it and it will handle hundreds of concurrent connections without difficulty.
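For anyone trying to reproduce this without my code, a stripped-down stand-in for the bus could look like the sketch below. It is not my actual EventBus: SimpleBus and publish are assumed names, and it delivers synchronously instead of queuing, so it only mirrors the subscribe and clear calls used above.

```javascript
// Hypothetical minimal pub/sub stand-in (no queuing, synchronous delivery).
var SimpleBus = {
  topics: {},
  subscribe: function (topic, fn) {
    (this.topics[topic] = this.topics[topic] || []).push(fn);
  },
  publish: function (topic, data) {
    // Copy the list so a subscriber may call clear() while being notified
    (this.topics[topic] || []).slice().forEach(function (fn) { fn(data); });
  },
  clear: function (topic) {
    delete this.topics[topic];
  }
};
```

Clearing the topic in the 'end' handler matters because every reconnect subscribes again; without it, old subscribers would keep writing to a closed socket.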
Debugging the state of the client when the 'end' event is called it looks like
=Socket: {
  "type": 0,
  "#onconnect": function (socket,err) { ... },
  "opt": { "host": "192.168.1.12", "port": 10110 },
  "conn": false,
  "#onend": function (data) { ... },
  "dSnd": "",
  "cls": true,
  "endd": true
}
and then 5 seconds later in the reconnection it appears to be properly initialized
=Socket: {
  "type": 0,
  "#onconnect": function (socket,err) { ... },
  "opt": { "host": "192.168.1.12", "port": 10110 },
  "sckt": 2,
  "conn": true
}
but will immediately be closed again and won't send data. Ignoring the 'end' and leaving the code still running will report (obviously!)
"function called from system Uncaught Error: This socket is closed"
I agree that blind socket reconnects don't always make sense, but the link is still up, the server is still running and the code doesn't call for a disconnect. Happy to do any amount of debugging but very keen to make this work. Thanks
-
Hi,
Using an Espruino Wifi with the 'EspruinoWifi' module running Firmware 1.99.
The question may expand into other issues but directly my question is around an un-recoverable socket disconnection. I am using the 'Net' module and connecting to a simple Tcp Server. All works well until the socket appears to be closed (either from native code or the module itself but NOT from my code)
This might be a transient socket disconnection for whatever reason but the main problem is that I can't reconnect. Subscribing to the socket.on('close') event works and I have tried a simple reconnect via a timeout function. This function runs and the callback passes an empty 'err' object and a complete socket as it does first time round but no data is received and the 'close' event is triggered again instantly.
My code looks like this
function connect() {
  console.log("Attempting connection");
  client.connect(MY_OPTIONS, function(socket, err) {
    if (err) {
      console.log("Its all gone bad and here's why: "+err);
      return;
    }
    console.log("Connected to My Server ");
    //my socket.on('data', .... function lives here
    socket.on('close', function() {
      console.log("socket closed");
      setTimeout(function() {
        connect();
      }, 5000);
    });
  });
}
In summary, it works great until the socket close event gets called. The 2nd connect runs with no error and returns a socket object, but no data is received and the close event fires immediately.
Any thoughts on how to recover from the socket disconnection or maybe I can do something to prevent it. Thanks
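One thing I can rule out on my side is stacked reconnect timers, so here is a sketch of the guard I would put around the retry. makeReconnector and setTimer are hypothetical names (setTimer would simply be setTimeout on the board), and this does not address why the socket closes in the first place.

```javascript
// Hypothetical guard: at most one reconnect is queued at any time, even if
// the close event fires repeatedly.
function makeReconnector(connect, delayMs, setTimer) {
  var pending = false;
  return function scheduleReconnect() {
    if (pending) return false;   // a reconnect is already queued
    pending = true;
    setTimer(function () {
      pending = false;
      connect();
    }, delayMs);
    return true;
  };
}
```

In the code above it would be created once, for example with makeReconnector(connect, 5000, setTimeout), and the returned function called from the 'close' handler instead of calling setTimeout directly.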
-
Hi,
I haven't been using Espruino for a few months but just purchased a couple more EspruinoWifi units.
Documentation shows 'require("Wifi");' as the correct module import and I am sure this worked even a few days ago. That module is no longer available and the IDE shows "Module Wifi not found".
The code runs if I substitute require("EspruinoWifi"); but I am checking that this is correct as I remember there being subtle differences in the 2 libraries in the past.
Thanks very much
-
Firstly, huge kudos to the developers of Espruino; it's fairly mind-blowing being able to easily and quickly prototype systems on fast hardware with such a simple and well-thought-out development process.
I have come across issues trying to get a reliable send / receive system using MQTT. The problems do not come from the MQTT libraries themselves and I think I have tracked the issues to the EspruinoWifi module which is causing the socket to close during a 'send' cycle.
The following code
  send : function(sckt, data) {
    if (at.isBusy() || socks[sckt]=="Wait") return 0;
    if (socks[sckt]<0) return socks[sckt]; // report an error
    if (!socks[sckt]) return -1; // close it
    //console.log("Send",sckt,data);

    var cmd = 'AT+CIPSEND='+sckt+','+data.length+'\r\n';
    at.cmd(cmd, 10000, function cb(d) {
      if (d=="OK") {
        at.register('> ', function() {
          at.unregister('> ');
          at.write(data);
          return "";
        });
        return cb;
      } else if (d=="Recv "+data.length+" bytes") {
        // all good, we expect this
        return cb;
      } else if (d=="SEND OK") {
        // we're ready for more data now
        if (socks[sckt]=="WaitClose") netCallbacks.close(sckt);
        socks[sckt]=true;
      } else {
        socks[sckt]=undefined; // uh-oh. Error.
        at.unregister('> ');
      }
    });
    // if we obey the above, we shouldn't get the 'busy p...' prompt
    socks[sckt]="Wait"; // wait for data to be sent
    return data.length;
  }
};
attempts to send data using the well known CIPSEND, OK>, sendMyData, SEND OK pattern of the ESP8266 and if there is no incoming data during the send period then everything works fine.
The issue comes in that the callback data will sometimes contain other incoming data (say from an +IPD call) from the RX buffer in which case the final else clause (line 24) determines there has been an error.
I have tried various attempts to better synchronize calls to the SEND function and 'lock' any receive events but haven't been successful.
What does anyone think about simply returning the callback in any situation that is outside the normal flow ?
I have tried it as follows
} else {
  //socks[sckt] = undefined; // uh-oh. Error.
  at.unregister('> ');
  return cb;
}
and it allows the socket to stay running, but it still doesn't feel quite right. With a more explicit error clause (I'm not sure what that looks like from the ESP8266 when it fails to send) it seems a workable solution, and it has allowed me to send and receive asynchronously (publish and subscribe independently) with decent-size messages at upwards of 10 Hz.
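To show what I mean by a more explicit error clause, here is a sketch that classifies the callback data instead of treating everything unknown as fatal. classifySendReply is a hypothetical helper, and the assumption that a failed send is reported as "ERROR" or "SEND FAIL" needs checking against the AT firmware actually on the module.

```javascript
// Hypothetical classifier for the at.cmd callback data during AT+CIPSEND.
// Assumption: failures arrive as "ERROR" or "SEND FAIL" (verify on your firmware).
function classifySendReply(d, expectedLen) {
  if (d === "OK") return "prompt";            // next we expect "> "
  if (d === "Recv " + expectedLen + " bytes") return "progress";
  if (d === "SEND OK") return "done";
  if (d === "ERROR" || d === "SEND FAIL") return "failed";
  return "unrelated";                         // e.g. stray +IPD data: keep waiting
}
```

Only "failed" would then null the socket; "unrelated" would return cb and carry on waiting, which is what my change above does for every unknown line.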
Any thoughts on whether it's a valid fix or whether there are better solutions ??
Thanks and regards,
Steve
-
Hi, I will dig out a USB-TTL converter and try my other UART code, but am I right in thinking that there is no exposed Rx line to the ESP8266 on an Espruino Wifi board?
The code wasn't really a proposed fix, I think it helps explain and debug the problem more than anything. I wouldn't want you to bounce around with the AT handler, this is a genuine exception created elsewhere. For sure I can catch the error and close / reopen the socket and it's not stopping me making progress.
One interesting observation is that I have written a small conditional debug log for when this happens and in several hours of running I have had around 100 of these situations and every one of them has happened during the OK response to an AT+CIPSEND. Don't spend too long on it !
Regards, Steve