RAK8212/BG96 tcp close socket anomaly

Posted on
  • Gordon,

    further to the previous thread regarding the RAK8212/BG96, here's a dump of the sequence-

    |  __|___ ___ ___ _ _|_|___ ___
    |  __|_ -| . |  _| | | |   | . |
    |____|___|  _|_| |___|_|_|_|___|
             |_| espruino.com
     1v99.206 (c) 2018 G.Williams
    Espruino is Open Source. Our work is supported
    only by sales of official boards and donations:
    http://espruino.com/Donate
    >Turning Cell on
    Waiting 30 sec for GPRS connection
    Connecting GPRS
    ["ATE0\r\n"
    ] "ATE0\r\r\nOK\r\n"
    ["AT+CEREG?\r\n"
    ] "\r\n"
    ] "+CEREG: 0,1\r\n\r\nOK\r\n"
    ["AT+CGATT=1\r\n"
    ] "\r\nO"
    ] "K\r\n"
    ["AT+CGREG?\r\n"
    ] "\r\n+"
    ] "CGREG: 0,1\r\n\r\nOK\r\n"
    ["AT+COPS?\r\n"
    ] "\r\n+"
    ] "COPS: 0,0,\"Telstra Mobile Telstra\",9\r\n\r\nOK\r\n"
    ["AT+QICSGP=1,1,\"\",\"\",\"\"\r\n"
    ] "\r\nO"
    ] "K\r\n"
    ["AT+QIACT=1\r\n"
    ] "\r\nO"
    ] "K\r\n"
    Connected!
    >sendCloud()
    {"measurements":[{"source":{"id":"482398­6"},"time":"","type":"CurrentSensor","c8­y_CurrentMeasurement":{"Sensor1":{"value­":11.1,"unit":"A"},"Sensor2":{"value":22­.2,"unit":"A"},"Sensor3":{"value":33.3,"­unit":"A"},"Sensor4":{"value":44.4,"unit­":"A"}}},{"source":{"id":"4823986"},"tim­e":"","type":"c8y_SignalStrength"}]}
    ================================
    ["AT+QIOPEN=1,0,\"TCP\",\"11.22.33.44\",­8080,0,1\r\n"
    =undefined
    ] "\r\nOK\r\n"
    ] "\r\n"
    ] "+QIOPEN: 0,0\r\n"
    ] "\r\n> "
    ["POST / HTTP/1.1\r\nUser-Agent: Espruino 1v99.206\r\nConnection: close\r\nContent-Type: application/json\r\nContent-Length: 316\r\nAuthorization: Basic ==\r\nHost: \r\n\r\n{\"measurements\":[{\"source\":{­\"id\":\"4823986\"},\"time\":\"\",\"type­\":\"CurrentSensor\",\"c8y_CurrentMeasur­ement\":{\"Sensor1\":{\"value\":11.1,\"u­nit\":\"A\"},\"Sensor2\":{\"value\":22.2­,\"unit\":\"A\"},\"Sensor3\":{\"value\":­33.3,\"unit\":\"A\"},\"Sensor4\":{\"valu­e\":44.4,\"unit\":\"A\"}}},{\"source\":{­\"id\":\"4823986\"},\"time\":\"\",\"type­\":\"c8y_SignalStrength\"}]}"
    ] "\r\nSE"
    ] "ND OK\r\n"
    ] "\r\n+QIURC: \""
    ] "recv\",0,125\r\nHTTP/1.1 200 OK\r\nDate: Tue, 25 Sep 2018 03:56:48 GMT\r\nConnection: cl"
    ] "\r\n\r\n+QIURC: \"close"
    -->Hello world!
    ] "d\",0\r\n"
    ==> Closed.
    >state()
    =undefined
    ] "\r\n+"
    ] "QISTATE: 0,\"TCP\",\"11.22.33.44\",8080,14869,4,1­,0,1,\"uart1\"\r\n\r\nO"
    ] "K\r\n"
    >close(0)
    =undefined
    ] "\r\nO"
    ] "K\r\n"
    >state()
    =undefined
    ] "\r"
    ] "\nOK\r\n"
    > 
    

    Note that the ip address and auth have been changed to protect the innocent.
    state() and close() and my functions that issue QISTATE and QICLOSE.

    The function I used to create this:

    function sendCloud() {
      
          if (typeof (http) === 'undefined') {
          http = require("http");
        }
    //  var packet = [ { "compId": compId, "streamId": streamId,"data": data }];
      var packet = {"measurements":[
     {
      "source":{"id":CUMULOCITY_ID},
      "time":"",
      "type":"CurrentSensor",
      "c8y_CurrentMeasurement":
        {
        "Sensor1":{"value":11.1,"unit":"A"},
        "Sensor2":{"value":22.2,"unit":"A"},
        "Sensor3":{"value":33.3,"unit":"A"},
        "Sensor4":{"value":44.4,"unit":"A"}                                       }
       },
      {
      "source":{"id":CUMULOCITY_ID},
      "time":"",
      "type":"c8y_SignalStrength",
    //   "c8y_SignalStrength":
    //    {
    //     "rssi":{"value":"99","unit":"dBm"},
    //     "BER":{"value":"99","unit":"%"}
    //    }
        }
      ]};
    
      var body=JSON.stringify(packet);
      //var body =JSON.stringify("{'test':'test'}");
      
      //memoryTracking();
      
      options = {
        host: CLOUD_PROVIDER,
          port: '8080',
          path: "/",
          method:'POST',      
          headers: {"Content-Type":"application/json", 
                    "Content-Length": body.length,
                    "Authorization":  AUTH,
                    "Host":""
                   }
      };
      //print(options);
      print(body);
      console.log("===========================­=====");
      
    
       var req = http.request(options, function(res)  {
        res.on('data', function(data) {
          console.log("-->"+data);
        });
        res.on('close', function(data) {
          console.log("==> Closed.");
    //      gprs.at.cmd("AT+QICLOSE=0\r",1E3,print);­
        });
      });
      req.end(body);
    }
    
    

    We can see that close is not happening.
    My workaround is adding the QICLOSE into the on "close" method works although it hard codes the socket number.

    Another anomaly is what seems to be a size limitation on the packet which is less than the MTU (around 512 bytes??). The body is truncated and the truncated length is correct to the send request to the module, however the length in the header is the original length, so the server is waiting for more. You can see I've commented out part of the payload I send. When I un-comment it, the send size is 352 bytes even though the packet is larger.

  • Ok, I've just updated the code so new firmwares will hopefully automatically call close - let me know if that works ok for you.

    I also added some extra debugging when sending data. The splitting of data is expected - it's just to do with the buffer sizes Espruino uses internally. What's not expected is if it doesn't immediately send the second packet. I think the send function is waiting for SEND OK but it's possible it doesn't get it because it receives the > first - hopefully seeing the debug text from the new build will show that's what's happening - in which case it's an easy fix.

    Thanks for your help with this!

  • Gordon,
    the close() issue is fixed, but the sending of the remaining data seems to be lingering.

    >sendCloud()
    Length:406 {"measurements":[{"source":{"id":"123456­7"},"time":"","type":"CurrentSensor","c8­y_CurrentMeasurement":{"Sensor1":{"value­":11.1,"unit":"A"},"Sensor2":{"value":22­.2,"unit":"A"},"Sensor3":{"value":33.3,"­unit":"A"},"Sensor4":{"value":44.4,"unit­":"A"}}},{"source":{"id":"1234567"},"tim­e":"","type":"c8y_SignalStrength","c8y_S­ignalStrength":{"rssi":{"value":"99","un­it":"dBm"},"BER":{"value":"99","unit":"%­"}}}]}
    ================================
    ["AT+QIOPEN=1,0,\"TCP\",\"11.222.33.123\­",80,0,1\r\n"
    =undefined
    ] "\r\nOK\r\n"
    ] "\r\n+QIOPEN: 0,0\r\n"
    ] "\r\n>"
    ] " "
    ["POST /peppap/xxx HTTP/1.1\r\nUser-Agent: Espruino 1v99.225\r\nConnection: close\r\nContent-Type: application/json\r\nContent-Length: 406\r\nAuthorization: Basic NannyPlumRulesYWRp==\r\nHost: \r\n\r\n{\"measurements\":[{\"source\":{­\"id\":\"1234567\"},\"time\":\"\",\"type­\":\"CurrentSensor\",\"c8y_CurrentMeasur­ement\":{\"Sensor1\":{\"value\":11.1,\"u­nit\":\"A\"},\"Sensor2\":{\"value\":22.2­,\"unit\":\"A\"},\"Sensor3\":{\"value\":­33.3,\"unit\":\"A\"},\"Sensor4\":{\"valu­e\":44.4,\"unit\":\"A\"}}},{\"source\":{­\"id\":\"1234567\"},\"time\":\"\",\"type­\":\"c8y_SignalStrength\",\"c8y_SignalSt­rength\":{\"rssi\":{\"value\":\""
    ] "\r\nSEND OK\r\n"
    >state()
    =undefined
    ] "\r\n+QISTATE: 0,\""
    ] "TCP\",\"11.123.33.123\",80,14516,2,1,0,­1,\"uart1\"\r\n\r\nOK\r\n"
    >close(0)
    =undefined
    ] "\r\nOK\r\n"
    >state()
    =undefined
    ] "\r\nOK\r\n"
    >sendCloud()
    Length:406 {"measurements":[{"source":{"id":"123456­7"},"time":"","type":"CurrentSensor","c8­y_CurrentMeasurement":{"Sensor1":{"value­":11.1,"unit":"A"},"Sensor2":{"value":22­.2,"unit":"A"},"Sensor3":{"value":33.3,"­unit":"A"},"Sensor4":{"value":44.4,"unit­":"A"}}},{"source":{"id":"1234567"},"tim­e":"","type":"c8y_SignalStrength","c8y_S­ignalStrength":{"rssi":{"value":"99","un­it":"dBm"},"BER":{"value":"99","unit":"%­"}}}]}
    ================================
    ["AT+QIOPEN=1,1,\"TCP\",\"11.123.44.123\­",80,0,1\r\n"
    =undefined
    ] "\r\nOK\r\n"
    ] "\r\n+QIOPEN: 1,0\r\n"
    ] "\r\nERROR\r\n"
    ["99\",\"unit\":\"dBm\"},\"BER\":{\"valu­e\":\"99\",\"unit\":\"%\"}}}]}"
    >close(0)
    =undefined
    ] "\r\nOK\r\n"
    > 
    

    We can see on the second invocation of sendCloud() that it 'knows' about the left over data, but the send fails.

      options = {
        host: CLOUD_PROVIDER,
          port: '80',
          path: "/peppap/xxx",
          method:'POST',      
          headers: {"Content-Type":"application/json", 
                    "Content-Length": body.length,
                    "Authorization":  CUMULOCITY_AUTH,
                    "Host":""
                   }
      };
      //print(options);
      print(`Length:${body.length} ${body}`);
      console.log("===========================­=====");
     
       var req = http.request(options, function(res)  {
        res.on('data', function(data) {
          console.log("-->"+data);
        });
        res.on('close', function(data) {
          console.log("==> Closed.");
        });
      });
      req.end(body);
    
  • Sorry about that - can you try again with gprs.debug() please? It might give a bit more info.

    I think the issue is that the send callback is getting "> " before the SEND OK - so if that's the case it should be an easy fix.

  • Can you also try gprs = require("https://raw.githubusercontent.c­om/espruino/EspruinoDocs/master/devices/­QuectelBG96.js")...

    and see if that helps? I just made the relevant tweaks.

  • Since I've made tweaks to the BG96 module, I copied the changes to my module and tried it.
    The first time failed:

    {\"rssi\":{"
    [BG96] AT+QISEND response undefined  <<<<< here it stopped
    >state()
    =undefined
    ] "\r\nSEND OK\r\n"
    >state()
    =undefined
    ] "\r\n+QISTATE: 0,\"TCP\",\"11"
    ] ".123.33.456\",80,6804,2,1,0,1,\"uart1\"­\r\n\r\nOK\r\n"
    >state()
    =undefined
    ] "\r\n+QISTATE: 0,\"TCP\",\"1”
    ] "1.123.33.456\",80,6804,2,1,0,1,\"uart1\­"\r\n\r\nOK\r\n"
    ] "\r\n+QIURC: \"recv\",0,195\r\nHTTP/1.0 504 Gateway Time-out\r\nCache-Control: no-cache\r\nConnection: close\r\nContent-Type: text"
    [BG96] [
      "+QIURC: \"recv\"",
      "0",
      "195"
     ]
    ] "\r\n\r\n+QIURC: \"closed\",0\r\n"
    ["AT+QICLOSE=0\r\n"
    --><html><body><h1>504 Gateway Time-out</h1>
    The server didn't respond in time.
    </body></html>
    ] "\r\nO"
    ] "K\r\n"
    >gprs.debug()
    ={
      socks: [ undefined ],
      sockData: [
        "",
        "",
        "",
        "",
        "",
        ""
       ],
      busy: false }
    

    The second time looked like it worked as the webserver responded, so it was happy with the amount of data it got.

    {\"rssi\":{"
    ] "\r\nSEND OK\r\n"
    [BG96] AT+QISEND response "> "
    [BG96] AT+QISEND response "SEND OK"
    ] "\r\n> "
    ["\"value\":\"99\",\"unit\":\"dBm\"},\"B­ER\":{\"value\":\"99\",\"unit\":\"%\"}}}­]}"
    ] "\r\nSEND"
    [BG96] AT+QISEND response "> "
    ] " OK\r\n"
    [BG96] AT+QISEND response "SEND OK"
    ]
    

    My guess it might be a timing sensitivity with the prompt??

  • It might help to give a bit of the dump before [BG96] AT+QISEND response undefined?

    What's happening is the send function isn't getting the response it wanted with the given time period (you didn't lower the timeouts in your version did you?). Although as you say there's a SEND OK I think what comes before it is more interesting.

    My guess is that the BG96 may actually output > without \r\n sometimes (even though it appears to normally) - which means > never gets sent on to the handler.

    I assumed wrong on a change for ESP8266 extremely recently, and had to make this change: https://github.com/espruino/EspruinoDocs­/commit/7b4b448647bdf24fe22dadd15b135227­e2cc8ad0#diff-465a45343fb61c597d45aa2f89­be369b

    It's possible that if you make a similar one, it'll work.

  • Gordon, i couldn’t figure out how to graft the esp8266 code in. I did some more investigation and it seems the code doesn’t allow enough time for the send to complete and respond. Cat-nb1 is only 20kbit/s methinks, so it takes a bit longer than gsm/lte. I changed the timeout from 2000 to 10000. I’ve yet to do extensive testing to verify this.

  • Hmm - it looks like the existing code doesn't wait for > at all, and just has a 500ms timeout which isn't great at all. Also why it was hard for you to add that code.

    I've just made the changes to the QuectelBG96 on GitHub (including your timeout) so please could you give it a go and let me know how it works?

  • Initial tests indicate....... failure. Seems to be highly unreliable. I must preface this by saying I merged your changes into my code, so I may have missed something. So I've reverted the changes and are working from 225. I'm trying to get something working so I can test over the weekend, so I'll revisit this next week.

    Why was it hard for me to make the changes? I'm not the most experienced javascript programmer, so some of the idioms leave me perplexed. Like returning the closure. I wrote a little program to see what happens with return cb; vs return cb(d);
    I got the idea of registering/unregistering the > prompt, but I was unsure of the context since the two functions were slightly different. I'll get there eventually.....

    As for the changes I've made to the BG96 code, I added the retry on the CREG?/CEREG? I can send you the code to see if you want to incorporate it?

  • Without being to familiar with the detail - and going a bit on a limb - I try to answer the difference between

    returning cb; vs return cb(d);

    From variable name, I assume, it is a callback. It is for sure a function, because otherwise cb(d) would have given you an error.

    In JavaScript everything is an object... and a function, like cb, is also an object and can be passed around by reference as argument, assigned to a variable, returned as an object, etc. In the first case - return cb; - function object is returned and something else will invoke it... just as you would hand over an Icemaker vs. handing over Ice it has made, so someone else can push the button and make it make ice at their time and convenience.

    JavaScript syntax defines that a function reference (function name) followed by opening and closing parenthesis it to be invoked. In other words, cb(d) invokes the callback function with passing the d(ata) argument. With the return, you return what the callback function returns... which usually is undefined, because there is nothing around that could 'catch' what it returns.

    Typical with callbacks is that their invocation has to be deferred, because the parameter(s) that have to be passed on invocation are not available yet with the right values, and therefore the callback is handed to the next processing stage, to finally end up at the point AND time where the arguments are ready / populated with the right values and the callback can be invoked.

  • I added the retry on the CREG?/CEREG? I can send you the code to see if you want to incorporate it?

    Yes, that'd be really helpful - thanks!

  • Here it is:

      atcmd("ATE0").then(function() {// echo off
            return atcmd("AT+CEREG=2"); //turn on extended cereg response for cell tower info
        }).then(function() {
            return new Promise(function(resolve, reject) {
                var n = 60;
                var done = false;
                var i = setInterval(function() {
                    at.cmd(options.lte?"AT+CEREG?\r":"AT+CRE­G?\r",500,function(d)
                        {
                            var n = d.split(",")[1]; //1 =connected,5=connected,roaming
                            if (n==1 || n==5)
                            {
                                clearInterval(i);
                                resolve();
                            }
                        });
                    if (n-- <= 0) {
                        clearInterval(i);
                        reject("Timeout!");
                    }
                }, 1000);
            });
        }).then(function(d) {
            return atcmd("AT+CGATT=1",10000); // attach to GPRS service
    
  • I think I'm going to have to use the logic analyser to get some hard evidence as to what happens with the send function.

  • Thanks - I've just added that code in on GitHub. It should go live this week.

    Shame that send isn't working properly yet - you say it is unreliable - so it does at least work sometimes now?

  • It doesn’t work even with small packets! It looks like it gets out of sync by one line.
    I’m currently using the previous code with an extra timeout. This works around 80% of the time. I’ve soldered some wires onto the rxd and txd to the bg96 and will get some logic analyser samples. This should give us some hard evidence. I should be able to post some pix tomorrow ( my time).

  • A new day and a fresh perspective! I've hooked up the logic analyser and what seems to be happening is that the BG96 isn't receiving enough bytes only sometimes when doing the QISEND. It's expecting 536 bytes and we send around that amount. When it fails, it doesn't give the SEND OK until the next command comes along, which in my case is around 1 second later. That command fails as it gets gobbled up as data. What I am not sure of is which end is at fault. I'll hook up the CTS line to the analyser to see if the module wants us to stop or if the espruino code is not always sending what it should.

  • I've issued the command AT+IFC=2,2 to enable flow control. This seems to have made a difference. No hard evidence yet, as the defect only happens occasionally.

  • I've just tried rev 251 to see what happens. The code sends the AT+QISEND, the module responds with the \r\n> and then 1 second later, the code sends AT+QICLOSE

  • But no other activity between "\r\n> " and the close? Is there a space after >?

  • The forum gobbled my space - it must've thought it was a html tag, so yes, there is a space after the >

    My guess is the listener for the > is not firing.

  • I'm thinking your code expects an "OK" from the AT+QISEND. This doesn't happen. The sequence should be:
    AT+QISEND 0,365\r\n
    response is \r\n>[space]
    send data
    response is \r\n SEND OK\r\n

    Thus the current send function in QuectelBG96.js is borken. My current thought is there was a timing issue with the way the original code handled send but it worked most of the time. Adding extra time probably 'fixed' this. On top of this, sometimes there seems to be a discrepancy between the length of data we think we send and what the BG96 thinks it got - I've yet to narrow down on this. Setting flow control seems to help.

  • Ahh, ok. Can you try this code for send please?

      send: function(sckt, data) {
        if (busy || at.isBusy() || socks[sckt]=="Wait") return 0;
        if (!socks[sckt]) return -1; // error - close it
        busy = true;
        at.register('> ', function(l) {
          at.unregister('> ');
          at.write(data);
          return l.substr(2);
        });
        at.cmd(`AT+QISEND=${sckt},${data.length}­\r\n`), 10000, function cb(d) {
          dbg("AT+QISEND response "+JSON.stringify(d));
          if (d=="") return cb; // handle \r\n before >
          if (d=="SEND OK") busy = false;
          else {
            // probably d=="SEND FAIL"
            at.unregister('> ');
            socks[sckt] = null; // force socket close
          }
        });
        return data.length;
      }
    

    If that works, I'd be interested whether it works without the if (d=="") as well...

  • I’ll try that tomorrow. Should the busy flag be reset in the send fail condition?

  • Thanks - yes, good point!

  • Post a reply
    • Bold
    • Italics
    • Link
    • Image
    • List
    • Quote
    • code
    • Preview
About

RAK8212/BG96 tcp close socket anomaly

Posted by Avatar for Kartman @Kartman

Actions