Getting a web page with Javascript content

  • I realise that I am probably trying to misuse the espruino Javascript software, so please feel free to give me a hard time.

    I would like to read some status data from web pages served by various devices on my network. The status pages include data collected by JavaScript. The only thing I have available is an embedded computer with an XScale processor running Linux. I have cross-compiled the Espruino software; it was pretty easy once I stopped trying to be clever. On the target device I can run './espruino --test-all', and most of the tests pass. If I then use http.get, I can get the web page, but without the JavaScript data; indeed, the web page on the device I have tested sends back a message that my browser doesn't support JavaScript. Is what I am trying to do even possible with the Espruino JavaScript software?

    I have tried other ways to get a headless web browser with Javascript support, but they either compile to huge files or I haven't been able to get them to cross compile.

  • No problem. I'm not sure Espruino can totally do what you want (be an embedded web browser), but there may still be a way around it.

    I'm guessing a bit here because I don't know what these devices are or what the webpages look like, but I suppose what happens is the device serves up a webpage with some JavaScript on it in a <script> tag, and that JavaScript requests the data from somewhere else.

    Personally, I think the best way to get the data is to forget about executing that JavaScript code. I'd open the device's webpage on a desktop computer with Chrome, right-click, and click inspect element. You'll then get a debug screen where you can see what the webpage is doing.

    You'll probably want to go into the 'Network' tab and look at what's actually going on, but I bet you'll find that it loads some other file, maybe something like mydata.json.

    Instead of trying to load the original webpage, just load that file, and manually decode what's in it.
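
    A minimal sketch of that approach on Espruino, assuming the data file turns out to be JSON. The address in DATA_URL is hypothetical (use whatever the Network tab shows), and fetchText and decodeStatus are illustrative names, not part of Espruino:

    ```javascript
    // Hypothetical address of the data file; replace it with the one seen in the Network tab.
    var DATA_URL = "http://192.168.0.1/mydata.json";

    // Fetch a URL and pass the complete response body to the callback.
    // The body arrives in chunks, so collect them until the connection closes.
    function fetchText(url, callback) {
      require("http").get(url, function (res) {
        var body = "";
        res.on("data", function (chunk) { body += chunk; });
        res.on("close", function () { callback(body); });
      });
    }

    // Decode the body; returns null if it is not valid JSON
    // (for example, if the device sent back an HTML page instead).
    function decodeStatus(body) {
      try {
        return JSON.parse(body);
      } catch (e) {
        return null;
      }
    }

    // Usage (needs a working network connection):
    // fetchText(DATA_URL, function (body) { console.log(decodeStatus(body)); });
    ```

    Keeping the decoding separate from the request makes it easy to try decodeStatus on a copy of the file saved from the browser before involving the network at all.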

  • Thanks for the quick reply. I will give it a try this evening.

  • Sorry to not answer your question, but it got me thinking:

    Do you control the web page as well?

    If so, then I envision a tag like:

      <script type="text/javascript.espruino">
      //Espruino code, not parsed by browser js 
      digitalWrite(LED1, true);
      //your server could call into functions on your Espruino too
      //something like 
      currentData([1,2,3,54,5,2,3,4,2,8,3,4,4,4,4,2,3,4]);
      </script>
    

    Could be kind of neat. If the developer controls both the Espruino and the web page then this could be a way for the server to provide dynamic code for the Espruino to run.

  • One of the items is a modem and I have no control over that web page; I am trying to get the connection status and signal strength. I also have a couple of PLCs (programmable logic controllers) where I do have control of the web pages. I don't have a clue what I would want to trigger from the PLC to the Espruino software running on the embedded computer; it is one of those ideas that is looking for a problem to solve. I shall file it away, because you know that at some point it will get me out of a hole.

    A bit off topic, but I assume that you guys know about http://www.busybox.net/; it does for embedded Linux what you are doing for JavaScript.

  • One of the items is a modem

    Ahh - I actually had to do something like this with a ZTE MF730M dongle.

    When I did what I described above I found that accessing this URL:

    http://192.168.0.1/goform/goform_get_cmd_process?isTest=false&cmd=imei%2Chardware_version%2Cmsisdn%2CLocalDomain%2Cwan_ipaddr%2Cipv6_wan_ipaddr%2Cppp_status%2Csim_imsi%2Crssi%2Crscp%2Clte_rsrp%2Cnetwork_type&multi_data=1
    

    Got me all the data I needed as JSON.

    Probably won't work for you, but worth a try :)
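
    The %2C sequences in that URL are just URL-encoded commas, so the cmd parameter is a comma-separated list of field names. A small sketch for building such a URL from a shorter list, assuming the same host and path as above (buildStatusUrl is an illustrative name, and other modems or firmware versions may use a different path or field names):

    ```javascript
    // Build a goform_get_cmd_process query URL from a list of field names.
    // Host and path are taken from the URL quoted above; treat them as assumptions.
    function buildStatusUrl(host, fields) {
      return "http://" + host + "/goform/goform_get_cmd_process" +
        "?isTest=false" +
        "&cmd=" + encodeURIComponent(fields.join(",")) + // "," is encoded as "%2C"
        "&multi_data=1";
    }

    var url = buildStatusUrl("192.168.0.1", ["ppp_status", "rssi", "network_type"]);
    console.log(url);
    // prints http://192.168.0.1/goform/goform_get_cmd_process?isTest=false&cmd=ppp_status%2Crssi%2Cnetwork_type&multi_data=1
    ```

    Asking only for the fields you need (here connection status, signal strength and network type) keeps the reply small, which helps on a memory-constrained device.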

  • Whether this is trivial or impossible depends on the specifics of how the page gets the data.

    If the JS on the page just uses an XHR to grab the data, it's trivial - ignore the page and go straight for the URL with the data. It might even return it in JSON format for you :-P

    If they're doing something dumb, it could be a much bigger deal.

  • Didn't find any files with .json and was getting frustrated, then I spotted a file called GetPublic.xml; I opened it in Opera and it contains the data I want. I wondered if the data was only put there when the main web page is opened, but luckily it seems to be filled at regular intervals anyway. I couldn't get Espruino to retrieve the page; I haven't used JavaScript or Espruino before, so I was in deep water. However, Linux wget did work. So it's not the method I was hoping to use, and it doesn't help me get data from other devices, but I have an easy solution for the modem.

    I need to go away and do some serious study of JavaScript and Espruino; I hate wasting other people's time when I am asking for vague help.

    Thanks to all.

  • 'Screen'-scraping an HTML page is quite challenging, because the 'embedded' JavaScript dynamically changes the page content. You would only succeed by doing what a browser does... and that's a lot.

    Does your embedded application/Linux/computer have an API that you can tap into?

    From what I understand, the embedded application/Linux/computer is the vehicle to read/collect/save data from your devices, including registering/configuring/controlling them. The application has a Web interface which you open in a browser, and this gives you the user interface to the data and functions.

    ...but luckily it seems to be filled at regular intervals anyway.

    This hints to me that there is some code that runs constantly, or is triggered on timers, to collect the data and update the XML file.

    The Web app, which may just consist of some static HTML and some JavaScript, then just reads and presents the content of the XML file. The static HTML and JavaScript are served by some (Apache) Web server. The XML is placed in a spot the Web server has access to.

    Do you have access to the Linux of this embedded computer? Scout for some apache / httpd / .cfg / ... directories and files. In the (apache/httpd) config you can find out if it is plain HTML or if other stuff is involved, such as server-side logic executed with PHP modules. Server-side logic is used to access files outside of the Web server's accessible directories. By changing the config you can give access to the XML file and pull it with whatever you want, such as Espruino with an XHR, and then do some processing.

    There are XML-to-JSON converters in JavaScript that can run on Espruino and give you JavaScript object access to the data (trees, collections).

    JavaScript has a few tricks up its sleeve, but approaching it with an object-oriented mind will quickly get you to some robust, useful code.

  • Nice - sounds like it's all coming together. There are some examples of making network requests among the Espruino examples. It should be straightforward once you understand making HTTP requests. It's pretty easy - maybe a bit counter-intuitive if you're not used to working with callbacks.

    You're typically safe when it comes to retrieving status pages like that without loading the main page (you might end up having to go through the authentication if the modem requires a login). Usually the page with the data is generated dynamically in response to the request (this is also what you'd likely do if you wanted to load status pages from Espruino, for example).

    @allObjects - You're making this complicated, and frankly, here there's no need for it. All he has to do is request the magic file that contains the status info he needs and parse it...
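
    On the callback point: the part that tends to surprise people is the order in which things run. A rough sketch, using setTimeout as a stand-in for the network so it runs anywhere; fakeGet and the URL are made up for illustration, and a real request via the http module follows the same pattern:

    ```javascript
    var log = [];

    // Stand-in for an HTTP request: it returns immediately and calls back later,
    // which is how a real network request behaves.
    function fakeGet(url, callback) {
      setTimeout(function () { callback("<status>ok</status>"); }, 10);
    }

    fakeGet("http://192.168.0.1/GetPublic.xml", function (body) {
      // Runs last, once the "response" has arrived.
      log.push("got " + body.length + " characters");
      console.log(log.join(" -> "));
    });

    // Runs first: fakeGet has already returned, but the callback has not fired yet.
    log.push("request sent");
    ```

    Anything that needs the response has to live inside the callback (or be called from it); code placed after the request call cannot see the data yet.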

  • @Bryan, since a (simple) Linux wget works and delivers the magic xml file, an XHR and some processing/parsing of the response on Espruino works, just as @DrAzzy says.
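
    For the parsing half, a full XML parser is probably more than is needed to pull a couple of values out of a small status file. A minimal sketch, assuming the values sit in simple elements without attributes; getTagText is an illustrative helper and the sample below is made up, since the real element names in GetPublic.xml are not given in this thread:

    ```javascript
    // Return the text between <tag> and </tag>, or null if the tag is missing.
    // Plain indexOf is used so this does not depend on regular expression support.
    function getTagText(xml, tag) {
      var open = "<" + tag + ">";
      var close = "</" + tag + ">";
      var start = xml.indexOf(open);
      if (start < 0) return null;
      start += open.length;
      var end = xml.indexOf(close, start);
      if (end < 0) return null;
      return xml.substring(start, end);
    }

    // Made-up sample; the real file will use different element names.
    var sample = "<status><signal>4</signal><connection>connected</connection></status>";
    console.log(getTagText(sample, "signal"));     // prints 4
    console.log(getTagText(sample, "connection")); // prints connected
    ```

    The same function can be applied to the body returned by an HTTP request on Espruino or to the file saved by wget. It only finds the first occurrence of a tag and ignores attributes, so a file with repeated or nested elements of the same name would need something more careful.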

Posted by @Bryan