• Hello,
    I am trying to read a very long (approx. 15K rows) .csv file using require("Storage").read("xxx.csv").
    The next step is to create an array from the values in the .csv file.
    However, calling .split("\r\n") on the result of require("Storage").read("xxx.csv") gives me an error - see the attached image for the error details.

    Is there a way to read only, say, the first 200 rows from the .csv at a time, and then the next 200?
    I don't see any API for reading the file in chunks other than .read(), so I wanted to confirm.

    Please note that I am using the Espruino emulator and also have some other functionality coded for the smart watch.

    Thanks a lot!


    1 Attachment

    • 2021-02-13 (2).png
  • read() is OK, split() is not. You can search for each end of line one by one (via String.indexOf) instead of making a big array of strings.

  • As @fanoush says you can use indexOf - there's actually an example of this in the planetarium app: https://github.com/espruino/BangleApps/blob/master/apps/planetarium/planetarium.app.js#L107

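      var storage = require("Storage");
      var f = storage.read("...");
      var line, linestart = 0;
      var lineend = f.indexOf("\n");
      while (lineend >= 0) {
        line = f.substring(linestart, lineend);
        var fields = line.split(","); // only this one line is split in RAM; use fields here
        linestart = lineend + 1;
        lineend = f.indexOf("\n", linestart);
      }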
    

    Just a bit of background: Storage.read returns a string that's basically just a pointer to external flash memory - so it can be huge. You can use it like any other string, but if you split it you're then creating potentially thousands of new strings in RAM which will use up all the available memory.

    I guess there's no JavaScript function equivalent to .split(..).forEach(..), but it's something we could add.
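
A minimal sketch of what such a helper could look like, built on the same indexOf loop. The name forEachLine and its callback signature are hypothetical and not part of Espruino. The sample string stands in for the value returned by require("Storage").read("xxx.csv") so that the snippet also runs outside Espruino.

```javascript
// Hypothetical helper (not an Espruino API): calls callback(line, index)
// for each line of text without ever building an array of all lines.
function forEachLine(text, callback) {
  var start = 0, index = 0;
  while (start < text.length) {
    var end = text.indexOf("\n", start);
    if (end < 0) end = text.length; // last line has no trailing newline
    var line = text.substring(start, end);
    // Drop a trailing "\r" so files with "\r\n" line endings also work
    if (line.charAt(line.length - 1) === "\r") {
      line = line.substring(0, line.length - 1);
    }
    callback(line, index++);
    start = end + 1;
  }
}

// Stand-in data; on the device this would be
//   var text = require("Storage").read("xxx.csv");
var text = "1180.952381\r\n1175.824176\r\n1180\r\n";

var sum = 0, count = 0;
forEachLine(text, function (line) {
  if (line.length) {
    sum += parseFloat(line);
    count++;
  }
});
console.log(count, sum);
```

Only one short line string exists at a time, so memory use should stay roughly constant however long the file is.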

  • Thanks a lot, @fanoush and @Gordon for your replies.

    Until I received the responses here, I did try reading the file using indexOf.
    It's a file with 1 column and 17K rows. I wanted to separate these 17K values and store them in an array since I need to analyze these values and compute something further.

    Obviously, since 17K is a very large number, I tried reading only the first 500 values, for instance. I could separate out every value from the file based on the carriage return and print all of them. No problem.

    However, the moment I try to store these 500 values in an array, I face the low memory problem.
    It works well if I store only 100 values in the array. However, the computation that I need to do on this data, requires at least 500 values for me to analyze it well because I am working on a pattern. Can't analyze with only 100 values.
    Is there any way I could achieve this - storing 500 values or maybe at least 300?

    Thanks a lot once again for all your help!

  • Are you storing numbers? This might help you: http://www.espruino.com/Performance#array-buffers-are-the-most-efficient-way-to-store-data

    Basically normal arrays are flexible but pretty inefficient. All you need to do is use a typed array to store substantially more data - for example if all your data could fit into a single byte, you could fit over 30,000 items in memory.
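
As a rough sketch of that idea, the loop below parses values straight into a preallocated Float64Array instead of pushing them onto a normal array. MAX_VALUES and the sample csv string are assumptions for illustration; on the device the string would come from require("Storage").read(...).

```javascript
var MAX_VALUES = 500; // assumed window size, taken from the question above
var values = new Float64Array(MAX_VALUES); // 8 bytes of data per value, allocated once

// Stand-in data; on the device this would be the string from Storage.read()
var csv = "1180.952381\n1175.824176\n1180\n";

var n = 0;
var lineStart = 0;
var lineEnd = csv.indexOf("\n");
while (lineEnd >= 0 && n < MAX_VALUES) {
  // parseFloat ignores trailing characters such as "\r"
  values[n++] = parseFloat(csv.substring(lineStart, lineEnd));
  lineStart = lineEnd + 1;
  lineEnd = csv.indexOf("\n", lineStart);
}
console.log(n, values[0], values[1]);
```

The data for 500 values then takes 500 x 8 = 4000 bytes, plus whatever fixed overhead the interpreter adds for the array itself.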

  • or you can write it back to storage as binary (or initially save it like that instead of as text csv) and then read it directly into an array via https://www.espruino.com/Reference#l_Storage_readArrayBuffer Then you could iterate over all of the values like you can with the big string now. See the Description there.
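
A sketch of the binary round trip, under some loud assumptions: fakeStorage is a stand-in written only so the snippet runs outside Espruino, "data.bin" is a hypothetical file name, and whether a Float64Array view works directly on the buffer returned by the real Storage.readArrayBuffer should be checked on the device.

```javascript
// Stand-in for require("Storage") so the sketch runs outside Espruino.
// Only write() and readArrayBuffer() are mimicked, and only loosely.
var fakeStorage = {
  files: {},
  write: function (name, data) {
    this.files[name] = data.slice(0); // keep a copy of the bytes
    return true;
  },
  readArrayBuffer: function (name) {
    return this.files[name];
  }
};
var storage = fakeStorage; // on the device: var storage = require("Storage");

// One-off conversion step: parsed numbers are written as raw bytes
var parsed = new Float64Array([1180.952381, 1175.824176, 1180]);
storage.write("data.bin", parsed.buffer); // "data.bin" is a hypothetical name

// Later: view the stored bytes as numbers again, with no text parsing
var view = new Float64Array(storage.readArrayBuffer("data.bin"));
console.log(view.length, view[1]);
```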

  • Hi @Gordon,
    Yes, they are numbers, actually decimal numbers - around 10 digit long numbers.
    I will try using a typed array and see if it works.
    Thank you very much!

  • Hi @fanoush,
    That's a great idea too. Did not strike me.
    Thanks a lot!
    I will try it out as well.

  • Hello @Gordon,
    I tried using TypedArray

    var arr_data = new Float32Array(300);
    arr_data[i] = parseFloat(s); // this is in a for loop where I read each value from csv as string

    since my numbers are in decimals.
    However, when I try to store the float value in the above array, it doesn't get stored as it is.
    For example: 1180.952381 from the .csv is stored as 1180.95239257812

    I tried using Float64Array and it gave a better result however not the exact value
    1180.952381 is stored as 1180.95238100000
    and 1175.824176 is stored as 1175.82417599999

    I believe using Float32Array would be better considering memory usage, however not sure why the values aren't getting stored in the original form

    Thanks!

  • From https://en.wikipedia.org/wiki/Single-precision_floating-point_format "This gives from 6 to 9 significant decimal digits precision"
    From https://en.wikipedia.org/wiki/Double-precision_floating-point_format "The 53-bit significand precision gives from 15 to 17 significant decimal digits"

    You have 10 digits, so they won't fit into float32. As for float64, 10 digits should fit, but still not every value can be represented exactly; see e.g. https://stackoverflow.com/questions/12165216/which-values-cannot-be-represented-correctly-by-a-double

  • Float64Array uses exactly the same number format as JavaScript uses internally, so you'll find that there is no loss between just parseFloat(s) or putting the value in an array. It's just the way computers work.

    Also, the value may actually be pretty much ok, but Espruino is just not converting it to a string in exactly the same way as you wrote it. For example 1180.952381 and 1180.95238100000 are basically the same number, just with some extra 0 on the end.
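
A small check that illustrates the difference, using one of the values from the thread. It is only a sketch of the behaviour in a standard JavaScript engine; how the numbers are printed on Espruino may differ.

```javascript
var original = 1180.952381;

// Store the same value in both array types and read it back
var asFloat32 = new Float32Array([original])[0];
var asFloat64 = new Float64Array([original])[0];

console.log(asFloat32 === original); // false: rounded to the nearest 32-bit float
console.log(asFloat64 === original); // true: same format as a JavaScript number
console.log(Math.abs(asFloat32 - original)); // rounding error, on the order of 0.00001
```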

    Another option you have is if you always have 6 decimal places, just store the value as an integer in an Int32Array (parseInt("1180.952381".replace(".",""))) and then divide by 1000000 in whatever maths you do.
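
A sketch of the scaled-integer idea. It uses Math.round(parseFloat(s) * SCALE) rather than the string replace, which is a swapped-in variant that also copes with values that have fewer than 6 decimal places. SCALE and toScaled are names made up for this example, and the approach assumes every value stays below about 2147.48 so that the scaled result fits into 32 bits.

```javascript
var SCALE = 1000000; // assumed: at most 6 decimal places are needed

// Hypothetical helper: text value -> integer count of millionths
function toScaled(s) {
  return Math.round(parseFloat(s) * SCALE);
}

var samples = ["1180.952381", "1175.824176", "1180.95238", "1180"];
var scaled = new Int32Array(samples.length); // 4 bytes of data per value
for (var i = 0; i < samples.length; i++) {
  scaled[i] = toScaled(samples[i]);
}

console.log(scaled[0]);         // 1180952381
console.log(scaled[3]);         // 1180000000
console.log(scaled[0] / SCALE); // back to a floating point value for the maths
```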

  • Forgot to add - @fanoush's is a great idea as well. Especially if you write in binary to begin with, you can end up with very fast array access but without using any RAM.

  • Thank you so much @fanoush and @Gordon for the explanation.
    @Gordon: the values don't always have 6 decimal places; at times it could be 5, or none.
    1180.952381 and 1180.95238100000 are the same number, however, I was just curious to know why values like 1175.824176 get stored as 1175.82417599999 or with extra zeroes at the end.

    Thanks for all your help as always!

  • Wed 2021.02.17

    post 13 'I was just curious to know why . . . '

    @NewAtEspruino while I don't have a solution to your initial inquiry, I am able to provide a link to a wealth of other links that should help satisfy your curiosity.

    The simple answer is found in the differences and complexities that result when using both
    Base2 and Base10

    I noticed in post #1 that text chars ref: \r\n and indexOf() provide a representation of what a microprocessor actually stores in its Base2 equivalent. Humans view a representation of a number that we know as a numeral 0 - 9 (a String element or Char representation of a Base10 numeral) while the underlying mechanism relies on a 'Characteristic' and 'Mantissa' (Floating Point) that allows the magic of binary numbers to work.

    Had to revisit this topic over a year ago and included many links and examples to how floating point works under the hood. Note in post #4 (there) the number of digit differences between browsers and Espruino

    Number.toFixed() not rounding
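
The Base2 versus Base10 point can be seen in a couple of lines. This is a sketch of the behaviour in a standard JavaScript engine; as the linked thread discusses, the digits Espruino prints may differ.

```javascript
// Most decimal fractions have no exact binary representation, so small
// errors show up as soon as all of the stored digits are compared.
var sumIsExact = (0.1 + 0.2 === 0.3);
console.log(sumIsExact); // false

// The stored value is the closest binary number to 1175.824176. Printing
// it with only the 6 decimals it was written with hides the tiny difference.
var value = parseFloat("1175.824176");
var shown = value.toFixed(6);
console.log(shown); // "1175.824176" in a standard engine
```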


.split() on a very lengthy string gives LOW MEMORY error

Posted by @NewAtEspruino
