reading parts of a large data file

  • Good morning!

    Somewhere I read that a statement such as

      let fullFile = Storage.read('DataSet');
    

    would just copy a reference to the data in flash - and not transfer the whole file.

    However, when doing so on my device, I get a LOW_MEMORY error. Am I wrong with my assumption?

    If so, it would be nice to get an extended read operation with the following signature

      Storage.read(file-name, offset, length)
    

    in order to read just parts of a file (similar to Storage.write)

    Using StorageFile instead is not an option as such files do not support binary data (because 0xFF is used internally)

  • Sat 2020.01.11

    Could it have been 'DataView' rather than 'DataSet'?

    https://banglejs.com/reference#DataView


    'Somewhere I read that a statement such as'

    Will need the source link to better understand and respond. . . .

  • The Storage.read docs agree with what you say: "This function returns a String that points to the actual memory area in read-only memory, so it won't use up RAM."
    Most likely what you do next uses all the memory. Here is an example that creates a 1000 long file, reads it, and free memory doesn't decrease by a lot (CPU is bare nRF52832, same as in the Bangle):

    >var st = require('Storage')
    =function () { [native code] }
    >process.memory()
    ={ free: 2466, usage: 34, total: 2500, history: 8,... }
    >
    >st.write("a","Hello",0,1000);
    =true
    >var readback = st.read("a")
    ="Hello\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\­xFF\xFF\xFF\xFF" ... "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xF­F\xFF\xFF\xFF\xFF\xFF\xFF\xFF"
    >process.memory()
    ={ free: 2463, usage: 37, total: 2500, history: 19,... }
    >readback.length
    =1000
    >process.memory()
    ={ free: 2463, usage: 37, total: 2500, history: 22,...}
    
    /// write some at the end, and read back a slice of it:
    
    >st.write("a", "Hello at the end", 1000-16);
    =true
    /// the last character:
    >readback[999]
    ="d"
    >readback.slice(1000-16, 1000)
    ="Hello at the end"
    >process.memory()
    ={ free: 2455, usage: 45, total: 2500, history: 45,...}
    

    You can force it to take up memory by slicing into it, for example:

    >var thisIsInMemory = readback.toString()
    ="Hello\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" ... "\xFFHello at the end"
    >process.memory()
    ={ free: 2445, usage: 55, total: 2500, history: 61,... }
    >var thisIsInMemory = readback.slice(0,1000)
    ="Hello\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" ... "\xFFHello at the end"
    >process.memory()
    ={ free: 2361, usage: 139, total: 2500, history: 66,... }
    >var thisIsInMemory2 = readback.slice(0,1000)
    ="Hello\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" ... "\xFFHello at the end"
    >process.memory()
    ={ free: 2275, usage: 225, total: 2500, history: 71,...}
    >var thisIsInMemory3 = readback.slice(0,1000)
    ="Hello\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" ... "\xFFHello at the end"
    >process.memory()
    ={ free: 2189, usage: 311, total: 2500, history: 76,...}
    

    On each invocation, free memory decreases by 86 vars, and according to the Performance section of the docs each jsVar holds 12 bytes of string data.
    86*12 = 1032, so there is a bit of overhead for history and for storing the actual variable.

    You can also try readArrayBuffer if you want to work with binary data. That doesn't eat up memory right away either:

    >var ab = st.readArrayBuffer("a")
    =new Uint8Array([72, 101, 108, 108, 111,  ... 101, 32, 101, 110, 100]).buffer
    >process.memory()
    ={ free: 2186, usage: 314, total: 2500, history: 80,...}
    > 
    

    Oh, and you can get the size of things:

    >E.getSizeOf(readback) // returned by `read`. Just a reference to flash
    =1
    >E.getSizeOf(ab) // returned by `readArrayBuffer`. Just a reference to flash
    =2
    >E.getSizeOf(thisIsInMemory) // this is a string
    =84
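
The vars arithmetic above can be checked with a few lines. This is only a rough sketch: it assumes the 12 bytes of string data per jsVar quoted in the post, the helper name `varsForString` is made up for illustration, and the real figure may differ between builds.

```javascript
// Assumption: each jsVar holds 12 bytes of string data, as quoted in the thread.
const BYTES_PER_VAR = 12;

// Hypothetical helper: jsVars needed to hold `length` characters in RAM.
function varsForString(length) {
  return Math.ceil(length / BYTES_PER_VAR);
}

const dataVars = varsForString(1000);       // vars for the 1000-character copy
const observedVars = 86;                    // drop in `free` per slice(0,1000) in the session above
console.log(dataVars);                      // 84, the figure E.getSizeOf(thisIsInMemory) reported
console.log(observedVars - dataVars);       // 2 vars of overhead
console.log(observedVars * BYTES_PER_VAR);  // 1032
```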
    
  • @AkosLukacs, very well done! slice does indeed seem to be the problem here: it appears to copy characters rather than just doing some pointer arithmetic (which I would have assumed, as JavaScript strings are immutable).

    Thanks for your info!
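
Since `slice` copies, one way to keep RAM use down is to pull individual bytes out of the string returned by `Storage.read` with `charCodeAt`, which yields a single number rather than a copy of the data. The sketch below is illustrative only: `readUint16LE` is a made-up helper, the little-endian layout is an assumption about the data, and a plain string stands in for the flash-backed one so the snippet runs anywhere.

```javascript
// Hypothetical helper: read an unsigned 16-bit little-endian value at `offset`
// without creating a substring.
function readUint16LE(str, offset) {
  const lo = str.charCodeAt(offset) & 0xFF;
  const hi = str.charCodeAt(offset + 1) & 0xFF;
  return lo | (hi << 8);
}

// Stand-in for the string returned by require('Storage').read('a') on a device.
const readback = "\x34\x12\xFF\xFF";
console.log(readUint16LE(readback, 0)); // 4660 (0x1234)
console.log(readUint16LE(readback, 2)); // 65535 (0xFFFF)
```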

  • Here is an example that creates a 1000 long file, reads it, and free memory doesn't decrease by a lot (CPU is bare nRF52832, same as in the Bangle)

    So you didn't use a Bangle when trying this? There may be another issue: the Bangle uses external SPI flash for storage (?), which is not directly mapped into the CPU address space, so data needs to be loaded into RAM at some point.

  • Good point, on the Bangle it does use more memory:

    >st.getFree()
    =937560
    /// more free space
    >E.getSizeOf(readback)
    =64
    /// yes, `readback` uses more memory
    
    >var ab = st.readArrayBuffer("a")
    =new Uint8Array([72, 101, 108, 108, 111,  ... 255, 255, 255, 255, 255]).buffer
    >E.getSizeOf(ab)
    =65
    
  • Yes, it's because Bangle.js can't memory map the file.

    The plan has always been to modify Espruino to add a new String type that can read direct from external flash, so hopefully that'll fix the issue when it's done - as well as freeing up more RAM by allowing seldom-used code to reside directly in flash.

    But I guess Storage.read(file-name, offset, length) would be a handy addition anyway - I've just added an issue for it: https://github.com/espruino/Espruino/issues/1744

  • ...with this I will get the .seek() I was posting about a while ago... (did not check for append yet, though).

    Having said that, and having worked through my storage options across all boards including Bangle.js (and after taking the latter apart to look into modification options), I would appreciate Storage having a .connect(), as devices usually have (or a mount), to support more than just one storage (even replacing the SD card). As 'required' parameters (for non-standard/non-default storage) I could see offset and length, plus an optional parameter for `rewritable`. Having offset in addition to length allows the storage to be 'partitioned', which eases the (file) name search. Partitioning is also simpler than a file/directory hierarchy, and that could compensate for the hierarchy's flexibility.

    Did not take a look into the (file) name look-up, but I assume there is no caching for names since memory is in short supply anyway (and no disconnect / unmount needed).

  • @AkosLukacs

    I get

    Execution Interrupted
    New interpreter error: LOW_MEMORY,MEMORY
    

    right after reading the large file

    const Storage = require('Storage');
    let ImageData  = Storage.read('Lookup');
    

    even without any other statements following.

    Thus, the docs do not seem to be correct - it's not just a pointer to a memory region in flash which is returned.

    Just as a remark: my lookup data is > 100kb long. I was able to successfully transfer it from PC to Bangle.js - but there it sits and cannot be used, not even in parts.

  • Yes, that was confirmed by Gordon and me later.
    But there should be an option to seek into a file: https://github.com/espruino/Espruino/commit/24dac197a5fe83d5429251820133db9e62b01cf8

    I think the docs should be updated for the Bangle.

  • Well,

    I will then wait for the updated docs - because, with the current situation, I can't even determine the size of large files...

  • On latest builds there's now the ability to do Storage.read on part of a file.

    The docs aren't updated yet. As I mentioned above, the plan is to add the ability to read the file directly from flash at some point soon.
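
With a partial read available, a large file can be walked in pieces. The sketch below is an illustration under stated assumptions: `forEachChunk` and `fakeStorage` are made-up names, the argument order `read(name, offset, length)` follows the signature proposed earlier in this thread, and the total length has to be supplied by the caller. `fakeStorage` exists only so the snippet runs off-device; on a device you would pass `require('Storage')` instead.

```javascript
// Hypothetical helper: call `onChunk(chunk, offset)` for each piece of a file.
// Assumes `storage.read(name, offset, length)` as proposed in this thread.
function forEachChunk(storage, name, totalLength, chunkSize, onChunk) {
  for (let offset = 0; offset < totalLength; offset += chunkSize) {
    const length = Math.min(chunkSize, totalLength - offset);
    onChunk(storage.read(name, offset, length), offset);
  }
}

// Stand-in for require('Storage') so the example runs off-device.
const fakeStorage = {
  files: { Lookup: "0123456789" },
  read(name, offset, length) {
    return this.files[name].slice(offset, offset + length);
  }
};

const chunks = [];
forEachChunk(fakeStorage, "Lookup", 10, 4, (chunk) => chunks.push(chunk));
console.log(chunks); // [ '0123', '4567', '89' ]
```

Keeping the chunk size small bounds how much data is held at once on devices where the read has to copy into RAM.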

  • Thanks a lot!

    So, there will no longer be the need to slice long binary files into shorter ones? Because accessing a certain part within such a file is performant enough?

  • Yes, that's correct :)

    And when the 'direct from flash' mode goes in you'll actually just be able to draw the image directly from flash too

  • Ok, you got me...

    When will we get that feature?

  • No idea - next few weeks I hope

  • Turns out it's today. Builds from http://www.espruino.com/binaries/travis/master now have it in.

    There are now 'Flash Strings' - they're special strings created only by Storage.read but basically they point to external flash and allow you to use data directly out of flash memory without loading it into RAM.

    That means you can use drawImage with files stored directly in flash, but it also means that any application loaded from flash storage will have all its function code stored in and executed from external flash - which should help no end.

  • @Gordon,

    does

    but it also means that any ... stored in and executed from external flash

    mean that it is an either-or ?

    Since this creates dependencies between various things, mostly between the way code is written and whether it is built statically and uploaded or created dynamically at runtime, it is important to have the setups and coding documented so as not to create confusion.

    I hope my paranoia of experiencing unpredictable / unexplainable effects as a plain js user will evaporate like fog in the sun the moment I understand better that constraint.

  • Not sure I understand. Basically this is exactly what happens on all other Espruino devices. It's just that in Bangle.js the flash isn't memory-mapped, so some messing around has been required to 'fake' that.

Posted by @Andreas_Rozek
