• Writing and Reading 'persistent' data using FlashEEPROM (journaling) module - Part 1 - Basics

    The FlashEEPROM module - on top of the Flash library/class - already does an excellent job of hiding the hardware API for storing data and reading it back... nevertheless, for the application space: mmmh - we will have to 'talk' about that.

    Users would expect the following code just to work and show 513 in the console:

    // Store the (integer) number 513 at flash address 1 (2nd page of available pages) and read it back.
    var flash = new (require("FlashEEPROM"))();
    flash.write(1, 513);
    var n = flash.read(1);
    console.log(n);
    

    The module after all 'implies' - kind of - this simplicity ...at least to users who usually do not (need to) look under the hood or are somewhat 'HW remote'. After all, for all our gadgets from tiny to jumbo we have 'mechanics' that take care of them, and if the sorry gadget is as highly integrated as electronics is, or as quickly assembled as snaps, we just get the next - newer and more capable - version... (do not want to touch the ecology subject here...)

    The simple, straightforward code executes without complaint and reads back the value 1 ...as the first and only element of a Uint8Array:

    new Uint8Array([1])
    

    How come?

    Of course, there is absolutely nothing wrong with the module: it works as designed (AND implemented). To use it as simply as intended, some more abstraction is required in the direction of the intended usage.

    Looking at the code, the 'issue' is that 513 is an integer that requires two bytes to be represented. .write() writes bytes - one for every given value - and thus from any integer number (or any other typed value) .write() writes only the least significant byte: number % 256 (modulo), or number & 255, or number & 0xFF (bit masking by bitwise AND) - which for 513 (= 2 * 256 + 1) is 1, exactly what we got back so far.
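
    A quick console check confirms it - modulo and bit masking both yield the least significant byte:

    console.log(513 % 256);  // 1 - remainder after taking out all full 256es
    console.log(513 & 255);  // 1 - bitwise AND masks all but the lowest 8 bits
    console.log(513 & 0xFF); // 1 - the same mask, written in hexadecimal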

    With 'a few small' changes, the user - me - gets it 'right' (at least working):

    // Store the (integer) number 513 at flash address 1 (2nd page of available pages) and read it back.
    var flash = new (require("FlashEEPROM"))();
    flash.write(1, [513 >> 8, 513 & 255]);
    var r = flash.read(1);
    var n = (r[0] << 8) + r[1];
    console.log(n);
    console.log(r);
    

    Line 7 - the added console.log(r); - shows what we read back and from what we have to reconstruct the expected value 513, which is 2 * 256 + 1:

    513
    new Uint8Array([2, 1])
    

    Somehow this 'business' of bit (byte) banging should be hidden. Could we not add it to the module/class, since it is already doing a great job in 'understanding' my intentions? For example, I did not have to specify how much to read: the (journaling) module/class somehow knew how much I wrote the last time and fills that in for me (lazy coder) as the number of bytes to read - the second argument - in .read()... (nothing against efficient laziness, but too much of it in code becomes 'crypticness'... or ambiguity at least).

    Adding a particular abstraction 'to fix an issue' to an existing module could limit the general (re-)use of the module and put it into a niche; or worse: it could over time lead to the creation of lots of similar modules that duplicate the code for the core of write and read - which, at worst, leaves a larger application using multiple abstractions with duplicate code and an unnecessary footprint. The really worst is though yet to come: the logically duplicated code pieces could fall out of sync, since they come from different modules, and lead to compromising results.

    Different applications have needs for different abstractions or sets of abstractions, and therefore, the onion principle of layering software can save you 'many tears', even though the real onions tend to do the opposite...

    Layering means to put a layer around the existing, re-usable code and put the additional, particular functionality into that new layer. Let's take a look at what that new layer has to do:

    The Flash library reference - espruino.com/Reference#Flash - through which the FlashEEPROM module's function is exposed to programming - says that the data argument - the second argument - of .write() accepts some kind of string or array of 1 to n byte(s). In practice, this can be a String - which is a string of bytes and kind of an array of bytes - or a Uint8Array - which is also a 'string' of bytes, just stored a bit differently internally. It also accepts any single value or common array of any values. From experience, though, we know that for any single value - or any value in the array - that 'is more than just one byte', .write() takes only the least significant byte and writes just that byte. For the number 513 - which is hexadecimal 0x0201, binary 0b0000001000000001 (6 leading 0 bits, followed by a 1 and then a 0 bit for the first byte, and 7 leading 0 bits followed by a 1 bit for the second byte) - it means that just the value 1 is written. Therefore, we have to make two (2) bytes out of it: first the most significant one, and then the least significant one.

    To get the most significant byte, we use the shift-right operation (>>) by 8 bits - 513 >> 8 - which pushes out the 8 bits of the 2nd byte, pulls in 0 bits from the left, and gives the - 1 byte fitting - integer number 2 - binary 0b00000010.

    To get the second byte, we could just use the number 513 as is, because .write() ignores any bytes more significant than the least one anyway. But for the 'sake of clarity', we use the bit-wise AND operation 513 & 255, or 513 & 0xFF, or 513 & 0b11111111, or (to make it really, really obvious) 513 & 0b0000000011111111, which masks anything but (all) the (8) bits of the least significant byte.

    We stick both bytes - also called most and least significant bytes - into an array and pass that array to .write().

    After reading, we have - of course - to do the 'reverse': we shift the byte we read first (r[0]) 8 bits to the left (with the shift-left operator (<<)) to get the most significant byte back (as a number) and add the second byte just as is. Again, the nicety of JavaScript does its part and makes out of the first read Uint8 - which is just 1 byte or 8 bits AND unsigned - a number of (at least) two bytes, and properly adds the second Uint8 - unsigned integer of 8 bits - to it as a number.
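
    Packed into two small helper functions, the bit banging stays in one place - a minimal sketch (writeUint16 and readUint16 are illustrative names, not part of the module):

    var flash = new (require("FlashEEPROM"))();

    function writeUint16(addr, val) { // split into two bytes, most significant first
      flash.write(addr, [val >> 8, val & 0xFF]);
    }

    function readUint16(addr) { // shift the 1st byte back up and add the 2nd
      var r = flash.read(addr);
      return (r[0] << 8) + r[1];
    }

    writeUint16(1, 513);
    console.log(readUint16(1)); // 513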

    *Note: the arithmetic operators (+ and -) have precedence over shift-right (>>) and shift-left (<<), and therefore the shift operations have to be given precedence by surrounding them with parentheses (where needed). Without parentheses, r[0] << 8 + r[1] is evaluated as r[0] << (8 + r[1]) = 2 << 9 - the typical binary-computing error of + or - 1, here in the shift amount - and the read back shows 1024 instead of 513. ;-)
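
    A quick console check makes the pitfall obvious:

    var r = new Uint8Array([2, 1]);
    console.log((r[0] << 8) + r[1]); // 513  - shift first, then add (intended)
    console.log(r[0] << 8 + r[1]);   // 1024 - parsed as r[0] << (8 + r[1]) = 2 << 9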

    What if we have integers that do not fit into two bytes anymore, such as values greater than 65535 or less than -65536? Espruino can store integer values from 2^31 - 1 down to -2^31 - 4 bytes, 32 bits.
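
    The shift-and-mask pattern extends straight to four bytes - a sketch (toBytes32 and fromBytes32 are illustrative names; the bitwise OR on the way back keeps the result a 32-bit signed integer, so negative values come back correctly):

    function toBytes32(v) { // most significant byte first
      return [(v >> 24) & 0xFF, (v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF];
    }

    function fromBytes32(b) {
      return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
    }

    console.log(fromBytes32(toBytes32(100000))); // 100000
    console.log(fromBytes32(toBytes32(-513)));   // -513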

    *Note: Watch out - any intermediary result or term within an arithmetic expression or Math function that is or uses floating point will result in a floating point value, even if mathematics 'suggests' an integer result and the result is within the integer range.
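
    For example, a division can turn an integer-valued result into a float internally; a bitwise operation - which works on 32-bit integers - forces it back (a common JavaScript idiom, assuming the value fits into 32 bits):

    var a = 1024 / 2;       // 512, but computed - and possibly held - as a float
    var b = (1024 / 2) | 0; // 512 as a true integer: | 0 truncates toward zero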

    An abstraction handling just one non-negative integer value up to 32767 - the simplest thing next to a single integer value up to 255 or a character - in one write is not good enough. Multiple values of any mix and match of the types string, int, float, boolean and even Date have to be handled with one write and one read operation. And what about handling of null and undefined?...

    Two common abstraction options are (composing either one is sketched in code right after the examples):

    • CSV - write/read a CSV (comma separated values) string like:

        513,ABC,2,-513
      
    • JSON - write/read a JSON stringified object, a string like:

        {"pInt":513,"s":"ABC","i":2,"nInt":-513}
      
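    Composing either payload string from a record is a one-liner in JavaScript:

    var rec = { pInt: 513, s: "ABC", i: 2, nInt: -513 };
    var csv  = [rec.pInt, rec.s, rec.i, rec.nInt].join(","); // 513,ABC,2,-513
    var json = JSON.stringify(rec); // {"pInt":513,"s":"ABC","i":2,"nInt":-513}
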

    Immediately one notices the storage footprint difference - 14 vs. 41 bytes (I love this number coincidence) - but also the difference in freedom, constraints AND consequences that go with either option. For example, even when providing the column names for CSV for that single one record of data in a first line or row (in a file or a string that contains multiple such 'records'), CSV outperforms JSON footprint-wise by 13 bytes (28 vs. 41 bytes), and with each additional record it gets better for CSV and worse for JSON:

          pInt,s,i,nInt[lf]
          513,ABC,2,-513
    

    Note: mileage varies with the length of column names and some other things... you may claim... rightly so, for one line... ;-)

    Multiple records:

          pInt,s,i,nInt[lf]
          513,ABC,2,-513[lf]
          6,xyzAndMore,7023,-1023
    

    respectively:

          [{"pInt":513,"s":"ABC","i":2,"nInt":-513}[lf]
          ,{"pInt":6,"s":"xyzAndMore","i":7023,"nInt":-1023}[lf]
          ]
    

    Parsing also gets a bit different... but for handling anything beyond a simple list of values - for example nested lists of values, where 'one value' is actually a list of values, i.e. arrays or objects - JSON's might cannot be beaten.

    A third option is XML, which is even more verbose / has a tremendously larger footprint - but on the other hand enables a plethora of very good things for robust data exchange between disparate systems and applications, such as validation against a schema for (conditional and non-conditional) structures AND values, namespaces, character sets, etc... Since in recent times JSON became THE quasi de-facto standard - pushing XML a bit aside, even in non-JavaScript environments - JSON is used all over the place. Even though JSON is not as robust and flexible in the (global) interpretation of conveyed data as XML is, it is a good middle ground.

    CSV is about equally useful and used, especially when data has - for whatever reason - a table-like format, like data that is exchanged with spreadsheets and relational (SQL) databases; and - last but not least - for its legibility for people with little exposure to, and need for, formal languages. With CSV data (as many lines in a file as rows in the spreadsheet or database table), one has to know what each column means. Therefore, CSV writers and readers have the option to exchange the meaning in the first line (the row before the first data row): instead of the values, the names of the values are passed. As noticed in the comparison with JSON, CSV data elements have no explicit type notion whatsoever: ...,123,... can be interpreted as the number 123 OR as the string "123". Since columns are usually of the same type, name/position/column number information is sufficient for proper interpretation, and the penalty for absent values is practically negligible: just a comma... and strings save too (even if it is just the 2 quote bytes). The only contention to look out for is when string values contain comma(s). Such commas have to be escaped in order to allow proper parsing on the receiving end, which is splitting by comma (,) to get the individual (column) values.
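
    One common convention - borrowed from spreadsheet CSV, and just a sketch here - is to wrap such values in double quotes and double any embedded quotes (csvEscape is a hypothetical helper):

    function csvEscape(v) { // hypothetical helper, not part of any module
      var s = "" + v;
      if (s.indexOf(",") < 0 && s.indexOf('"') < 0) return s;
      return '"' + s.split('"').join('""') + '"'; // quote and double embedded quotes
    }

    console.log([513, "1,5", "ABC"].map(csvEscape).join(",")); // 513,"1,5",ABC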

    For both the CSV and the JSON option, writing requires - first - a (payload) string to be composed, which is then - 2nd - prepended with the payload's length as two bytes (Uint8s) - similar to the processing of the number 513 - before - 3rd and lastly - the whole, new string is written.
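
    A minimal sketch of the write side - writeStr is an illustrative name, and flash is the FlashEEPROM instance from above:

    function writeStr(addr, str) {
      var len = str.length;
      // prepend the length as two bytes - most significant first - then the payload
      flash.write(addr, String.fromCharCode(len >> 8, len & 0xFF) + str);
    }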

    For reading back - first - two bytes are read, from which - 2nd - the length of the payload is calculated - exactly as when reconstructing the number 513 from the Uint8Array - in order to - 3rd - read the whole string, before - 4th and lastly - parsing is applied to reconstruct the values.
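
    And the matching read side - again just a sketch; Espruino's E.toString() turns the read bytes back into a (JS) string:

    function readStr(addr) {
      var r = flash.read(addr);     // read the length header plus the payload
      var len = (r[0] << 8) + r[1]; // reconstruct the payload length
      return E.toString(new Uint8Array(r.buffer, 2, len)); // payload, ready for parsing
    }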

    What is unique to each option is just the composing of the payload string for writing and the parsing after reading.

    CSV is RYO - maybe long after Rio 2016, RYO may become an Olympic discipline - though for now it just means: roll your own (module), whereas JSON is easy-peasy. JavaScript (the language) has JSON handling built in, and it is a refreshing breeze. It is a breeze not just for the coding, but also for the execution: from a performance (and storm) point of view, JavaScript makes it THE perfect storm: it is fast because it is part of the heart of the JavaScript source code compiler / parser / interpreter, and those are by definition performance-tuned to the utmost detail.
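
    Combined with the writeStr/readStr sketches from above, storing and restoring a whole record becomes two one-liners:

    writeStr(1, JSON.stringify(rec));   // compose the payload string and persist it
    var back = JSON.parse(readStr(1));  // read back and reconstruct the values
    console.log(back.pInt + back.nInt); // 0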

    Code examples demonstrating both CSV and JSON options follow in separate posts.
