-
• #2
Well, the string in
fb
is actually just a special variable type that references external flash memory. The flash memory isn't memory-mapped by the hardware so it's magically all handled in software.However
E.toString
should fix that by converting it to a big flat string - assuming there's enough RAM.To try and debug this, I'd:
- output the value returned by
E.getAddressOf
- maybe use
trace(fb)
and so on to see what the variable really is
But as far as I can see what you have there looks good
- output the value returned by
-
• #3
Thanks, Gordon.
I did check the address returned byE.getAddressOf
, it seems that, as you write,E.toString
moves it into RAM, but it still does not seem contiguous (by examining it viapeek8
).
At any rate, I just tried to speed up the line counting, but I will stick to doing it in JS since I do not want to incur the cost of copying the whole file to RAM.
Thanks again for your quick reply. The Espruino ecosystem is an impressive piece of software!
Marko -
• #4
I can't help you with the inline C but if your string is null terminated you do not need the len variable at all. Just use
while (*data)
and the if statement could be avoided by writing
s = s + (*(data++)=='\n');
This exploits that the boolean expression returns 0 or 1.
Whether this is faster is to be benchmarked as it will depend on compiler and how the code is actually compiled. -
• #5
Thanks. The C indeed wasn't optimized, but since I don't want to copy the whole file into RAM I probably won't pursue this path going forward anyways...
-
• #6
@user113695 it is contiguous. I tested it myself. I think your mistake is coming from this part:
var nlines = c.sum(fb.length, E.getAddressOf(E.toString(fb),true));
Try this instead:
var flatstring = E.toString(fb); var nlines = c.sum(fb.length, E.getAddressOf(flatstring,true));
Not exactly sure why your version doesn't work, will look into it.
-
• #7
I think its related to how the c.sum line is converted into E.nativeCall , and the way the arguments are processed, they are expecting resolved values, so the arguments are not eval'ed or anything. Thats all.
-
• #8
@d3nd3-o0, it all depends what resolved means... For Espruino JS, the arguments are 'resolved enough' - so to speak - because any code (JS interpreter implementation) - knows to handle a String object: 'talk to it' thru the its method, since it is object-oriented and one of the main goals i to hide the implementation. If the C code does the same as, for example the String method implementations of String.charAt(pos) and String.indexOf(needleString,[startPos]) and (under the cover/indirectly E.toString(aString)) do, it would be just fine. After all, the method/property accessor String.length can do it without reading the String into memory where C code similar as suggested could do the job. An indication that String in JS are obviously looked at differently - have a different implementation - than in standard C - otherwise the length would not have to be passed: a reference to an object in an object-oriented context is all it needs to handle the job.
From what I get out of this 'exercise' is that an implementation for counting line feeds in JS may be almost as fast as doing it in C because the meat is not really in parsing the string but actually accessing the. bytes of the string. (@Gorden) I hope I can assume that something along the lines of
var str = "xoxo" // as: = require("Storage").read("FileName"); , cnt = 0, start=-1, pos; // count occurrences of "x" while ((pos = str.indexOf("x",start+1)) > start) { cnt++; start = pos; }
does / could (?) work without String.indexOf() having to read the whole string in memory. On the other hand I know that streaming implementations are demanding and it may well be that it is not worth - or possible - to spend so much code and temp memory for it since the majority of string that are processed aren't that long that they would need a 'streaming' treatment.
The statement 'almost equally as fast as C' ignores the parsing and interpreting of the JS source which Is though significant and the efficiency depends on how String.indexOf() with start pos is implemented for a String 'living' in the storage.
-
• #9
If you don't want to load the string into RAM then you're pretty limited in what you can do with inline code, because you're reliant on Espruino to do the loading for you.
@allObjects suggestion of
str.indexOf
is great and I think would be a pretty good way to count newlines. A marginally faster version might be:var n=0,i=0;while(i=s.indexOf("\n",i)+1)n++;
Another option is to use regex:
var s = "hello\nthere\nworld\nlots\nof\nnewlines"; s.match(/\n/g).length
That'll be fast however it's going to allocate memory for each newline, so I wouldn't recommend if if you've got a file with more than a few hundred newlines.
-
• #10
In theory you could avoid some of the memory overhead of the array returned by
match()
by instead usingreplace
to generate a new string that contains only the newlines:s.replace(/[^\n]/g, "").length
Unfortunately, there seems to be a bug in replace() that prevents this from working perfectly.
-
• #11
@Gordon, nice play with the +1 on the -1 when not found for ending the loop.
Since there is obviously a need for operating on potentially 'large' strings in storage, I could see a new method, like
require("Storage").do(fn,fnct,optStartPos,optEndPosOrLength)
, where function can be JS function - plain or compiled - or inline C function. The function accepts byte and - when second argument provided -the byte's position. Function is called one more time with undefined when on 'end of file' with byte position equal to file/string length. The function is also called just once when out of bound with undefined and -1 as byte position. The function returning something else than undefined breaks 'the loop' and.do()
returns that value..do()
is kind of modeled after forEach / map / reduce / filter. For something similar to reduced, the 'accumulator' is though still missing in the function's argument list as is the decision whether.do()
's optEndPos or optLength should be used..do(...)
needs some more thoughts... ;-) -
• #12
By the way, those regex bugs are now fixed (in cutting edge builds, of 2v07 when released)
I could see a new method, like
require("Storage").do
Because of the way Espruino works with Storage, data in storage is actually memory-mapped. If you do a read or readArrayBuffer then you're actually accessing the data directly from flash.
As a result, you can actually do what you want right now using
forEach/map/reduce/etc
without any extra functions:var s = require("Storage"); // for every element new Uint8Array(s.readArrayBuffer("z")).forEach(print) // count newlines new Uint8Array(s.readArrayBuffer("z")).reduce((a,b)=>a+(b==10),0)
If you want to iterate over just part of the data, you can use the
Uint8Array
constructor to choose an offset and length :) -
• #13
...with s the Storage module and "z" the file in the storage... excellent.
-
• #14
Yes - sorry, forgot to add that...
In a piece of code I am working on I am trying to count the number of newline characters '\n' (i.e. the number of lines) in a string read from storage via something like the following:
Unfortunately my understanding of how the string returned from the require("Storage").read() call is stored in memory is clearly flawed since I am not at all getting the results I am expecting.
I am reading in a text file that I uploaded to storage via the web ide; the string is read in correctly, as confirmed by outputting it via console.log or using any of the other string functions on it. However, the internal representation of the chunk of memory is clearly not simple ASCII.
Or is the memory returned by read() simply not contiguous?
Thanks for any advise,
Marko