-
ok, as promised some sample code to reproduce the second problem when throwing inside a switch statement.
without switch(), everything behaves as expected:
process.on('uncaughtException',(err)=>{
console.log('!!!UNCAUGHTEXCEPTION!!!',err);
});
/*
expect output === achieved output:
-D0-
-D1-
CATCHED ... "simulate crash"
-D8-
-D9-
*/
function test( crit){
try{
const arr= new Uint32Array([1,2,3,4]);
console.log('-D1-');
if( crit) throw new Error('simulate crash'); // such as thrown by testing Uint32Array.xyz
console.log('-D2-');
}
catch(err){
console.log('CATCHED',err);
}
console.log('-D8-');
}
console.log('-D0-');
test('crash');
console.log('-D9-');
adding the switch() brings up some fancy extras:
process.on('uncaughtException',(err)=>{
console.log('!!!UNCAUGHTEXCEPTION!!!',err);
});
function test( crit){
try{
const arr= new Uint32Array([1,2,3,4]);
console.log('-C1-');
switch (crit) {
case 'ok':
console.log('-C2-C3-');
break;
default:
console.log('-C2-');
if( crit) throw new Error('simulate crash'); // such as thrown by testing Uint32Array.xyz
console.log('-C3-');
break;
}
console.log('-C4-');
}
catch(err){
console.log('CATCHED',err);
}
console.log('-C8-');
}
console.log('-C0-');
test('crash');
console.log('-C9-');
expected output (tested on chrome):
-C0-
-C1-
-C2-
CATCHED Error: simulate crash
-C8-
-C9-
achieved output:
-C0-
-C1-
-C2-
-C4- * should not exist
-C8-
!!!UNCAUGHTEXCEPTION!!! ... "simulate crash" * should be CATCHED
-C9-
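To make such a comparison repeatable, the markers can be collected into an array and checked against the expected sequence instead of being compared by eye. This is only a minimal sketch: `runTraced` and `testSwitch` are names made up for this example, and the uncaughtException path is left out.

```javascript
// Collect trace markers instead of printing them, so runs can be compared.
function runTraced(fn) {
  const trace = [];
  fn(function (marker) { trace.push(marker); });
  return trace;
}

// The switch variant from above, with console.log replaced by log().
function testSwitch(log, crit) {
  try {
    log('C1');
    switch (crit) {
      case 'ok':
        log('C2-C3');
        break;
      default:
        log('C2');
        if (crit) throw new Error('simulate crash');
        log('C3');
        break;
    }
    log('C4');
  } catch (err) {
    log('CATCHED');
  }
  log('C8');
}

const expected = ['C1', 'C2', 'CATCHED', 'C8'];
const actual = runTraced(function (log) { testSwitch(log, 'crash'); });
const ok = actual.join(',') === expected.join(',');
console.log(ok ? 'trace ok' : 'trace MISMATCH: ' + actual.join(','));
```

On an engine that handles the throw correctly this reports "trace ok"; an extra C4 or a missing CATCHED would show up in the mismatch line.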
-
Some more suspicious things...
This behaves as expected:
process.on('uncaughtException',(err)=>{
console.log('!!!UNCAUGHTEXCEPTION!!!',err);
});
const arr= new Uint32Array([1,2,3,4]);
console.log('-A1-');
if (!!arr.xyz) console.log('-A2-');
console.log('-A3-');
But this
process.on('uncaughtException',(err)=>{
console.log('!!!UNCAUGHTEXCEPTION!!!',err);
});
const arr= new Uint32Array([1,2,3,4]);
console.log('-B1-');
try{
if (!!arr.xyz) console.log('-B2-');
console.log('-B3-');
}
catch(e){
console.log('-B4-',e);
}
console.log('-B5-');
reports some strange error (see 2nd UNCAUGHTEXCEPTION message):
-B1-
!!!UNCAUGHTEXCEPTION!!! Error {
"msg": "Field or method \"xyz\" does not already exist, and can't create it on Uint32Array",
"type": "Error",
"stack": " at line 2 col 12\n if (!!arr.xyz) console.log('-B2-');\n ^\n"
}
!!!UNCAUGHTEXCEPTION!!! SyntaxError {
"msg": "Got catch expected EOF",
"type": "SyntaxError",
"stack": " at line 1 col 1\ncatch(e){\n^\n"
}
-B5-
Maybe another topic is related to this: i think that exceptions get lost without any notice when they occur inside a switch statement. i will try to come up with some test code for this, too...
-
just a "picture" without words;)
process.on('uncaughtException', function(err) {
console.log('!!!UNCAUGHTEXCEPTION!!!',err);
});
const arr= new Uint32Array([1,2,3,4]); // or any other ArrayView
console.log('test E...');
if (!E.xyz) console.log('E ok'); // works as expected
console.log('test arr...');
if (!arr.xyz) console.log('arr ok'); // throws exception - not as expected
console.log('done all ok');
EspruinoWIFI 1v96
-
- Take 4bpp (?) data a chunk at a time and unpack it into another buffer with 16 bits.
- Kick off DMA from that buffer
- Take another 4bpp (?) data chunk and unpack it into another buffer with 16 bits.
- Wait for DMA to finish
- Goto 2
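The steps above can be sketched as a ping-pong loop in plain JS. This is a dry-run sketch only: the `dma` object with `start()` and `wait()` is hypothetical (a fake is used here so the loop can be tried without hardware), and the nibble order (high nibble first) is an assumption.

```javascript
// Unpack `count` 4bpp pixels from src, starting at pixel index `first`,
// into dst as 16-bit palette entries. Assumption: high nibble = first pixel.
function unpack4to16(src, first, count, palette, dst) {
  for (let i = 0; i < count; i++) {
    const p = first + i;
    const byte = src[p >> 1];
    const idx = (p & 1) ? (byte & 0x0f) : (byte >> 4);
    dst[i] = palette[idx];
  }
}

// Ping-pong loop: fill one buffer while the other one is being sent.
// `dma` is a hypothetical object with start(buf, count) and wait().
function sendPingPong(src, pixels, chunk, palette, dma) {
  const bufs = [new Uint16Array(chunk), new Uint16Array(chunk)];
  let cur = 0;
  let n = Math.min(chunk, pixels);
  unpack4to16(src, 0, n, palette, bufs[cur]);          // step 1
  for (let done = 0; done < pixels; ) {
    dma.start(bufs[cur], n);                            // step 2
    done += n;
    const next = Math.min(chunk, pixels - done);
    if (next > 0) {
      unpack4to16(src, done, next, palette, bufs[cur ^ 1]); // step 3
    }
    dma.wait();                                         // step 4
    cur ^= 1;                                           // step 5: swap buffers
    n = next;
  }
}

// Fake DMA for a dry run: it copies the data immediately and records it.
function makeFakeDma() {
  const sent = [];
  return {
    sent: sent,
    start: function (buf, count) {
      for (let i = 0; i < count; i++) sent.push(buf[i]);
    },
    wait: function () {}
  };
}
```

With a real DMA the unpacking in step 3 would overlap with the transfer started in step 2; the fake only checks that the chunks arrive complete and in order.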
As already mentioned - doing a full DMA setup for every 16 bits is much too slow (because of the procedure needed for setting up, starting and then stopping dma/spi). so i tried the DMA double buffer feature. basically this works nicely, but it has some drawbacks:
- with 16bit payload at the top speed of 12.5Mbaud we have to feed new data to the DMA buffers every 1.28us; on the EspruinoWIFI (100MHz, 1-2 cycles per instruction) this corresponds to ~85 instructions. well, it seems that some IRQ out there (timer?) blocks my feeder for longer than that.
- increasing the buffer by 10x (e.g. 10x16 bit) works perfectly (6x16 does not), but... in this case we have to waste a lot of (blocking) cpu cycles while waiting for a buffer to become free for the next data.
- in fact, a fast palette lookup (i am not talking about E.mapInPlace) saves a lot of CPU load because it's much faster than the SPI. it's not my intention to then waste these savings on blocking waits for the background DMA ;)
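The timing budget from the first point can be reproduced with a few lines. A small sketch; `feedBudget` is a made-up helper, and the 1.5 cycles per instruction is just the middle of the 1-2 range quoted above, not a measured value.

```javascript
// Budget for refilling one DMA word before the previous one has been sent.
// cyclesPerInstr is an assumed average, not a measured value.
function feedBudget(bitsPerWord, baud, cpuHz, cyclesPerInstr) {
  const cycles = bitsPerWord * cpuHz / baud;
  return {
    microseconds: bitsPerWord * 1e6 / baud,
    cycles: cycles,
    instructions: Math.floor(cycles / cyclesPerInstr)
  };
}

const budget = feedBudget(16, 12.5e6, 100e6, 1.5);
console.log(budget.microseconds, budget.cycles, budget.instructions);
```

For 16 bits at 12.5Mbaud and 100MHz this gives 128 cycles, i.e. roughly 85 instructions, which any interrupt of a few microseconds will exceed.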
the best way of pushing out paletted image data seems to me:
- unpalette graphics data (typ. 1/2/4bpp into 16bpp) chunk by chunk with a really fast lookup method
- each chunk as huge as possible (typ. 2..20 kbyte)
- this allows parallel processing of JS code while the last chunk goes over the line
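For reference, the lookup step for one chunk could look like this in plain JS. It is only a readable sketch, not the optimized asm version measured below; `unpalette` is a made-up name and the MSB-first pixel order is an assumption.

```javascript
// Expand `count` paletted pixels (bpp = 1, 2 or 4) from src into 16-bit
// colours in dst. Assumption: the leftmost pixel sits in the top bits.
function unpalette(src, bpp, count, palette, dst) {
  const perByte = 8 / bpp;          // pixels per source byte: 8, 4 or 2
  const mask = (1 << bpp) - 1;      // 0x01, 0x03 or 0x0f
  for (let i = 0; i < count; i++) {
    const byte = src[(i / perByte) | 0];
    const shift = 8 - bpp * ((i % perByte) + 1);
    dst[i] = palette[(byte >> shift) & mask];
  }
  return dst;
}

// Example: 4 pixels at 1bpp with a black/white palette.
const mono = unpalette(new Uint8Array([0xB0]), 1, 4,
                       new Uint16Array([0x0000, 0xFFFF]),
                       new Uint16Array(4));
console.log(Array.from(mono));
```

The destination buffer of one chunk is what would then be handed to the DMA while JS continues with the next chunk.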
some simple measurements of optimized asm 1/2/4 into 16bit lookup functions show that unpaletting is typ. 3..10 times faster than the net SPI transmission time. e.g. on 10k pixels @1bpp we bring >11ms (12.80-1.44) of cpu time back to JS compared to any blocking method.
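test          | pixels | lookup [ms] | net SPI [ms] | ratio
------------------------------------------------------------
test_map1to16 |   1000 |        0.43 |         1.28 |  2.9x
test_map1to16 |  10000 |        1.44 |        12.80 |  8.9x
test_map2to16 |   1000 |        0.44 |         1.28 |  2.9x
test_map2to16 |  10000 |        1.56 |        12.80 |  8.2x
test_map4to16 |   1000 |        0.48 |         1.28 |  2.7x
test_map4to16 |  10000 |        1.87 |        12.80 |  6.8x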
-
It's tested on the EspruinoWIFI, which is based on the STM32F411. Adapting it to another board should be quite easy, as long as it has an ARM processor. I think changing the xxx_BASE constants should be all that needs to be done.
Please note that you will probably prefer the double buffer mode (DBM) for an audio codec. This feature is not used in the current lib.
-
inlining the peek/poke
there seems to be a little bug - this does not work:
const a = [<address>,<value>];
poke32( a[0], a[1]);
this is fine:
const a = [<address>,<value>];
const x= a[0];
const y= a[1];
poke32( x,y);
this is fine, too:
const a = [<address>,<value>];
poke32( a[0]|0,a[1]|0);
another - similar? - flaw i have seen:
const SPI_CR2_TXDMAEN = 0x0002;
function rset16( addr, mask){
"compiled";
console.log( addr.toString(16), mask.toString(16)); // debug output
poke16( addr, peek16( addr) | mask);
}
function foo( qctl, buf_ptr, byte_cnt) {
"compiled";
const SPI_CR1= ...;
...
rset16( SPI_CR1+4, SPI_CR2_TXDMAEN);
}
generates this output: 40013004 [object Object]
expected output: 40013004 2
to achieve the expected behaviour, i have either to
- remove "compiled" directive from foo, or
- do rset16( SPI_CR1+4, SPI_CR2_TXDMAEN|0);
-
-
I just tested, and you can turn on Modules uploaded as functions (BETA) in Communications under Settings.
great - but with this option ON, "compiled" produces an error (see attachment). but don't worry, i can live without it (with somewhat less comfort). and the new lightning fast peek/poke compilation already helps a lot.
-
-
i implemented something similar, but with larger chunks (1..10kBytes); sending just 16bits with DMA is very slow. it might even be better to write directly to SPI TX? maybe DMA double buffer (DBM) helps, i have not tried it so far. but i am not sure if 16bits are enough to get rid of the inter-byte gap (due to the CPU load of polling for DMA ready). i think i will give it a try.
i have identified another performance brake: E.mapInPlace; i think its use of JSVars slows down the lookup. using specialized asm coded functions for 1/2/4bpp is about 20x faster.
-
wow² ;) you are really speedy!
you are right - i was missing the half word operations
E.asm seems to be available in the editor window only. when used in modules, it does not work. is there an easy way to have E.asm for modules, too?
my SPI DMA driver has one very strange open issue (marked as i#2). it applies only to the writeInterlaced( buf, N) call - e.g. when repeating the buffer. in this case, the display shows just random data for the last chunk sent. when adding a dummy chunk (just 1 pixel) at the end, it works fine (but at the price that the function has to wait until everything is sent). the behaviour occurs independent of SPIx, byte count and baudrate. the only thing i saw was that writing to a certain JSVar while the DMA is running in the background seems to change the DMA data. but when i checked the DMA, it definitely did not point to the JSVar... very strange... in fact, i could not figure out what the real reason is. do you use the DMA in the firmware? do you have any experience with DMA in FIFO mode + repeating source data?
-
I published the current status of the SPI DMA driver at https://github.com/andiy/espruino.git
It's important to keep in mind that DMA is only an advantage when sending at least a minimum amount of data. The benchmarks below give an idea of when DMA pays off.
Times [ms] for sending a data buffer of length N:
bytes | native write | writeInterlaced* | writeInterlaced$*
----------------------------------------------------------
20k | 42.0 | 4.2 / 18.4 | 3.0 / 18.0
10k | 22.2 | 4.2 / 11.8 | 3.0 / 11.4
1k | 4.1 | 4.2 / 9.9 | 3.0 / 8.7
0.5k | 3.0 | 4.2 / 9.9 | 3.0 / 8.7
*) 1st number is the net time for calling writeInterlaced, 2nd number is the total transmission time
CONCLUSION:
- writeInterlaced() overtakes native write() at 1k+ bytes
- writeInterlaced$() overtakes native write() at 500+ bytes
When sending a small buffer of 1, 2 or 4 byte multiple times, the results are as below.
QSPIx.writeInterlaced( buf, N); // is compared with
SPIx.write({data:..,count:N});
CONCLUSION:
- 1 byte buffer: advantage for writeInterlaced at N > 1800
- 2 byte buffer: advantage for writeInterlaced at N > 3000
- 4 byte buffer: advantage for writeInterlaced at N > 3500
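An application could encode these break-even points in a small helper that picks the faster call. A sketch only: `preferInterlaced` is a made-up name, and the thresholds are simply the measured numbers from the list above, so they only apply to this setup.

```javascript
// Break-even repeat counts reported above (buffer size in bytes -> minimum N).
const REPEAT_BREAK_EVEN = { 1: 1800, 2: 3000, 4: 3500 };

// Returns true if writeInterlaced is expected to win, false if the native
// write is, and undefined for buffer sizes that were not measured.
function preferInterlaced(bufBytes, repeat) {
  const n = REPEAT_BREAK_EVEN[bufBytes];
  if (n === undefined) return undefined;
  return repeat > n;
}

console.log(preferInterlaced(2, 5000), preferInterlaced(2, 100));
```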
-
i am using E.asm(...). it's a great tool to get things done (even though some thumb instructions are missing;).
compilation is fine, but does not bring the same boost as E.asm does - even when just using peek and poke. I implemented the same operation in 3 different ways, and called each 1000 times:
// takes 538ms ... 222% of the fastest
function rclr1(addr,mask){
poke32( addr, peek32(addr) & ~mask);
}
// takes 334ms ... 138% of the fastest
function rclr2(addr,mask){
"compiled";
poke32( addr, peek32(addr) & !mask); // ~ not supported by compiler; !mask is only a stand-in for timing and is NOT equivalent to ~mask
}
// takes 242ms
var rclr3=E.asm("void(int,int)",
"ldr r2,[r0]",
"bic r2,r1",
"str r2,[r0]",
"bx lr"
);
i took a look at the compiler output - the differences in code between rclr2 and rclr3 speak for themselves:
var rclr2=E.nativeCall(1, "JsVar(JsVar,JsVar)", atob("LenwT4ewBq1F+AgNB0Y0SCpLeESJRphHKU6CRrBHASMAkwAjGkYZRgGVJkyDRqBHgEYlTFhGoEdQRqBHQEawRyJLgkZIRphHgPABACBLwLKYRyBLAUaDRiYiUEaYRx5KA5CQRx1KkEeBRgObGEagR1hGoEdQRqBHQEagRxlIBJcOS834FJB4RJhHB0awRwIjAJMAIxpGGUYBlQpNBkaoRwVGMEagRzhGoEdIRqBHKEagRwAgB7C96PCPAL+1hgMA0aYCAPFQAgDldAUAIawCAHV2BQCLKwIAiZgCAIV2BQDSAAAAZQAAAHBlZWszMgBwb2tlMzIAAAA="));
var rclr3=E.nativeCall(1, "void(int,int)", atob("AmiKQwJgcEc="))
ps: seems that the unary '~' operator has been forgotten in the compiler
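Until '~' is available, a bitwise NOT can be written with other operators. The sketch below only shows the arithmetic identity in plain JS; whether the "compiled" directive accepts '^' or binary '-' was not checked here, so treat that as an assumption. The function names are made up.

```javascript
// Two ways to express ~mask without the unary '~' operator.
function notViaXor(mask) { return mask ^ -1; }   // flip all 32 bits
function notViaSub(mask) { return -1 - mask; }   // two's complement identity

// Clear the bits given in mask, as rclr1 does with '& ~mask'.
function clearBits(value, mask) { return value & (mask ^ -1); }

console.log(notViaXor(2) === ~2, notViaSub(2) === ~2, clearBits(0xFF, 0x0F).toString(16));
```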
-
yes, i had it all written in JS, but then changed to asm because of the high DMA setup costs in JS. In JS the setup took approx. 7ms, which at an effective 5MBaud corresponds to 4375 bytes. in other words - transmissions of less than 4.4kbyte would be LESS efficient using DMA than with the native SPI implementation. with asm i could reduce the time by approx. 80%, so DMA makes sense for any packet >500byte.
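The break-even size is just the number of bytes the SPI could have sent during the setup time. A tiny sketch of that arithmetic (`breakEvenBytes` is a made-up helper):

```javascript
// Bytes the SPI transmits at `baud` bits/s during `setupMs` milliseconds.
// Below this size the DMA setup costs more time than the transfer saves.
function breakEvenBytes(setupMs, baud) {
  return Math.round(setupMs * baud / 1000 / 8);
}

console.log(breakEvenBytes(7, 5e6)); // 4375
```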
regarding the ILI9341 module: with the standard module only the fillrect benefits from DMA, for the ILI9341pal things are a bit better. but in the end i decided to replace both with my own ILI9341 driver, adding some nice features such as smoothed fonts (incl. the font generator necessary to build them from any google font). if you like, i can provide it to you in the next few days.
-
many thx - nice hack to get a flat Uint8Array ;)
i packed it into some handy functions extending your E instance:
// grant flat arraybuffer http://forum.espruino.com/conversations/316409/#comment14077573
if (E.newArrayBuffer===undefined) E.newArrayBuffer= function(bytes){
const mem= E.toString({data:0,count:bytes}); // undefined -> failed to alloc the *flat* string
if (!mem) throw Error('alloc flat for '+bytes+' bytes FAILED!');
return E.toArrayBuffer(mem);
};
if (E.newUint8Array===undefined) E.newUint8Array= function(cnt){
return new Uint8Array( E.newArrayBuffer(cnt));
};
if (E.newUint16Array===undefined) E.newUint16Array= function(cnt){
return new Uint16Array( E.newArrayBuffer(cnt*2));
};
if (E.newUint32Array===undefined) E.newUint32Array= function(cnt){
return new Uint32Array( E.newArrayBuffer(cnt*4));
};
at the moment, my DMA extension for SPI supports TX only - but gives a really fine performance improvement when running at 12.5 MBaud.
-
flat arrays are a mandatory thing when working with DMA.
but what is the recommended way to reliably create a flat array?
this does not work for n<23, and does not reliably work for n>=23:
const sometimes_flat= new Uint8Array(n); // does not generate *flat* arr. for n<23
this always generates a flat variable, but it's a string and not an arraybuffer as needed. furthermore i cannot create an empty buffer by just specifying the length:
const always_flat_or_undefined= E.toString(1,2,3,4); // creates *flat* var even for <23 bytes
good to know, that these two behave like new UintXArray (and not like E.toString)
const sometimes_flat= E.toUint8Array(...data)
const sometimes_flat= E.toArrayBuffer(...data)
in fact, this does not work either:
const flat_str= E.toString(1,2,3,4);
const not_flat_arr= E.toUint8Array( flat_str);
my workaround for the moment: always create a UintXArray of >=23 bytes
but this is neither very elegant nor guaranteed to work with future firmware versions.
are there any recommendations on how to create a flat UintXArray, independent of its length?
thx!
-
at least some hint in the API doc (http://www.espruino.com/Reference#l__global_arguments) would be fine.
-
-
my test snippet:
function abc(p1){
console.log(arguments.length);
}
these tests are fine:
abc(4); // prints "1" .. ok
abc(4,5,8); // prints "3" .. ok
and here is the flaw:
abc(); // prints "1" .. NOK - "0" expected (checked on browsers and node.js)
as already mentioned - it's not critical, but some libraries written for node may fail in their arguments check.
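Until this is fixed, a library could count its arguments defensively. A possible workaround sketch, not tested on Espruino itself: `argCount` is a made-up helper that ignores trailing undefined entries, with the side effect that an explicit abc(undefined) is also counted as 0.

```javascript
// Count arguments while ignoring trailing undefined entries.
// Caveat: a deliberately passed trailing `undefined` is not counted either.
function argCount(args) {
  let n = args.length;
  while (n > 0 && args[n - 1] === undefined) n--;
  return n;
}

function abc(p1) {
  return argCount(arguments);
}

console.log(abc(), abc(4), abc(4, 5, 8));
```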
-
i am using 5MHz; when setting it to anything higher it still works, but does not get faster. it seems that the espruino limits the SPI to 5MHz?
and yes - i am using the connection plan from the official example http://www.espruino.com/ILI9341
-
-
hi,
the ILI9341 needs some performance tweak to allow fast visible feedback after user interaction (button press).
imho the paletted driver is no solution, due to its very slow .flip(). even when combined with .getModified() there remains the trade-off between high memory consumption and low color depth. and mostly it is not necessary to keep the display data in memory all the time.
running just a partial display update, with data generated by Graphics.createArrayBuffer(), does the job quite well.
unfortunately spi.write() blocks execution until everything is sent. this prevents interlacing of drawing and spi output - which would bring a notable performance boost.
is there any possibility to make spi.write async?
either with a callback (which allows the app to do some buffer/job management), or at least with a simple option "please don't block" (so that the app hands over the buffer and forgets it)?
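To illustrate what the callback variant would enable on the application side, here is a sketch of a simple send queue. Note that `writeAsync(buf, done)` is purely hypothetical: it stands for the requested non-blocking write and does not exist in the firmware; `makeSendQueue` is a made-up name as well.

```javascript
// Queue buffers and send them one after another through a non-blocking
// write. `writeAsync(buf, done)` is hypothetical: it must start the
// transfer, return immediately and call done() once the buffer is sent.
function makeSendQueue(writeAsync) {
  const pending = [];
  let busy = false;

  function next() {
    if (busy || pending.length === 0) return;
    busy = true;
    writeAsync(pending.shift(), function () {
      busy = false;
      next(); // start the next queued buffer, if any
    });
  }

  return {
    push: function (buf) { pending.push(buf); next(); },
    idle: function () { return !busy && pending.length === 0; }
  };
}
```

The app would draw into the next buffer right after push() returns, while the previous one is still going over the line.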
@Gordon: what's the recommended alternative for espruino wifi (when BT is not an option) at the moment? thx!