Regular Expressions!

  • There are now finally regular expressions in Espruino! If you download a cutting edge build then you should get them.

    There's:

    • /regex/ syntax to define them
    • Or new RegExp("regex")
    • regex.exec to match on a supplied string
    • string.replace(regex, replacement)
    • string.split(regex)

    It doesn't have all the RegEx functionality built in, but it should cover most of the common use cases.
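
    A quick sketch of the five features listed above, written as plain JavaScript. The sample strings are made up for illustration, and since the RegEx support is described as incomplete, less common patterns may behave differently on a given build.

```javascript
// Two ways to define the same pattern.
var literal = /(\d+)-(\d+)/;
var constructed = new RegExp("(\\d+)-(\\d+)");

// regex.exec returns the match plus its capture groups, or null.
var m = literal.exec("range 10-20");
var first = m[1];   // "10"
var second = m[2];  // "20"

// string.replace with a regex and a replacement string.
var replaced = "a-b-c".replace(/-/g, "+");   // "a+b+c"

// string.split with a regex separator.
var parts = "1, 2,3".split(/,\s*/);          // ["1", "2", "3"]

console.log(first, second, replaced, parts, constructed.test("1-2"));
```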

    Let me know how it goes! The RegEx system might still need some work, so if you find a RegEx that doesn't seem to be working properly, please let me know!

  • That's excellent... (I often wonder where you still find the memory for it...).

  • I'm sure it'll make the builds for the original Espruino a little more painful, but it's worth it - and not actually that much more code (it's not a 100% full-featured regex, but should be enough for basic stuff)

  • Are functions supported as the replacement? I would love to be able to use my RunlengthEncoding snippets.

    let RLE = {
        enc: t => t.replace(/([^\d])\1*/g, m => m.length > 1 ? m.length + m[0] : m[0]),
        dec: t => t.replace(/(\d+)([^\d])/g, (m, r, c) => new Array(+r + 1).join(c))
    }
    

    -- Edit --
    I flashed this version onto my Espruino WiFi and it seems to work!
    But I run into LOW MEMORY issues when using RLE.dec with a ~5k chars long string.

  • edit: function replacements are now supported!

    I'm afraid you can't use functions for replacement at the moment - and also there aren't backreferences so your encoder wouldn't work. The code would work but would insert the stringified function - hence lack of memory :)

    I'll see about adding the functions - that shouldn't be too hard.

    Backreferences are possible, but a reasonable amount of work - I wonder how often they're used. While your RLE code is neat, it's pretty easy and possibly faster to implement in normal JS :)
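
    As a rough illustration of the "normal JS" route, here is a minimal run-length encoder and decoder that uses no regular expressions at all. The function names rleEncode and rleDecode are made up for this sketch, and it assumes the text to encode contains no digits, as in the snippet above.

```javascript
// Encode runs as <count><char>; single characters are left bare.
// Assumes the input contains no digits (otherwise decoding is ambiguous).
function rleEncode(text) {
  var out = "";
  var i = 0;
  while (i < text.length) {
    var ch = text[i];
    var run = 1;
    // Count how many times the current character repeats.
    while (i + run < text.length && text[i + run] === ch) run++;
    out += (run > 1 ? run : "") + ch;
    i += run;
  }
  return out;
}

// Decode by collecting digits into a count, then repeating the next character.
function rleDecode(text) {
  var out = "";
  var count = "";
  for (var i = 0; i < text.length; i++) {
    var ch = text[i];
    if (ch >= "0" && ch <= "9") {
      count += ch;
    } else {
      var n = count === "" ? 1 : parseInt(count, 10);
      for (var k = 0; k < n; k++) out += ch;
      count = "";
    }
  }
  return out;
}

console.log(rleEncode("aaaaaabbbbbbbc"), rleDecode("6a7b1c"));
```

    Building the output with string concatenation in a loop avoids creating a temporary array per run, which may matter on a memory-constrained board, though that has not been measured here.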

  • Funny you say that: I just rewrote it in normal JS, but it isn't ready yet. Sadly I had already minified it, because it worked in the browser and with Node.js. On Espruino, though, it seems to hang somewhere.

  • But it works when unminified? If you find out where it behaves differently, please can you let me know?

  • I did not test it unminified. Let me try that later today and I will let you know if anything changes.

  • Just added the ability to use functions (only for regex though) - but you still don't get backreferences I'm afraid.

  • That was the thing I used in the past (replace with context logic).

    but you still don't get backreferences I'm afraid.

    Does this refer to the p1, p2, ... function parameters described in MDN's documentation for String replace with a function?
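
    For anyone with the same question: in standard JavaScript these are two different things. The p1, p2, ... parameters are the captured groups handed to a replacement function, while a backreference such as \1 sits inside the pattern itself and must match the same text as an earlier group. The sketch below shows both as they behave in standard JavaScript; judging by the dec session further down in this thread, the first form is the one that works on Espruino, and the second is the one reported as missing.

```javascript
// Capture groups passed to a replacement function (p1, p2 in MDN's naming).
var expanded = "3a2b".replace(/(\d+)([^\d])/g, function (match, p1, p2) {
  return new Array(+p1 + 1).join(p2);
});

// Backreference inside the pattern: \1 repeats whatever group 1 matched.
var collapsed = "aaabbc".replace(/([^\d])\1*/g, function (match) {
  return (match.length > 1 ? match.length : "") + match[0];
});

console.log(expanded, collapsed);
```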

  • Thanks @Gordon, I will give it a try. I am in the process of de-minifying my RLE implementation.
    The lack of backreferences is negligible. I guess it's a more advanced feature anyway.

    In my case I only need the backreferences for encoding, which is a pre-processing step on the PC. I only need the decoding part on the Espruino.

  • Okay, here are the results with respect to RLE.

    First the two contenders:

    Unminified

    module.exports = {
        enc: string => {
            let map = (count, char) => (count > 1 ? count : '') + ('¾µÀÁÂÃÄÅÆÇ'[char] || char),
                counter = 0,
                last = '';
    
            return [].reduce.call(string, (result, char, index) => {
                if (last !== char) {
                    result += map(counter, last);
                    counter = 0;
                    last = char;
                }
    
                counter++;
    
                if (index >= string.length - 1) {
                    result += map(counter, last);
                }
    
                return result;
            }, '');
        },
        dec: string => {
            let map = char => '0123456789'['¾µÀÁÂÃÄÅÆÇ'.indexOf(char)] || char,
                count = '';
    
            return [].reduce.call(string, (result, char) => {
                if (!isNaN(parseInt(char))) {
                    count += char;
                    return result;
                } else {
                    result += count.length ? new Array(parseInt(count) + 1).join(map(char)) : map(char);
                    count = '';
                }
                return result;
            }, '');
        }
    };
    

    Minified

    module.exports=((a,b)=>({enc:t=>b.call(t,(p,c,x)=>{return p.l!==c&&([p.r,p.c,p.l]=[p.r+p.f(p.c,p.l),0,c]),p.c++,x>=t.length-1&&(p.r+=p.f(p.c,p.l)),p},{l:'',r:'',c:0,f:(c,s)=>(c>1?c:'')+(a[s]||s)}).r,dec:t=>b.call(t,(p,c)=>{return[p.r,p.j]=[~~c==c?p.r:p.r+(p.j?Array(+p.j+1).join(p.m(c)):p.m(c)),~~c==c?p.j+c:''],p},{r:'',m:i=>'0123456789'[a.indexOf(i)]||i,j:''}).r}))('¾µÀÁÂÃÄÅÆÇ',[].reduce)
    

    I minify by hand, but I tested the code in the browser and in Node.js and confirmed that both versions work as expected. As you can see, it is not a straight RLE implementation: the purpose is to shrink base64 strings, so there is some substitution going on.

    On Espruino, the unminified version runs fine. The minified version, however, behaves unexpectedly: the Web IDE freezes, reconnecting to the board does not work, and even typing on the left side does not work. Restarting the Web IDE has no effect. I have to reflash my Espruino WiFi to fix this.

    -- Edit --
    I tested the latest cutting edge build (1v94.130) with this

    let dec = t => t.replace(/(\d+)([^\d])/g, (m, r, c) => new Array(+r + 1).join(c)).replace(/[¾µÀÁÂÃÄÅÆÇ]/g, i => i.charCodeAt(0) % 10);
    

    I get the same behaviour as with my minified non-regex version.

  • Do you have any example input/output? You said it needed base64?

    Also, there's actually a good compressor called heatshrink built into Espruino. It's not accessible from JS at the moment, but it might make a certain amount of sense?

  • If you try the latest build, it should work now:

    >let dec = t => t.replace(/(\d+)([^\d])/g, (m, r, c) => new Array(+r + 1).join(c)).replace(/[¾µÀÁÂÃÄÅÆÇ]/g, i => i.charCodeAt(0) % 10);
    =function (t) { ... }
    >dec("6a7b1c")
    ="aaaaaabbbbbbbc"
    

    It turns out it was because String.replace wasn't resuming the search from the correct place.
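
    To illustrate what "resuming the search from the correct place" means, here is a minimal sketch of a global replace built on regex.exec. It is not Espruino's actual implementation (which is not shown in this thread), and the helper name replaceAllWith is made up. The key step is that each new search starts right after the end of the previous match.

```javascript
// Hypothetical helper: a global replace built on exec, for illustration only.
// The regex must carry the g flag so that exec advances regex.lastIndex.
function replaceAllWith(str, regex, fn) {
  var out = "";
  var pos = 0; // end of the previous match in the input string
  var m;
  regex.lastIndex = 0;
  while ((m = regex.exec(str)) !== null) {
    // Copy the unmatched text, then the replacement for this match.
    out += str.slice(pos, m.index) + fn.apply(null, m);
    // Resume from the end of this match, not from the length of the replacement.
    pos = m.index + m[0].length;
    // Step past empty matches so the loop always makes progress.
    if (m[0].length === 0) regex.lastIndex++;
  }
  return out + str.slice(pos);
}

var decoded = replaceAllWith("6a7b1c", /(\d+)([^\d])/g, function (match, r, c) {
  return new Array(+r + 1).join(c);
});
console.log(decoded);
```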

  • It needs base64 as input for the encoding, and the result of that is the input for the decoding.
    The encoding step basically substitutes the digits with some characters outside the base64 alphabet and then does a run-length encoding. I never tried the encoding step on Espruino, though.

    To give a little background: I recently got an e-ink display (Waveshare 4.2 inch) and wanted to display full-size (400x300) images with Espruino. The output of the common img2lcd tool was too big because of all the overhead of hex notation, commas, spaces and line breaks. The image information itself is just 15k bytes, but in this representation it's close to 80k bytes. So I base64 encoded the image, which gives me 20k bytes and, by the way, runs just fine. But I saw major potential to reduce the size further with some RLE.
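
    The sizes quoted above can be checked with a little arithmetic. This sketch assumes a 1-bit-per-pixel image (which is what makes 400x300 come out at 15k bytes) and, for the hex figure, a C-array style of five characters per byte such as "0xFF,"; the tool's exact output format is an assumption here.

```javascript
var width = 400;
var height = 300;

// Assumption: 1 bit per pixel, so 8 pixels per byte.
var rawBytes = (width * height) / 8;

// Base64 emits 4 characters for every 3 input bytes (rounded up).
var base64Chars = 4 * Math.ceil(rawBytes / 3);

// Assumption: hex notation like "0xFF," is 5 characters per byte,
// before any spaces or line breaks are added.
var hexChars = rawBytes * 5;

console.log(rawBytes, base64Chars, hexChars);
```

    Spaces and line breaks on top of the 75,000 characters of hex notation would plausibly account for the "close to 80k" figure.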

    Having compression/decompression options built in would be nice, and it would be especially nice for assets if the compressed output could be used directly as a string.

  • For my test input of 5310 chars it still crashes.
    After quite a while I get this error message:

    Uncaught Error: Unknown escape character 117 in RegEx
     at line 1 col 184
    ...{return a.charCodeAt(0)%10})
                                  ^
    in function "dec" called from line 2 col 229
    ...)%10})};console.log(dec(img).length);
                                  ^
    >
    

    Here is the full example code to reproduce:

    let img = '720/Ç66/D65/gf64/wB64/ÆAH63/+2Af63/g2A6­3/ÂÇAD63/PÄwP62/wPjg62/+Q/PD62/nDÆ+PÆ60/­Æ+PÃÂ/j60/l2/vzÂP59/Æj/+/PAf59/kv/ÅÇwB59­/Æw2/P¾AH59/lv/Æ/Bwf58/+C2/zÂfh59/wJ2/OF­/H58/+Uz/ÃzÁ+f58/yfn/m/fÂ59/A/v+Gd/j58/íD/fÂÃÁ/P58/kÁ2/G2fÆ58/+3/ÇÅ2/Ã58/z3/k3/n­58/N2/Æz2/+f57/ÆD2/h/Ç/Ã58/yf/Æj3/n58/J2­/n2/f+P57/Æ2/ÆP/µ/Â58/Ã2/Àf/f/j58/n/+M2/­f+P58/P/ÅÃÆL/Âf57/Æ2/OTÄH/h58/Ã/ÇcnÆ/+X5­8/zžÆPÅ/Ãf58/nzTÂfv/l57/A2Ph/x3/H56/wAC­Pk/jf/Æf56/P/AOT/l2/x56/ÄP/ÂM2/f2/H56/ef­/ÆD/Æ/f+f55/Å+f/jÁ/ÃÇ/Â56/v2/ÆÇP2ÁÅ/j55/­Æv2/BÄ/vPn+P55/wP/Ç2x2+/PÃ55/+b2/r/zÃ/+/­n55/Å2/ÇH/2nÁÇÄf55/v2/kf/O2fÅp55/+j/ÆÅ/+­Z+/ÀH55/ÄP/z2/ÆnÇ/IfÂf53/oH/f2/ÂPz+BÆAP5­2/+H/Ã3/ÂfnÂPn/P52/z2/v3/Â/OQÆgGf52/P/+4­/w+YDg/B52/Æj/Å4/xÆAMf/P52/ÄP/v4/wIAn/+f­52/v/Æ5/w2Af/Ã52/+2/r/ÂA2/gAH2/z52/ÃPÇvY­HÆf/gA3/P52/k+W+x2/D+AD56/PÆb/Â2/g2Af55/­Æ/zv/+f2/AB56/Ã+u5/+AP56/zÄÅ3/v/wA57/vlP­3/P+AD56/+fE3/+PwEf56/Æ+D3/+GAx57/ÃÆP4/A­HP57/zÃ5/ÃÆ24/Af32/nj4/+fÁ23/gAD32/nP2/v­/j/f22/Æf+D32/M/Á2+I25/n2/D31/+TÄÁÅÂP24/­Ç2/ÀH31/ÆPÂ2/j25/n/vbH31/Âfn/w23/f/O/+Av­P31/ÃÆ/ÆP22/ÂfgT2/Â+f31/zÁ+H23/sÆcP2/zÂ3­2/nfB23/+ÂJw3/vL32/IB24/zxnB2/ÆÇn32/g24/­fvÅZn2/ÁnP+H54/Â+PlnO/+Qc/yP54/ÁÆfGcz/ÂM­b2+f55/xÆZÃf/HÂHÅÃ56/HÃPk/Æ2/f/n55/Ævk/T­/n/2Ç+f55/m/HÆDÆ2/zÁÃ55/gfÆ/yfn2/PfnÇ31/­v21/ÂA/z/DÆ2/ÆÃ+Pj52/+BÂ/PÆfn2/zPÆeP52/2­jÃ+fxÆ3/J/Æf52/ÆfHzÃ/PÁ2/ÆP55/n+fÁjÂ+f2/­z/z2/Á50/Æ/Æ/OHnÃ2/+f/f2/f50/Á/z+dc/P2/h­/Ç/PÇ50/+f/vǾnÆ/+AP2/Ç8/Âf10/gf30/Ã/+fz­o/j+AF11/+EP7/+3A31/v/Ã/OH+PgBn11/n+P6/4­AZ30/Æ2/z+ÆfÂÆDÆ5/P5/Ã/+P5/ADg2Az30/z2/P­Ãx/jgPH4/Æ5/Æf/+f4/xP2/wBH30/P/+fzT+MBx5­/Á5/H2/Æ4/Æb3/hCf29/Ç2/Â/FnÂQcf/ÆA7/z/wD­Ç4/nf3/iL30/Á/ÁzÆGfhBn2/2AQB4/Æ/ÂPBz3/ÆÅ­3/+EP30/f+PjwIÆAE2/w/w2AR2/Æf+P/xn3/Pf3/­ÆY30/Â/Â/2AzgAn/+H/Â/gD/ÆP/x2/xP2/2Ã4/Âj­30/j3/ABMACf/h2/w4AH/+f2/wf/+OP4/iP29/+H­3/AQ2AZ/wP2/gH/gHÃ/n3/w2/Bz4/mI30/ÃP3/Bg­H3AD/f/P4/nÆ4/wGAÆf4/cD30/if2/ÆHAC3A/Ã/+­f4/fn4/ÂAMH5/ÂP29/+M3/wO2AOAD/v/Ã5/Æ6/gP­6/g30/Âd3/gcAQYAH3/Á5/n14/D30/gÃ2/+AQBAg­YP3/f4/Æ15/H29/+Rx2/Æ2AECIÂ3/Æ5/Á14/+BÂ2­8/Çjx2/Æ2AwAjz3/z4/+15/ÆD29/zDx2/+CGAGHP­3/v4/Á15/Â3/f26/OBzh2/HwA2c3/+f3/+f15/Â2­/Å26/Æ+2AR/2AEDwz3/Â4
/Å16/Â29/Å+Hgw3AGHD­n3/j4/f16/ÂD28/n2/xÃgDCYcOf2/+f3/Ã47/f2/­BÂAIJhÂZ3/Ã4/P48/vÂ2ABhmHh4/n3/Ã47/PÆ/k2­AEAQPD3/Æf3/P47/Ã2/ÂEAY2B¾H3/x3/Ã48/x/+2­AzgECIP3/T2/Âf48/w/gI+fCAMwf2/ÇP2/j49/gA­ZPj+MA/gf2/kf/Âf49/ÆfJw/ÆwD/gP/+E/wH51/ísf/ÃgH/AP/w2AD52/EH2/nAf/AP/2AD52/Æx2/+e­A2/gHwAH53/jP2/2ÆD2/g2A54/+c3/xÂP2/2Af54­/Ãz3/HÂf2/AD55/OP2/ÆPÂ2/+Af54/ÆÂ3/yfx2/Æ­D55/zJ3/J/n2/Âf55/Iz2/Çn+D2/j55/ÆDn/HnfÆ­H/+f55/wfDwcÇ/wf/z56/B/AMHÁ/D/+P55/ÆH2/Ç­/fÆn/j56/wf3/Ç/zDwf56/B4/Á/OAH56/ÆH3/+fÆ­+D57/Âf3/Ã/z59/h4/n/P58/+H3/+fÆ59/ÆfÃ2/í/z59/w/Á2/n+f59/g3/+PÃ59/+Q+ADÂ/P59/Æw2A­BhÆ60/zÂf+AXn60/P3/BI60/+f3/AH60/Ã3/+B61­/x66/h66/D66/D66/H65/+f65/Æ6/x59/z5/ÆD59­/v5/gH58/+f4/+cP58/Ã5/zÂ59/Á5/Pj59/f4/ÃH­P58/Ç5/gM59/Á4/ÆAT59/P4/hwP58/+4/Æfg59/í4/j/D59/Á3/ÆfÆf59/v3/n/Ã59/+f2/Â2/j59/+f­2/P/+P59/Æf/x2/Æ60/ÆfÆf2/z60/ÆCH3/P65/+f­65/Ç66/Å66/j66/h66/ÃÂf65/h644/869A5/Æ60A­H6/g59Af6/w58AB7/Â58AH7/w58Af7/w57AB8/w2­ADwABÂABÂAH/+AHw3AfP4/gH3/hÂ2ABÂAP/ÆAD3/­ÂB4/Æ4AH8/g2AP2AHgAHgD2/+APg2ADÂ4/+B3/+H­g2AHgD2/ÆAP3/ÂH4/w4Af8/3AÆ2Ae2AeAf2/+A+3­APj4/ÂP3/Âe3AeA3/ÆA4/wf4/4AB6/Ã/+2ADwAB­ABÂD3/ÆBÆ2ABÆP4/h4/hÂ2ABÂH3/ÂD4/h4/Æ4AH6­/H/Æ2AP2AHgAHgf3/ÂHw2AHw4/+H3/+Hg2AHg4/w­P4/H4/w4Af5/Âf/Â2AÆ2Ae2AeB/wD/gPg2A+DÂ4A­fw3Ae3AeD/gH/AÆ2AfÆf8AB6/B2/wADwABÂABÂP­AB/A+2ADÂP4ABÂ3ABÂ2ABÂfwAD+Dw2AHxÂ8AH5/­H2/2AP2AHgAHg/2ABÆBÆ2AfAÆ4AHg3AHg2AHh+2A­DÂP3APHg8Af5/Af/+2AÆ2Ae2AeDÂ2AHwDwABÆDw4­Ae4Ae3AeHw2APgÆ3AÆe8AB5/ÂB2/ÆADwABÂABÂP3­APAPgAPgPg3ABÆ3ABÆ2ABÂe3AeDw2ADxÆ8AH5/AH­2/wAP2AHgAHgÆ3AÆAf2A+A4/AH3/wH4/hÂ2ABÂP3­APH3/Â5Af4/ÆAf2/gAÆ2Ae2AeDw2ADwBÆAHwD3/Æ­Af3/wf3/+Hg2AHgÆ3AÆf3/g4AB5/gB2/+ADwABÂA­BÂP3APADÂAfAP3/wB4/h4/Âe3AeDw2ADx3/+5AH4­/+AH2/ÆAP2AHgAHgÆ3AÆAPgDÂA4/AB3/+H4/hÂ2A­BÂP3AfH3/Â5AfAHwHÆAeD/wAÆ2Ae2AeDw2ADwAfA­PAD3/ÆAB3/Æf3/+Hg2AHg4/Æf3/g4ABÆAfAfgBwH­/ADwABÂABÂP4/ABÆBÆAP8APxÂ2ABÂf3/+D4/xÆ8A­HwBÆB+AHAf+AP2AHgAHg4/ÆADÂHgAÆ8AfHg2AHh4­/ÂP3/+Hg8AfAHgDÂBÆA/ÂAÆ2Ae2AeD4/wAPg+ADw­8AÆe3AeH4/g4/we8ABÆAOAPAPgD/gDwABÂABÂP4/­2AfDwAP8ADxÂ2ABÂf3/+D3/+BÂ8AHÂAÂAÆA+AH/A­P2AHgAHg4/ÆABÆf2AÆ8APHg2AHh4/ÂP2/fAHg8Af­gBABgDwAfÆAÆ2Ae2AeDw2AHwADÅÂADw8AÆe3AeHw­2AHgÆPÆ2Ae8AB+AEAG
Af2A/wDwABÂABÂP3AP2APv­gAP8ADxÂ2ABÂe3AeDwfÂABÂ8AHÆAQAQBÂAD/AP2A­HgAHgÆ3AÆ2AfÆ2AÆ8APHg2AHhÂ2ABÂPAfÂAHg8Af­w4ADgAHÆA+2Ae2A+Dw2ADwAB/wADw7ABÆe3AeHg2­AHgÆA/wAe8AB/5AM2AfwD/ADÂA/ÂP3AP2AD+2APg­6AB/xÂ2ABÂe3AeDwA/wBÆ8AH+5AQ2A/AP6/gÆ3AÆ­2APÂ2A4/+P4/Hg2AHhÂ2ABÂPAB/gH4/w4AfÂ4AB2­ADÆA6/+Dw2ADw2Af2AD4/Âf3/Âe3AeHg2AHgÆAB/­Af4/4AB/g7AHwB6/wP3AP2ABÆ2AP4/h4/BÂ2ABÂe­3AeDwAD/B4/Æ4AD/6AEAfAB5/+AÆ3AÆ2ADg2A4/+­H3/ÂHg2AHhÂ2ABÂP2AH+H4/w4APÆAC4AYAÆ2A5/A­Dw2ADw2AM2AD4/Âf2/ÆAe3AeHg2AHgÆ2AH+f4/5A­/wAM3ADgDw55AD/gAw3APAP56AP+AHAB2AÆAÆ56A­/ÂAeAMAHgDw55AB/wBÂAÂAeAf56AH/APgDgDÂBÆ5­6AfÆA/AeAPAPw56A/ÂDÆBÆBÆA/2AB30Ag7AQ14AD­/gfwPwHgHÆ2AE29AC7AB15AH+B/g2/ÂAfw2AQ29A­I7AE15Af6/gD/2AB30Ag7AQ15A6/+AfÆAÆEeAHiE­wPg2A+IgABA+AeAfARÂOAHwACPADxCYHx3AIHxAY­Pg5AD6/ÂA/wGYWMBhoU2D2AGMiAgE2ODMHGBIzGB­hgALGAw¾KBxkQEAhxkeBhg5AH6/gD/AQBgYMBhgY­C2AgOIDA2g2IAwEHBwIMB2AwMGB2wMBxAY2EBxwI­D6AP5/+AfÆBAGAhg2GBAEAEAYQcCEAQgCAIYDA2g­CADAQwDCAgDCDgQgDGBgE6A6/ÂD/wEAQCEAIQMAQ­AwBhB2YQBiAYAhgIDGAIAIBCAEIGAMILCGAMYEAI­5AB6/gf/AMBAIQAhA2/gCA2CEhD/+GBADEAgMf/g­AgEIAQgQAQxkIQARgf/g5AD5/+D/ÆAYEAhACED3A­IA2Ij2M2AMEAMQCAxg2ACAQgBCBA2BERhABGB8AH­5/Âf/wAYQCEAIQI3AgAgyE2g2AMQAxAIDE3AIBCA­EIEA2EQkEAEYE8AP5/j2/2AhAIQAhAw2ADAGBQSB­3ARACEAgMQ3AgEIAQgYAwKCQYAxgQ8Af7/ÆACEAh­gGEBAEAEAYHAwEAQBCAIQCAwgCACAQwDCAgDAÂGA­gDGBgE7Af7/wAIQCDAYQGAgAIDgMDAIDAEM2BAI2­DAQAI2BgMIDAcBAYDAcYCAw7A8/AxhAI2GhA2M2A­ÂyAgMA2ÂYw2YEAgM2G2AgE2DQgHGQEBAHGRg2G8A­7/ÆAÂEAgHiEAPg2A+I4A+AcAfAQC2AHwACAQDxCA­Hx4AHx2APg8A7/w59A7/60AP5/Æ987A';
    
    let dec = t => t.replace(/(\d+)([^\d])/g, (m, r, c) => new Array(+r + 1).join(c)).replace(/[¾µÀÁÂÃÄÅÆÇ]/g, i => i.charCodeAt(0) % 10);
    
    console.log(dec(img).length);
    
  • Ahh - it's almost certainly a problem with the upload. There are some huge hacks in the Web IDE to try and add support for non-ascii characters in strings.

    Since you're only using base64, can you find a few extra characters that have character codes less than 128? http://www.asciitable.com/
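
    One way to find such characters is to list every printable ASCII character that is not part of the standard base64 alphabet. This is only a sketch; which of the candidates are practical is a separate question, since quotes and the backslash are awkward inside string literals, and several others have special meaning inside a regex.

```javascript
// Standard base64 alphabet, including the "=" padding character.
var base64Alphabet =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";

// Collect printable, non-space ASCII characters (codes 33 to 126)
// that base64 never produces.
var candidates = "";
for (var code = 33; code < 127; code++) {
  var ch = String.fromCharCode(code);
  if (base64Alphabet.indexOf(ch) < 0) candidates += ch;
}

console.log(candidates.length, candidates);
```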

  • I will give that a try. I chose the current characters so that I could use that mod 10 hack :)
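
    For readers wondering what the mod 10 hack is: the ten substitute characters were picked so that each one's character code, taken modulo 10, gives back the digit it stands for. The sketch below spells the characters as Unicode escapes so that it does not depend on how the file is encoded.

```javascript
// The substitutes from the thread: ¾ µ À Á Â Ã Ä Å Æ Ç
// Character codes: 190, 181, 192, 193, 194, 195, 196, 197, 198, 199
var substitutes = "\u00BE\u00B5\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5\u00C6\u00C7";

// Each code modulo 10 is the digit that the character replaces.
var digits = "";
for (var i = 0; i < substitutes.length; i++) {
  digits += substitutes.charCodeAt(i) % 10;
}

console.log(digits); // "0123456789"
```

    Any replacement set below character code 128 would need either the same property or an explicit lookup, as in the indexOf-based map of the unminified version.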

  • With the new encoding it doesn't crash, but it takes quite a while to return (15+ minutes), and with a wrong result. That's odd :D

    To avoid spamming this forum further with huge encoded data, I created
    a gist: https://gist.github.com/PaddeK/4c717e9c2a86a39efede7ac672c96f5b

  • Does the Espruino implementation of RegEx support the | "or" operator?

    Simplest test case:

    var p = /a|b/;
    p.test("a");
    

    Here I'm getting back false... [MDBT42Q 1v99]

  • No, I'm afraid it doesn't at the moment. I'll file an issue for it. You'll either have to use two regexes, or in the case you have there, use /[ab]/
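
    Both workarounds can be sketched in a few lines. The helper name testAny is made up for this example; it simply tries several regexes in turn, which covers alternatives longer than one character, where a character class no longer helps.

```javascript
// Single-character alternatives: a character class replaces /a|b/.
var single = /[ab]/.test("a");

// Longer alternatives: test each pattern separately.
// testAny is a hypothetical helper, not part of Espruino.
function testAny(patterns, text) {
  for (var i = 0; i < patterns.length; i++) {
    if (patterns[i].test(text)) return true;
  }
  return false;
}

var multi = testAny([/cat/, /dog/], "hotdog");
var none = testAny([/cat/, /dog/], "bird");

console.log(single, multi, none);
```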

Posted by @Gordon