Multi-threading?

Posted on
  • Would it be possible for a function to have a point where it could "yield" the CPU for other functions to execute? I added a quick sort function to Uint16Array which takes about 15 seconds to execute which breaks some timings when setInterval() are too short.

    I take the "yield" expression from Lua which implements multi-threading as coroutines, see: http://lua-users.org/wiki/CoroutinesTuto­rial

  • The issue really is stack space. As the parse tree is effectively on the stack, having two things running concurrently is likely to fill it up pretty quick. You could just rewrite the quicksort to jump out early.

    Can you post your quicksort? I'd like to add it as a benchmark.

    The best solution here would be to implement the quicksort natively though. It should be quite easy (and fast) with iterators and I imagine it would be used a lot.

  • I forgot that you would need to instantiate a second parsing tree.

    I attached the tests I made with sorts. Here is the output:

    start: 162142.4207250000035855
    qsort: 162142.9275999999954365 (0.506874999991850927472)
    bsort: 162144.6809749999956693 (1.753375000000232830643)

  • Attaching did not work, I'll just paste it inline:

    // Useful functions
    Array.prototype.median = function () {
      return this[Math.floor(this.length/2)];
    };
    Uint16Array.prototype.median = Array.prototype.median;
    
    Array.prototype.swap = function (left, right) {
      var tmp = this[left];
      
      this[left] = this[right];
      this[right] = tmp;
      
      return this;
    };
    Uint16Array.prototype.swap = Array.prototype.swap;
    
    Uint16Array.prototype.clear = function () {
      for (var i = 0; i < this.length; i++) this[i] = 0;
    };
    
    // Bubble sort
    Array.prototype.sort = function () {
      var len = this.length - 1;
    
      for (var i = 0; i < len; i++) {
        for (var j = 0, swapping, endIndex = len - i; j < endIndex; j++) {
          if (this[j] > this[j + 1]) {
            swapping = this[j];
            this[j] = this[j + 1];
            this[j + 1] = swapping;
          }
        }
      }
      return this;
    };
    Uint16Array.prototype.sort = Array.prototype.sort;
    
    // Quick sort
    Array.prototype.qsort_partition = function (pivot, left, right) {
      var tmp, storeIndex = left, pivotVal = this[pivot];
    
      this.swap(pivot, right);
      for (var i = left; i < right; i++) {
        if (this[i] < pivotVal) {
          this.swap(i, storeIndex);
          storeIndex++;
        }
      }
      this.swap(right, storeIndex);
    
      return storeIndex;
    };
    Uint16Array.prototype.qsort_partition = Array.prototype.qsort_partition;
    
    Array.prototype.qsort = function (left, right) {
      var pivot, newPivot;
    
      if (left === undefined) {
        left = 0;
        right = this.length - 1;
      }
    
      if (left < right) {
        pivot = left + Math.ceil((right - left) / 2);
        newPivot = this.qsort_partition(pivot, left, right);
        this.qsort(left, newPivot - 1);
        this.qsort(newPivot + 1, right);
      }
    
      return this;
    };
    Uint16Array.prototype.qsort = Array.prototype.qsort;
    
    // Tests
    var tQsort = new Uint16Array([5454,5449,5380,5412,5380,53­66,5344,5395,5398,5424,5422,5473,5420,54­32,5376,5354,5561,5288,5393,5388,5422,54­27,5476,5407,5385,5180,5363,5324,5395,53­93,5410,5405,5349,5361,5385,5412,5373,53­73,5478,5420,5446,5395,5339,5407,5420,53­56,5336,5427,5459,5378,5336,5349,5420,54­05,5434,5383,5446,5422,5349,5329,5405,54­34,5446,5336,5427,5473,5402,5170,5388,54­12,5456,123,456,789]);
    var tBsort = new Uint16Array([5454,5449,5380,5412,5380,53­66,5344,5395,5398,5424,5422,5473,5420,54­32,5376,5354,5561,5288,5393,5388,5422,54­27,5476,5407,5385,5180,5363,5324,5395,53­93,5410,5405,5349,5361,5385,5412,5373,53­73,5478,5420,5446,5395,5339,5407,5420,53­56,5336,5427,5459,5378,5336,5349,5420,54­05,5434,5383,5446,5422,5349,5329,5405,54­34,5446,5336,5427,5473,5402,5170,5388,54­12,5456,123,456,789]);
    
    var qsortTime, bsortTime, startTime = getTime();
    tQsort.qsort();
    qsortTime = getTime();
    tBsort.sort();
    bsortTime = getTime();
    
    print("start: " + startTime);
    print("qsort: " + qsortTime + " (" + (qsortTime-startTime) + ")");
    print("bsort: " + bsortTime + " (" + (bsortTime-qsortTime) + ")");
    
  • I tried optimizing the function, removing functions call seem to help quite a bit, and prevent stack overflows just a bit since it is easy to get one.

    qsort: 165526.5649749999865889 (0.398799999995389953255)

    swap and qsort_partition not needed anymore:

    // Quick sort
    Array.prototype.qsort = function (left, right) {
      var pivot, newPivot, pivotVal, tmp;
    
      if (left === undefined) {
        left = 0;
        right = this.length - 1;
      }
    
      if (left < right) {
        pivot = left + Math.ceil((right - left) / 2);
    
        // Partition
        pivotVal = this[pivot];
        newPivot = left;
        this[pivot] = this[right]; this[right] = pivotVal;
        for (var i = left; i < right; i++) {
          if (this[i] < pivotVal) {
            tmp = this[i]; this[newPivot++] = i; this[i] = tmp;
          }
        }
    
        this.qsort(left, newPivot - 1);
        this.qsort(newPivot + 1, right);
      }
    };
    Uint16Array.prototype.qsort = Array.prototype.qsort;
    
  • And here is my final code for my needs, this won't stack overflow:

    nqsort: 425.1626749999999788087 (0.343325)

    // Non-nested quick sort
    Array.prototype.nqsort = function(depth) {
      var pivot, i=0, left, right;
      if (depth === undefined) {
        depth = parseInt(Math.floor(this.length/5), 10);
      }
      var begin = Uint16Array(depth);
      var end = Uint16Array(depth);
    
      begin[0] = 0;
      end[0] = this.length;
    
      while (i >= 0) {
        left = begin[i]; right = end[i]-1;
        if (left < right) {
          pivot = this[left];
          if (i === depth-1) return false;
          while (left < right) {
            while (this[right] >= pivot && left < right) right--;
            if (left < right) this[left++] = this[right];
            while (this[left] <= pivot && left < right) left++;
            if (left < right) this[right--] = this[left];
          }
          this[left] = pivot; begin[i+1] = left+1;
          end[i+1] = end[i]; end[i++] = left;
        } else i--;
      }
      return true;
    };
    Uint16Array.prototype.nqsort = Array.prototype.nqsort;
    

    References: http://alienryderflex.com/quicksort/

  • This is great - thanks! I'll stick these in the benchmark suite - they should be really useful to help work on function call/array access overhead.

    By the way, if you want a little more speed, you can minify the function:

    Array.prototype.nqsort=function(d){var f,c=0,a,b;void 0===d&&(d=parseInt(Math.floor(this.lengt­h/5),10));var g=new Uint16Array(d),e=new Uint16Array(d);g[0]=0;for(e[0]=this.leng­th;0<=c;)if(a=g[c],b=e[c]-1,a<b){f=this[­a];if(c===d-1)return!1;for(;a<b;){for(;t­his[b]>=f&&a<b;)b--;for(a<b&&(this[a++]=­this[b]);this[a]<=f&&a<b;)a++;a<b&&(this­[b--]=this[a])}this[a]=f;g[c+1]=a+1;e[c+­1]=e[c];e[c++]=a}else c--;return!0};
    
  • Minifying is nice, bumped on github a javascript string compressor http://pieroxy.net/blog/pages/lz-string/­demo.html which compress the minified (on a PC the 830 bytes becomes 322 bytes net).
    The original code is 1788 bytes compressed to 498 bytes.
    Serial links are slow, so the MCU got time to decompress with this kind of option.
    More bang for the memory buck, and probably more throughput in serial links?
    It also makes the original JS code probably closer to a compiled C code, which would be used for speed?

  • Looks handy - it would be good to have as an Espruino module...

    To be honest the USB link is more than fast enough for sending code, and unless I can make Espruino execute direct from the compressed strings I wonder how much memory this would save on Espruino itself.

    However being able to compress data could be very handy - especially when doing things like sending it over the internet or storing it on EEPROMs.

  • That's right. Actually, when I looked at Tessel's add-on board animation which gets inserted on the MCU board, it reminded me of the impractical USB custom devices: when plugged in the first time: It's plug and pray because compiled drivers need to be found (on top of being admin)... for the host (win, iOS, Android, Linux...)

    Would developers' life be easier if each "shield" comes with an EEPROM which contains the javascript drivers and web pointers (where to get spec, upgrades etc...), so the platform (whatever the MCU) can use it right away, like dynamic added resource? This would justify interpreted language implementation for portability.

    By the way, surveying around, got some questions:

    • What are the pros and cons of embedded Javascript vs eLua (which could be precompiled)?
    • Can Espruino's interpreter get extensions that would allow specific boost of performance for MCU embedded applications? (because maybe supporting type strong C var types to complement the flex var type of Jscript?)
  • Can Espruino's interpreter get extensions that would allow specific boost of performance for MCU embedded applications? (because maybe supporting type strong C var types to complement the flex var type of Jscript?)

    How about inline assembler? http://www.espruino.com/Assembler

  • Assembly is not portable across cores (not future proof): C helps to be core agnostic. Few years back we learnt to optimize the way we code in C so the compiler is generating efficient machine code. I try to use only the NOP instruction for basic S/W delays. Javascript evolves and adopt what is popularly implemented new features in browsers. How about Embedded JavaScript? (no V8 engine to compete with eLua's performance)

  • What are the pros and cons of embedded Javascript vs eLua (which could be precompiled)?

    It's a language that people actually know, there's lots of code available on the net, and you don't need a toolchain on the host. It's almost certainly not as fast, but if you were after raw speed and didn't mind pre-compiling then you'd probably be using C.

    maybe supporting type strong C var types to complement the flex var type of Jscript?

    Without a JIT compiler I don't think that will help a great deal - if you want a fast pre-compiled language that looks a lot like C, I'd suggest that you use C.

    Assembly is not portable across cores (not future proof)

    This is true, however I think it's safe to assume that every Espruino device for the next few years at least will run ARM Thumb, so you're pretty safe.

    There are instructions on the link @DrAzzy posted that show you how to use C code in Espruino. Ideally the Web IDE would directly support 'inline C' - but I'm yet to find a compiler that can be compiled to JavaScript that targets thumb. If someone wants to port TCC (which can be compiled with emscripten) from ARM to ARM Thumb then I'd be only too happy to directly include inline C :)

  • Post a reply
    • Bold
    • Italics
    • Link
    • Image
    • List
    • Quote
    • code
    • Preview
About

Multi-threading?

Posted by Avatar for TiCPU @TiCPU

Actions