Benchmarking

  • I don't know if you've seen it, but MaximumOctopus has been benchmarking code on various microcontrollers.

    I thought it might be interesting to look at simple things that can be done to increase the speed, so:

    SPI

    The original code is this:

    SPI1.setup({ mosi:B5, sck:B3 });
    var output = 0xAAAA;
    
    function loop()
    {
      digitalWrite(B8, LOW);
      SPI1.write(output>>8, output);         // send high 8 bits, low 8 bits
      digitalWrite(B8, HIGH);
    }
    

    And it takes 0.816ms in 1v73.

    However, if you go into Settings, Communications, and set Minification:

    • Whitespace Only : 0.776ms
    • Simple Optimisation: 0.776ms
    • Advanced Optimisation: failed - closure compiler renamed SPI1.write

    So, anything else? Leaving 'simple optimizations' on, you could:

    • Store the output data as an array: 0.688ms
    • Then, don't use HIGH and LOW constants, just 1 and 0: 0.64ms
    • Maybe use B8.set instead of digitalWrite: 0.592ms
    • Or use B8 as SPI 'CS' (SPI1.write(output, B8);): 0.24ms
    • And ramp SPI baud up to 4000000 : 0.044ms
    • Then store the two bytes as a string: 0.025ms
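
    To put the list above in perspective, here is a small sketch that turns each timing into a speed-up over the 0.816ms baseline. The figures are copied from this post; the speedup helper and the step labels are just illustrative names, not anything from Espruino itself.

```javascript
// Timings in milliseconds, copied from the post above
var baseline = 0.816; // original code on 1v73
var steps = [
  ["simple optimisation", 0.776],
  ["output stored as an array", 0.688],
  ["1 and 0 instead of HIGH and LOW", 0.64],
  ["B8.set instead of digitalWrite", 0.592],
  ["B8 as the SPI CS pin", 0.24],
  ["baud raised to 4000000", 0.044],
  ["two bytes stored as a string", 0.025]
];

// Hypothetical helper: how many times faster than the baseline
function speedup(ms) {
  return baseline / ms;
}

for (var i = 0; i < steps.length; i++) {
  console.log(steps[i][0] + ": " + speedup(steps[i][1]).toFixed(1) + "x");
}
```

    From the first step to the last, that works out to a bit over 32 times faster.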

    Finally:

    SPI1.setup({ mosi:B5, sck:B3, baud:4000000 });
    var output = "\xAA\xAA";
    
    function loop()
    {
      SPI1.write(output,B8);
    }
    
    setInterval(loop, 50);
    

    Sadly the compiler (see below) doesn't handle method calls yet, so I can't show anything with that.

    Even so, that's now faster than all but the Spark Core and Arduino Uno when using hardware SPI.

    Loops

    Original code (for local variable) is basically this - modified just to use an LED:

    var variable = 0;
    
    function loop()
    {
      digitalWrite(LED1, HIGH);
    
      for (var counter = 0; counter<1024; counter++)
      {
        variable++;
      }
    
      digitalWrite(LED1, LOW);
    }
    
    setInterval(loop, 1); // call loop every 1 millisecond
    

    This takes 270ms.

    However, if you go into Settings, Communications, and set Minification:

    • Whitespace Only : 250ms
    • Simple Optimisation: 225ms
    • Advanced Optimisation: 206ms

    Then, there's the (kinda experimental) compiler. Same code, but with the 'compiled' keyword:

    var variable = 0;
    
    function loop()
    { "compiled";
      digitalWrite(LED1, 1);
    
      for (var counter = 0; counter<1024; counter++)
      {
        variable++;
      }
    
      digitalWrite(LED1, 0);
    }
    
    setInterval(loop, 300); // call loop every 300 milliseconds
    

    That now takes 97ms.

    However, what now slows it down is the lookup of the global named variable on every pass through the loop. You can:

    • Rename variable to v - 84ms
    • Make variable local - 62ms

    The repeated lookup of variable is something I hope to fix in a later version of the compiler. There's no good reason for not looking it up in the symbol table once at the start of the function, so when that's done you could expect it to take 62ms.
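
    Until the compiler does that itself, the same effect can be had by hand. This is only a sketch of one way to read 'make variable local', not necessarily the exact code that was timed: copy the global into a local once, count on the local, and write it back once at the end. The name loopLocal is made up for this example, and whether the experimental compiler accepts this exact form is an assumption.

```javascript
var variable = 0;

function loopLocal()
{ "compiled"; // only has an effect when uploaded through the Espruino Web IDE
  var v = variable;   // single lookup of the global

  for (var counter = 0; counter<1024; counter++)
  {
    v++;              // the loop only touches the local
  }

  variable = v;       // single write back to the global
}

loopLocal();
console.log(variable);
```

    The LED toggling is left out here so that only the counting is shown.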

    That's also pretty interesting because Tessel can now run the above benchmark in 25ms. Given it's running at 180MHz vs. Espruino's 72MHz, it means that with the compiler, in that example, Tessel and Espruino are executing the code in about the same number of clock cycles.
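
    The arithmetic behind that, as a quick sketch. It assumes the Espruino figure being compared is the 62ms local-variable result; the cycles helper is just an illustrative name.

```javascript
// Hypothetical helper: clock cycles = time in ms * clock in MHz * 1000
function cycles(ms, mhz) {
  return ms * mhz * 1000;
}

var tessel = cycles(25, 180);   // 25ms at 180MHz
var espruino = cycles(62, 72);  // 62ms at 72MHz (compiled, local variable)

console.log("Tessel:   " + tessel + " cycles");
console.log("Espruino: " + espruino + " cycles");
console.log("Ratio:    " + (espruino / tessel).toFixed(3));
```

    That is roughly 4.5 million cycles each, with less than 1% between them.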

    Hopefully with a bit more work on the compiler (type inference for integers) Espruino will actually start to overtake Tessel (even with the lower clock).

  • This highlights the fact that the Espruino interpreter lives off the source: the shorter the source, the faster the execution (even if it's just spaces that have to be skipped). Resolving variable names is obviously a heavy hit, and even heavier when they are global. So is comparing a variable against a local, short-named variable holding undefined expected to be faster than comparing it against undefined itself? I also wondered about this in the object-oriented use of JavaScript: does it qualify as a 'short variable'?

    Interesting question: What executes faster?

    • a) an if with large then and else blocks
    • b) two function/method definitions and an if invoking either one of those two functions.

    Since the code is in RAM, some minimal JIT could be done... maybe something is already done...

  • This is fascinating!

    What method do you use to record the times?

  • Well, yes - some JIT could be done, but it's never going to make Espruino 'fast'. I think Tessel is a good example here - they compile to Lua and now run in LuaJIT which is known to be one of the faster and more efficient embedded interpreters... And yet it's actually still nowhere near as fast as native code, which isn't that surprising when you think about it.

    That's why I think the compiler is going to be really exciting. By doing all the heavy lifting on the PC you can make code that has the types of variables inferred and that is properly optimised and fast. But you can do that without having to have any overhead on the device itself.

  • @DrAzzy I did what MaximumOctopus did and looked at the pin states with an oscilloscope. In the case of SPI, the digitalWrite calls take up a lot of the execution time.

  • How do our results compare to MicroPython? https://github.com/micropython/micropython/wiki/Performance

  • It won't be anywhere near as good, as Micro Python does a certain amount of JIT compilation, and I believe also runs at twice the clock speed. There's some info on performance here: http://www.espruino.com/Performance

    function performanceTest() {
        var secs = getTime;
        var endTime = secs() + 10;
        var count = 0;
        while (secs() < endTime)
            count++;
        print("Count: "+ count);
    }
    performanceTest();
    // Count: 92990
    

    However if you use the Web IDE to pre-compile the function (which only works in some limited cases, but this is one of them):

    function performanceTest() {
        "compiled";
        var secs = getTime;
        var endTime = secs() + 10;
        var count = 0;
        while (secs() < endTime)
            count++;
        print("Count: "+ count);
    }
    performanceTest();
    // Count: 590702
    

    It's also a little unfair, because getTime in Espruino is accurate to the nearest microsecond and also uses floating point. Doing that requires a certain amount of calculation, which will be slowing Espruino down compared to Micro Python.
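
    For a rough feel of what those two counts mean per loop iteration, here is a small worked sketch. The counts and the 10 second duration come from the code above; usPerIteration is just an illustrative name. Each iteration includes one getTime call, so this is the cost of the whole loop body rather than of count++ alone.

```javascript
var seconds = 10;          // test duration used in performanceTest
var interpreted = 92990;   // count without "compiled"
var compiled = 590702;     // count with "compiled"

// Hypothetical helper: microseconds spent per loop iteration
function usPerIteration(count) {
  return seconds * 1e6 / count;
}

console.log("Interpreted: " + usPerIteration(interpreted).toFixed(1) + "us");
console.log("Compiled:    " + usPerIteration(compiled).toFixed(1) + "us");
console.log("Speed-up:    " + (compiled / interpreted).toFixed(2) + "x");
```

    So the compiled version gets through an iteration in roughly a sixth of the time.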

    But honestly, it's not what Espruino is about. It's about speed of development vs speed of execution, while still being able to run on relatively low-end hardware.

Posted by @Gordon