performance optimization & inlining

  • I'm concerned about the performance of the esp8266 port in particular. Basically JS execution in general is very slow. I believe I disabled all inlining and have to compile with -Os in order not to have the code size explode. I'm wondering whether there is a top-10 set of functions (probably from jsvars.h) that should be inlined? I would then go one by one and see what I can either inline or move to static IRAM. Also, do you have a favorite piece of code to "benchmark" execution performance?

  • I've got the code in the benchmarks directory that I've been using - they're not perfect though.

    Are you compiling with RELEASE=1? Without it, the asserts are left in which really kills performance.

    The stuff marked ALWAYS_INLINE is worth going for first. Specifically jsvGetAddressOf, Lock, UnLock, Ref, UnRef, and Get/SetXXXChild/Sibling are all very small functions that gain a lot from inlining (in fact, inlining them shouldn't increase code size much at all).

    Same for jsvIsXXX - in fact by inlining those, the compiler should be able to remove some of the null checks.

    Perhaps you could look at having two inline defines, ALWAYS_INLINE and MAYBE_INLINE or something?

    Espruino itself really isn't that fast though - or does ESP8266 look bad in comparison with the STM32-based boards?
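
As a rough illustration of the two-macro idea above, here is a minimal sketch. Only the names ALWAYS_INLINE and MAYBE_INLINE come from the thread; the FORCE_INLINING switch, the ToyVar struct and the toy* functions are hypothetical stand-ins and are not Espruino's real definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* FORCE_INLINING is a hypothetical build switch, not an Espruino flag.
 * When it is set (and the compiler is GCC-compatible), ALWAYS_INLINE
 * forces inlining; otherwise it degrades to a plain inline hint. */
#if defined(FORCE_INLINING) && defined(__GNUC__)
#define ALWAYS_INLINE static inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE static inline
#endif

/* MAYBE_INLINE is only ever a hint: the compiler (for example at -Os)
 * decides whether inlining is worth the code size. */
#define MAYBE_INLINE static inline

/* Toy variable record for illustration; NOT the real JsVar layout. */
typedef struct {
  uint16_t nextSibling;
  uint8_t locks;
} ToyVar;

/* Tiny accessors: the call overhead is comparable to the body itself. */
ALWAYS_INLINE uint16_t toyGetNextSibling(const ToyVar *v) {
  return v->nextSibling;
}

ALWAYS_INLINE void toyLock(ToyVar *v) {
  v->locks++;
}

/* Predicate with a null check. Once inlined at a call site where the
 * pointer is known to be non-null, the compiler can drop the check. */
MAYBE_INLINE int toyIsLocked(const ToyVar *v) {
  return v != NULL && v->locks > 0;
}
```

Whether forcing the first group actually helps on a given target would still need measuring, since the code-size trade-off may differ between boards.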

  • The esp8266 seems anemic indeed. I have the asserts in there. I'm reluctant to take them all out with RELEASE=1 but I understand. Sounds like I need to experiment a bit. Thanks for the pointers!

  • The asserts do impact the speed (and code size!) a huge amount - I don't include them in any Espruino builds if that makes you feel any better :)
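
To see why asserts cost both speed and code size, here is a minimal sketch of an assert macro that compiles away in release builds. TOY_ASSERT and sumBytes are hypothetical names for illustration, and RELEASE here only mirrors the RELEASE=1 build setting mentioned above; this is not Espruino's actual assert implementation.

```c
#include <stdio.h>
#include <stdlib.h>

/* With RELEASE defined, the check, its message string and the call to
 * abort() all disappear from the compiled code. */
#ifdef RELEASE
#define TOY_ASSERT(cond) ((void)0)
#else
#define TOY_ASSERT(cond)                                        \
  do {                                                          \
    if (!(cond)) {                                              \
      fprintf(stderr, "assert failed: %s (%s:%d)\n", #cond,     \
              __FILE__, __LINE__);                              \
      abort();                                                  \
    }                                                           \
  } while (0)
#endif

/* Sums a buffer. In a non-release build every loop iteration pays for
 * an extra compare and branch, and each assert adds a string literal. */
static unsigned sumBytes(const unsigned char *buf, unsigned len) {
  TOY_ASSERT(buf != 0);
  unsigned total = 0;
  for (unsigned i = 0; i < len; i++) {
    TOY_ASSERT(i < len);
    total += buf[i];
  }
  return total;
}
```

Compiling the same file with and without -DRELEASE and comparing the object sizes gives a feel for the overhead, though the exact numbers depend on compiler and target.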

  • I just did some experiments and measured the time it takes to update a display, so lots of I2C operations are involved. I got a 1.6x speed-up by removing the asserts (RELEASE=1). [Update: the next sentence is incorrect, see next post] The esp8266 clock rate (80 vs 160 MHz) has almost no effect because I'm pretty sure the execution rate is dominated by flash speed (another reason not to use the esp-01 modules, since they have a 40 MHz flash chip).

    Switching from -Os to -O1 makes the firmware bigger and slower. Switching from -Os to -O2 makes the firmware too big. Switching -DLINK_TIME_OPTIMIZATION on, which really just adds force-inlining pragmas in jsutils, makes the firmware bigger and slower. It turns out that -Os has all the optimizations of -O2 that don't tend to increase code size. So the bottom line is that GCC seems to be the smartest about what to inline and what not to...

    (I want to run some of the benchmarks too)
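
For comparing build variants like the ones above, a timing harness along these lines may help. It is a host-side sketch using the standard C clock() function; benchMillis, toyWorkload and sink are hypothetical names, and on the ESP8266 itself a different timer source would likely be needed.

```c
#include <time.h>

typedef void (*BenchFn)(void);

/* Returns the average CPU time in milliseconds per call of fn,
 * measured over `iterations` runs. Returns 0.0 for zero iterations. */
static double benchMillis(BenchFn fn, unsigned iterations) {
  if (iterations == 0) return 0.0;
  clock_t start = clock();
  for (unsigned i = 0; i < iterations; i++) fn();
  clock_t end = clock();
  return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / iterations;
}

/* volatile so the compiler cannot optimize the workload away. */
static volatile unsigned sink;

/* Stand-in workload; replace with the operation being measured. */
static void toyWorkload(void) {
  for (unsigned i = 0; i < 10000; i++) sink += i;
}
```

Calling benchMillis(toyWorkload, 100) once per build (for example -Os versus -O1, or with and without asserts) gives comparable numbers, as long as the workload and iteration count stay the same.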

  • So I was wrong about the 160 MHz. I must have messed up, and fortunately I did some more experiments. The higher clock rate does have an effect, so I think I'm going to adopt it as the default. For details on the performance see https://github.com/tve/EspruinoDocs/blob/master/boards/EspruinoESP8266.md#performance

  • Hmm, I'm very surprised about the Link Time optimisation (unless you enable it by default even without the inlining flags?)

    I did choose the stuff to inline based on benchmarks on STM32, but I guess on ESP8266 the balance might be more in favour of code size?

    There's actually a GCC parameter called inline-unit-growth I think (set via --param)? You might have some luck with -O3 and a low-ish value for that...

  • I always have link-time-optimization enabled. The only effect the -DLINK_TIME_OPTIMIZATION flag has on the esp8266 is to make the ALWAYS_INLINE macro do something. I think overall I'm pretty happy with the -Os I have now. Maybe there are some better combos, but I suspect that any code size increase makes things worse...

  • Yeah... I'd be interested to see whether things like jsvGetNextSibling actually get inlined. They're so simple that hopefully GCC should do it automatically.


Posted by @tve
