Fastest IO toggle

  • First I want to say hello and thank the developers for a very cool little board. The postman delivered a Pico to me the day before yesterday. One of the first things I do with a new board is figure out the fastest way to toggle an IO pin.

    Most methods produced something between 2.7 and 5.1 kHz.

    This is the fastest I could come up with, and it produces 37.31 kHz. The only other method I have not tried would be to reimplement the code below in inline assembly.

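    var start = new Date();
    function toggle(){
      "compiled"; // ask Espruino to compile this function to native code
      var A = digitalWrite; // cache the function and pin in locals
      var B = B2;
      for (var i=0;i<10000;i++){
        A(B,1); // pin high
        A(B,0); // pin low: one full period per iteration
      }
    }
    toggle();
    var end = new Date();
    var time = end-start; // elapsed milliseconds
    time = (10000/time);  // cycles per ms = kHz
    console.log(time.toFixed(2)+" kHz");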
    

    This may be of no use to anyone else, but it satisfies my personal curiosity.
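A small sketch of the arithmetic in the timing code above, assuming (as that code does) that each loop iteration is one full high/low period. The helper name toggleFrequencyKHz is made up for illustration, and the 268 ms run is an illustrative elapsed time, chosen because it works out to the 37.31 kHz quoted above. It is plain JavaScript, so it runs anywhere, not just on the Pico.

```javascript
// Hypothetical helper: convert a timed toggle loop into a frequency.
// Each iteration drives the pin high once and low once, so one
// iteration is one full square-wave period.
function toggleFrequencyKHz(iterations, elapsedMs) {
  if (!(elapsedMs > 0)) throw new RangeError("elapsedMs must be positive");
  // iterations per millisecond == thousands of cycles per second == kHz
  return iterations / elapsedMs;
}

// Illustrative run: 10000 iterations finishing in 268 ms
console.log(toggleFrequencyKHz(10000, 268).toFixed(2) + " kHz"); // 37.31 kHz
```

The same formula explains why moving start matters: anything timed between start and the first toggle (such as defining the function) inflates elapsedMs and lowers the reported frequency.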

  • You should move var start below the definition of toggle; then you'll be closer to the real number. The final step is to actually measure the toggle with an oscilloscope :)

  • You are correct, I should move start. It will go on the workbench and be tested when time permits. So far I have only been able to play with the Pico on my phone using an OTG cable, and briefly from my laptop last night.

    I will update here if there is a significant change from moving the start.

    Update:
    Moving var start brought it up to 38.7 kHz.

    function toggle(){
      "compiled";
      var A = digitalWrite;
      var B = B2;
      for (var i=0;i<10000;i++){
        A(B,1);
        A(B,0);
      }
    }
    var start = new Date(); // start timing after toggle() has been defined
    toggle();
    var end = new Date();
    var time = end-start;
    time = (10000/time);
    console.log(time.toFixed(2)+" kHz");
    

    Update:
    33.28 kHz

  • Hey, thanks for the post, that looks good! I can't think of any other obvious ways to make it significantly faster. At some point it'd be nice to have a 'fast path' for GPIO in compiled code, but it's not really that high up the priority list.

    I guess the other (slightly cheaty) options are:

    • Use the inline assembler and access the registers for B2 directly. I imagine that could hit somewhere in the region of 10 MHz.
    • Use SPI and send the bit pattern 01010101.
    • Use PWM via analogWrite.
  • I always start with native code and then try to find what works and what doesn't. For example, I learned that having the for loop outside of the compiled function was significantly slower. I may try PWM just to see what the upper limit is, but not SPI. I will certainly give assembler a shot.

    With NodeMCU and Lua I was able to achieve 60 kHz with the CPU at 80 MHz, and 120 kHz at 160 MHz, but it could not be sustained without the board rebooting itself.

  • Re: SPI, you don't need to send any specific bit pattern; just send anything and use the clock line ;-)
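For a rough idea of what the SPI options above would produce: sending 01010101 (0x55) on MOSI gives one high/low period every two bits, while the clock line completes one cycle per bit. This is only a sketch. It assumes back-to-back transfers with no gaps between bytes, which a real SPI peripheral and driver may not manage, and the 1 MHz bit rate and the function name are illustrative.

```javascript
// Hypothetical helper: expected square-wave frequencies when using SPI
// as a pin toggler. Assumes continuous transfers with no inter-byte gaps.
function spiToggleFrequenciesHz(bitRateHz) {
  return {
    clock: bitRateHz,             // SCK: one full clock cycle per bit
    mosiPattern55: bitRateHz / 2, // MOSI sending 0x55: high + low spans two bits
  };
}

// Illustrative bit rate only, not a measured Pico figure
const f = spiToggleFrequenciesHz(1000000);
console.log(f.clock, f.mosiPattern55); // 1000000 500000
```

This is why using the clock line, as suggested, doubles the output frequency compared with a data pattern at the same bit rate.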

  • @DrAzzy
    That's not really what I am trying to achieve with my little speed tests. ;-)

    I am wondering if there is a step between native and assembly. Would it be possible to poke32 the IO address to toggle the pin? Has anyone tried it?

    It's been nearly 30 years since I did assembly, and I am not sure I want to go down that rabbit hole again.

  • Actually, ARM assembler isn't that painful. Most of it is already done for you here under 'Accessing IO' and 'Loops', so it's almost copy+paste.

    But yes, you could poke the IO address. However, at the moment that's still a function call via the JS interpreter, which takes quite a bit of time (it should be faster than digitalWrite, though). If I changed the compiler to handle peek and poke with known values as low-level operations, it'd be crazy fast. It's just finding the time ;)

    Another thing to try if you're playing around is bind. You could do:

    var A = digitalWrite.bind(undefined,B2,1);
    var B = digitalWrite.bind(undefined,B2,0);
    // ...
    A();
    B();
    

    or

    var A = B2.set.bind(B2);
    var B = B2.reset.bind(B2);
    // ...
    A();
    B();
    

    I haven't tried them, but they could be a smidge faster.
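What bind does here, sketched in plain JavaScript: it returns a new function with the pin and value already filled in, so the loop calls a zero-argument function each time. The digitalWrite below is a hypothetical stand-in that just records calls (the real one drives the pin), B2 is a placeholder string rather than the Pico's pin object, and nothing here says whether bind is actually faster on the board.

```javascript
// Hypothetical stand-in for Espruino's digitalWrite so this runs
// outside the board; it only records what it was asked to do.
const writes = [];
function digitalWrite(pin, value) {
  writes.push(pin + "=" + value);
}

// bind() pre-fills the arguments (first argument is the `this` value),
// so each call in the loop passes nothing.
const B2 = "B2"; // placeholder for the Pico's pin object
const on = digitalWrite.bind(undefined, B2, 1);
const off = digitalWrite.bind(undefined, B2, 0);

for (let i = 0; i < 2; i++) {
  on();
  off();
}
console.log(writes.join(" ")); // B2=1 B2=0 B2=1 B2=0
```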

  • I figured peek and poke were JS calls, but they seemed the natural next step, with assembly being the fastest but most complicated.

    I will give bind a test when time permits.

  • ... just had a play at adding that peek/poke fast path. Try this now:

    function foo() {
      "compiled";
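      var GPIOB = 0x40020418; // GPIOB BSRR (bit set/reset) register
      var PIN = 1 << 2;       // pin B2
      var cnt = 10000000;
      var t = getTime();      // seconds
      for (var i=0;i<cnt;i++) {
        poke32(GPIOB, PIN);     // low half of BSRR: set B2 (on)
        poke32(GPIOB, PIN<<16); // high half of BSRR: reset B2 (off)
      }
      // one on/off pair per iteration, so iterations per microsecond = MHz
      console.log((cnt / (1000000*(getTime()-t))) + " MHz");
    }
    
    LED1.set(); // LED1 is B2 on the Pico; this sets it up as an output first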
    foo();
    

    And for me it reports 7.7 MHz, which is just a bit faster :)
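Where the numbers in foo() come from, as a sketch assuming the STM32F4 GPIO register layout used by the Pico's STM32F401 (port B base at 0x40020400, BSRR at offset 0x18). The helper names are illustrative; the results match the 0x40020418 address and the PIN / PIN<<16 values above.

```javascript
// Assumed STM32F4 layout: GPIOB base address and BSRR register offset.
const GPIOB_BASE = 0x40020400;
const BSRR_OFFSET = 0x18;

function bsrrAddress(portBase) {
  return portBase + BSRR_OFFSET;
}

function checkPin(pin) {
  if (!Number.isInteger(pin) || pin < 0 || pin > 15) {
    throw new RangeError("pin must be 0..15");
  }
}

// Writing 1 to bit n (0..15) sets pin n high.
function bsrrSetWord(pin) {
  checkPin(pin);
  return (1 << pin) >>> 0;
}

// Writing 1 to bit n+16 resets pin n low. ">>> 0" keeps the result
// unsigned, since 1 << 31 is negative in JavaScript.
function bsrrResetWord(pin) {
  checkPin(pin);
  return (1 << (pin + 16)) >>> 0;
}

console.log(bsrrAddress(GPIOB_BASE).toString(16));          // 40020418
console.log(bsrrSetWord(2), bsrrResetWord(2).toString(16)); // 4 40000
```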

  • Holy crap! Just a little bit faster.

  • Have you looked at it through an oscilloscope yet? :) I'll have a go tomorrow!

  • I will tonight when I get home.

    I should be able to post screenshots as well.

  • The DSO says 7 MHz while the Pico calculates it as 8.062 MHz.

  • @Gordon
    Using your bind method, the Pico reports 22.28 kHz and the DSO reports 19.38 kHz.
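A quick way to compare the two sets of readings above is the relative difference between what the Pico reports and what the DSO measures. The helper is illustrative, and since the 7 MHz DSO reading looks rounded, the first result is only approximate. Both come out at roughly 15%, which is consistent with an error in the Pico's own timebase rather than in the toggle code.

```javascript
// Illustrative helper: relative difference between a self-timed
// estimate and an external reference. Positive means over-reporting.
function percentError(reported, reference) {
  return ((reported - reference) / reference) * 100;
}

// Figures quoted in the thread
console.log(percentError(8.062, 7).toFixed(1) + "%");     // 15.2%
console.log(percentError(22.28, 19.38).toFixed(1) + "%"); // 15.0%
```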

  • Trying my hand at assembler, but I am getting an error. I'm trying to toggle B3.

    var test = E.asm("void()",
                     "ldr  r2,gpiob_addr",
                     "movw  r3,#4",
                     "loopStart:",
                     "str  r3,[r2,#0]",
                     "str  r3,[r2,#4]",
                     "bgt  loopStart",
                     "bx  lr",
                     "gpiob_addr:",
                     ".word  0x40020418"
                     );
    

    The error is:
    Assembler failed: Invalid number "gpiob_addr" must be between 0 and 1024 and a multiple of 4

  • Great! The Pico's timing runs off its internal 32 kHz RC oscillator, and unfortunately that's really not very accurate, which is probably why the two readings differ. I've been hoping to do something about that, but I'm not 100% sure of the best way around it yet.

    The assembler issue is a fun problem with Thumb assembler: instructions are 2 bytes, but immediate values often need to be multiples of 4. You just need to add a nop to align the data correctly:

    var test = E.asm("void()",
                     "ldr  r2,gpiob_addr",
                     "movw  r3,#4",
                     "loopStart:",
                     "str  r3,[r2,#0]",
                     "str  r3,[r2,#4]",
                     "bgt  loopStart",
                     "bx  lr",
                     "nop",
                     "gpiob_addr:",
                     ".word  0x40020418"
                     );
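The alignment rule can be checked with a little arithmetic. This sketch assumes the usual encodings for the listing above (movw is a 4-byte Thumb-2 instruction; ldr, str, bgt and bx are 2 bytes each), which puts gpiob_addr at byte 14, two bytes short of a 4-byte boundary, so one 2-byte nop fixes it. The function name is hypothetical and the assembler's actual layout may differ.

```javascript
// Hypothetical helper: how many 2-byte Thumb nops are needed so a
// literal placed `offset` bytes into the code starts on a 4-byte boundary.
function nopsToAlign(offset) {
  if (!Number.isInteger(offset) || offset < 0 || offset % 2 !== 0) {
    throw new RangeError("Thumb code offsets are non-negative and even");
  }
  const padBytes = (4 - (offset % 4)) % 4; // 0 or 2 for even offsets
  return padBytes / 2;                     // each Thumb nop is 2 bytes
}

// Assumed sizes (bytes) of the instructions before gpiob_addr:
// ldr, movw, str, str, bgt, bx
const sizes = [2, 4, 2, 2, 2, 2];
const labelOffset = sizes.reduce((sum, n) => sum + n, 0);
console.log(labelOffset, nopsToAlign(labelOffset)); // 14 1
```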
    
  • OK, I will test this when I get home from my day job. Maybe I will do some research and put it in a for loop.

  • Ok :)

    As-is, that may not work: bgt is 'branch if greater than'. I guess you either intended to write a for loop in assembler, or wanted an unconditional branch (b, I think).

    You might find that even with a loop in assembler, the 'compiled JS' is actually faster. GCC is really pretty good at optimising the C code that's generated :)

  • You are correct, I was thinking of a for loop. It's been a looooong time since I did assembly.

  • Is the RC oscillator really that bad?

    That's >10% off...

  • I tried, but I think it's toggling so fast that the pin never comes fully on. Readings are very low and distorted. I probably need to add a delay.

    var test = E.asm("void()",
                     "ldr  r2,gpiob_addr",
                     "movw  r3,#4",
                     "loopStart:",
                     "str  r3,[r2,#0]",
                     "str  r3,[r2,#4]",
                     "b  loopStart",
                     "bx  lr",
                     "nop",
                     "gpiob_addr:",
                     ".word  0x40020418"
                     );
    
  • I'm not 100% sure about the writes you're doing... You need to write 4 to turn it on, but 4<<16 to turn it off (it's a bit set/reset register). I'm not sure that second store is actually turning it off?

    @DrAzzy yes, sadly it really is that bad, although it varies from device to device. If it were calibrated for each board it'd be OK, but obviously I can't do that very easily. I was looking into a way to calibrate it on the fly against the high-speed oscillator, but I haven't had much luck with that to date.

  • I tested those writes individually and they do turn pin B4 on and off.
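A simplified model of the bit set/reset register Gordon describes, assuming the STM32 behaviour where bits 0..15 of a BSRR write set pins, bits 16..31 reset them, and set wins if both are given for the same pin. It models only the output state of one port (ignoring which address each store actually hits), and applyBsrr is an illustrative name. Writing 4 twice leaves pin 2 high; within BSRR, only a write with bit 18 set (4<<16) clears it.

```javascript
// Illustrative model: effect of one BSRR write on a port's 16 output bits.
function applyBsrr(odr, word) {
  const setBits = word & 0xffff;             // bits 0..15: set these pins
  const resetBits = (word >>> 16) & 0xffff;  // bits 16..31: reset these pins
  // Clear first, then set, so "set" wins when both are requested.
  return ((odr & ~resetBits) | setBits) & 0xffff;
}

let odr = 0;
odr = applyBsrr(odr, 4);       // set pin 2
console.log(odr);              // 4
odr = applyBsrr(odr, 4);       // writing 4 again: pin stays high
console.log(odr);              // 4
odr = applyBsrr(odr, 4 << 16); // reset pin 2
console.log(odr);              // 0
```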

  • Does this mean that time-keeping will be off by ~10%?
    Like, setInterval/setTimeout will be that far off?

    If that's the case, those itty-bitty super-special crystals are a necessity, not an option. ±10% means it could lose about 2.4 hours a day...
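Rough numbers for the timekeeping question, sketched as drift per day for a given fractional clock error. A 10% error works out to 2.4 hours a day; the ±20 ppm figure is only an illustrative crystal tolerance (common for 32.768 kHz watch crystals, but check the part's datasheet). The function name is hypothetical.

```javascript
const SECONDS_PER_DAY = 86400;

// Hypothetical helper: worst-case drift per day for a fractional
// frequency error (0.10 = 10%, 20e-6 = 20 ppm).
function driftSecondsPerDay(fractionalError) {
  return SECONDS_PER_DAY * Math.abs(fractionalError);
}

console.log((driftSecondsPerDay(0.10) / 3600).toFixed(1) + " h/day"); // 2.4 h/day
console.log(driftSecondsPerDay(20e-6).toFixed(2) + " s/day");         // 1.73 s/day
```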

Posted by cwilt (@cwilt)

Actions