Comment by @Dennis:
  • For a while now I have been experimenting with streaming waveforms for audio output ("streaming" here means keeping two buffers and continuously refilling one while the other is playing). Here are my findings.
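
For readers unfamiliar with the two-buffer scheme, here is a minimal sketch in plain JavaScript. It only simulates the bookkeeping and does not use any Espruino audio API; `makeStreamer`, `onBufferDone` and the in-memory `source` are illustrative names standing in for the real storage and output code.

```javascript
// Ping-pong (double-buffer) streaming, simulated in plain JavaScript.
// `source` stands in for the SD card or SPI flash; all names are illustrative.
function makeStreamer(source, bufferSize) {
  const buffers = [new Uint8Array(bufferSize), new Uint8Array(bufferSize)];
  let readPos = 0; // next sample to fetch from the source
  let playing = 0; // index of the buffer currently being played

  // Copy the next chunk of the source into a buffer.
  // Anything past the end of the source is padded with 8-bit silence (128).
  function refill(buf) {
    const chunk = source.subarray(readPos, readPos + bufferSize);
    buf.fill(128);
    buf.set(chunk);
    readPos += chunk.length;
  }

  // Pre-fill both buffers before playback starts.
  refill(buffers[0]);
  refill(buffers[1]);

  return {
    // Call this whenever the output has finished playing one buffer.
    // It returns the samples just played, switches playback to the other
    // buffer, and refills the one that has just become free.
    onBufferDone() {
      const done = buffers[playing];
      const played = Array.from(done);
      playing = 1 - playing;
      refill(done);
      return played;
    }
  };
}

// Usage: stream 10 samples through two 4-sample buffers.
const source = Uint8Array.from({ length: 10 }, (_, i) => i);
const streamer = makeStreamer(source, 4);
const out = [];
for (let i = 0; i < 3; i++) out.push(...streamer.onBufferDone());
console.log(out);
```

On real hardware the refill has to finish before the other buffer runs out; if it does not, the output replays stale data or silence, which is the buffer underrun discussed in the rest of this post.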

    I first streamed from an SD card. Now on the Pico, I stream from an external SPI flash memory, using my home-grown Winbond W25Q module. Comparing the two methods, I can say that streaming from flash works better.

    • Buffers can be much smaller. For streaming at 11 kHz from the SD card, I needed a buffer size of 2048 samples. When streaming from the SPI flash, 64 samples are enough.

    • Sampling rates can be much higher. While the highest sampling rate I achieved with the SD card was between 12 and 15 kHz, I can stream at 44.1 kHz from the flash.
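
To put those buffer sizes in perspective, it helps to convert them into time: one buffer's playback time is the deadline for refilling the other. A small sketch of that arithmetic, assuming "11 kHz" means exactly 11000 Hz (`bufferDurationMs` is just an illustrative helper):

```javascript
// Time one buffer lasts during playback, in milliseconds.
// This is also the deadline for refilling the other buffer.
function bufferDurationMs(bufferSize, sampleRateHz) {
  return (bufferSize / sampleRateHz) * 1000;
}

const sdBudgetMs = bufferDurationMs(2048, 11000);   // SD card setup
const flashBudgetMs = bufferDurationMs(64, 44100);  // SPI flash setup

console.log("SD:    " + sdBudgetMs.toFixed(1) + " ms per buffer");
console.log("Flash: " + flashBudgetMs.toFixed(2) + " ms per buffer");
```

So the SD card setup needed roughly 186 ms per refill to run cleanly, while the flash setup gets by with about 1.45 ms, a difference of more than two orders of magnitude.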

    I'm not sure exactly why this is, but I assume there is considerable overhead in dealing with the filesystem on the SD card. I also found that reading sequentially from the flash works like a charm, while repositioning (i.e. moving the read pointer to a different memory address) is much more likely to produce glitches, so that seems to take more time.

    It seems that each read from the SD card costs a sizeable fixed amount of time (plus a variable time that depends on the chunk size), which would explain why the buffers need to be so much bigger.
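
The "fixed plus variable" idea can be turned into a rough estimate of the smallest workable buffer. The sketch below assumes a linear cost model (refill time = fixed overhead + samples × time per sample); the overhead figures are made-up placeholders, not measurements, and `minBufferSize` is an illustrative helper.

```javascript
// Smallest buffer (in samples) for which a refill finishes before the other
// buffer runs out, assuming refill time = fixedOverheadS + n * perSampleS.
// Returns Infinity if the source cannot keep up at any buffer size.
function minBufferSize(sampleRateHz, fixedOverheadS, perSampleS) {
  // Each sample buys 1/rate seconds of playback but costs perSampleS to read.
  const slackPerSample = 1 / sampleRateHz - perSampleS;
  if (slackPerSample <= 0) return Infinity;
  return Math.ceil(fixedOverheadS / slackPerSample);
}

// Hypothetical numbers, chosen only to illustrate the effect:
const sdLike = minBufferSize(11000, 0.100, 0.00004);     // 100 ms fixed, 40 us per sample
const flashLike = minBufferSize(44100, 0.0005, 0.00001); // 0.5 ms fixed, 10 us per sample

console.log("SD-like source needs at least " + sdLike + " samples");
console.log("Flash-like source needs at least " + flashLike + " samples");
```

Under this model the minimum buffer size grows in proportion to the fixed overhead, so a source with a large per-read cost needs large buffers even at a low sampling rate.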

    Another interesting finding is that (at least for audio) glitches can be more acceptable with flash memory because of their pattern, meaning the length of each glitch and the interval at which glitches occur. When streaming from SD with big buffers, you get glitches that are typically tens of milliseconds long (so they sound wrong and disturbing), and they happen a few times per second, which is also an annoying interval. When streaming from flash with small buffers, you get very short glitches (each sounding like a click), but they occur so often that they are no longer perceived as individual glitches and instead blend into a background hum with a constant frequency. You can tune that frequency (sampling rate divided by buffer size), and you can also tune its amplitude (as a trade-off against performance). Audio quality at 8-bit is not great anyway, and a little crackling well below the volume of the payload signal is hardly audible, so you can choose to accept controlled buffer underruns in exchange for a smaller memory footprint or a higher sampling rate.
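
The hum frequency mentioned above is easy to work out for a given setup, assuming one glitch per buffer swap. A short sketch, with `glitchRateHz` and `bufferSizeForHum` as illustrative helper names and "11 kHz" again taken as 11000 Hz:

```javascript
// Glitch repetition rate when every buffer swap underruns once:
// sampling rate divided by buffer size.
function glitchRateHz(sampleRateHz, bufferSize) {
  return sampleRateHz / bufferSize;
}

// Inverse: buffer size (in samples) that puts the hum near a target frequency.
function bufferSizeForHum(sampleRateHz, targetHumHz) {
  return Math.round(sampleRateHz / targetHumHz);
}

const sdRate = glitchRateHz(11000, 2048);  // a few glitches per second
const flashRate = glitchRateHz(44100, 64); // a steady tone of several hundred Hz

console.log("SD:    " + sdRate.toFixed(2) + " Hz");
console.log("Flash: " + flashRate.toFixed(2) + " Hz");
console.log("Buffer for a 100 Hz hum at 44.1 kHz: " + bufferSizeForHum(44100, 100) + " samples");
```

The SD configuration comes out at about 5 glitches per second, which matches the "few times per second" described above, while the flash configuration lands at roughly 689 Hz, well inside the audible range as a continuous tone.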

    What's interesting is that, regardless of the streaming source, certain combinations of sampling rate and buffer size seem to work well while others don't. One would expect that increasing the buffer size would generally improve performance (i.e. reduce glitches due to buffer underruns), but that is not always the case. On the contrary, I even found that bigger buffers can make things worse. It's not that hard to find a working configuration for a given application through experimentation. Nevertheless, it would be interesting to know why Espruino behaves that way. I can imagine that the time needed for a buffer refill is not a linear function of the buffer size but a more staircase-shaped one, but that's just speculation. Does anybody know more?
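
To illustrate how a staircase-shaped refill cost could make a bigger buffer worse, here is a purely hypothetical model: data is fetched in whole blocks, so the refill time jumps each time the buffer spills into another block. The block size and timings are invented for the example and say nothing about how Espruino actually behaves.

```javascript
// Hypothetical staircase cost: a fixed overhead plus a cost per block touched.
function staircaseRefillMs(bufferSize, blockSize, fixedMs, perBlockMs) {
  return fixedMs + Math.ceil(bufferSize / blockSize) * perBlockMs;
}

// Headroom = playback time of one buffer minus the time needed to refill it.
// Negative headroom means a buffer underrun.
function headroomMs(bufferSize, sampleRateHz, blockSize, fixedMs, perBlockMs) {
  const playMs = (bufferSize / sampleRateHz) * 1000;
  return playMs - staircaseRefillMs(bufferSize, blockSize, fixedMs, perBlockMs);
}

// Assumed: 11000 Hz, 512-sample blocks, 5 ms fixed, 40 ms per block.
for (const size of [512, 600, 1024, 1100]) {
  const h = headroomMs(size, 11000, 512, 5, 40);
  console.log(size + " samples: " + h.toFixed(1) + " ms headroom" + (h < 0 ? " (underrun)" : ""));
}
```

With these made-up numbers, 512 and 1024 samples leave positive headroom, while the slightly larger 600 and 1100 underrun because they pay for an extra block without gaining much playback time. If something like this is going on, buffer sizes that are exact multiples of the underlying block size would be the ones to try first.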
