Thanks - yes! Multi-byte transfer isn't as easy as just refactoring to use the call, as the memory isn't guaranteed to be in one flat area. If you have to allocate a flat area of memory and then copy the data into it, it'll actually take longer overall on most platforms - it's just that the time will all be concentrated before the transmission starts rather than in between bytes.
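To illustrate the trade-off, here's a rough sketch (plain JavaScript, not the actual firmware code). It assumes the data lives in several separate blocks, modelled here as `Uint8Array`s; `flatten`, `sendEach` and `sendByte` are made-up names for illustration only.

```javascript
// Model of data that is not stored in one flat area: several separate blocks.
var blocks = [new Uint8Array([1, 2, 3]), new Uint8Array([4, 5]), new Uint8Array([6])];

// Option A: copy everything into one flat buffer, then hand it over in one call.
function flatten(blocks) {
  var total = blocks.reduce(function (n, b) { return n + b.length; }, 0);
  var flat = new Uint8Array(total);
  var offset = 0;
  blocks.forEach(function (b) {
    flat.set(b, offset); // the extra copy, paid before the first byte is sent
    offset += b.length;
  });
  return flat;
}

// Option B: walk the blocks and send byte by byte, with no extra copy.
// sendByte is a hypothetical callback standing in for the per-byte send.
function sendEach(blocks, sendByte) {
  blocks.forEach(function (b) {
    for (var i = 0; i < b.length; i++) sendByte(b[i]);
  });
}
```

Both options put the same bytes on the wire. Option A just does all of its extra work up front, whereas option B spreads the overhead out between bytes.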
Realistically we'd be best off going straight for a function that sends SPI via DMA. We could then speed up the standard SPI.send where possible, keep neopixel.write the same on all platforms, and add a new SPI.sendAsync (or something like it) that exposes the fast functionality.
If graphics drivers could use that, then the next chunk of data could be prepared while the current one is being sent.
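Something like the following is what I have in mind for the overlap. It's only a sketch: `sendAsync` here is a made-up stand-in that fakes the DMA transfer with a timer and returns a promise (no such function exists yet, and the real signature may well differ), and `prepareChunk` is a hypothetical placeholder for whatever a graphics driver does to render its next block of data.

```javascript
// Hypothetical stand-in for a DMA-backed SPI send: resolves once the
// simulated transfer has finished. Not an existing API.
function sendAsync(chunk, log) {
  log.push("start " + chunk);
  return new Promise(function (resolve) {
    setTimeout(function () {
      log.push("done " + chunk);
      resolve();
    }, 5);
  });
}

// Hypothetical driver step: render the next block of data to send.
function prepareChunk(i, log) {
  log.push("prepare " + i);
  return i;
}

// Send `count` chunks, preparing chunk i+1 while chunk i is still in flight.
function sendAll(count, log) {
  var next = prepareChunk(0, log);
  function step(i) {
    if (i >= count) return Promise.resolve();
    var inFlight = sendAsync(next, log); // transfer starts, call returns at once
    if (i + 1 < count) next = prepareChunk(i + 1, log); // overlaps with the transfer
    return inFlight.then(function () { return step(i + 1); });
  }
  return step(0);
}
```

Calling `sendAll(2, log)` logs `prepare 1` before `done 0`, so the CPU is preparing the second chunk while the first is still going out.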
