Make use of the segment chunk size to process samples
in contiguous blocks when possible.
Provide downsampling routines specialized for sample sizes
of 1, 2, 4 and 8 bytes, which the compiler can optimize.
These changes more than halve the time taken to display
the 23 MB ws2812b_neopixel24_4mhz_verylong.sr, from
40 seconds to 18 seconds.
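
As a rough illustration of the fixed-size routines described above, the
sketch below dispatches on the sample size and falls back to a byte-wise
loop for other sizes. It is not the code from this change: the function
names, the signatures and the choice of OR-ing each run of samples together
are assumptions made only for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-size routine: OR together each run of `factor`
// samples of type T. Because sizeof(T) is a compile-time constant, the
// compiler can turn the memcpy calls into plain loads and stores and
// unroll or vectorize the inner loop.
template <typename T>
static void downsample_fixed(const uint8_t *src, uint8_t *dst,
	size_t out_count, size_t factor)
{
	for (size_t i = 0; i < out_count; i++) {
		T acc = 0;
		for (size_t j = 0; j < factor; j++) {
			T v;
			std::memcpy(&v, src + (i * factor + j) * sizeof(T), sizeof(T));
			acc |= v;
		}
		std::memcpy(dst + i * sizeof(T), &acc, sizeof(T));
	}
}

// Hypothetical generic fallback for any other sample size, working one
// byte at a time with a run-time unit size.
static void downsample_generic(const uint8_t *src, uint8_t *dst,
	size_t out_count, size_t factor, size_t unit_size)
{
	for (size_t i = 0; i < out_count; i++) {
		uint8_t *out = dst + i * unit_size;
		std::memset(out, 0, unit_size);
		for (size_t j = 0; j < factor; j++) {
			const uint8_t *in = src + (i * factor + j) * unit_size;
			for (size_t b = 0; b < unit_size; b++)
				out[b] |= in[b];
		}
	}
}

// Pick the specialized routine when the sample size allows it.
// `src` must hold out_count * factor samples, `dst` out_count samples.
void downsample(const uint8_t *src, uint8_t *dst,
	size_t out_count, size_t factor, size_t unit_size)
{
	switch (unit_size) {
	case 1: downsample_fixed<uint8_t>(src, dst, out_count, factor); break;
	case 2: downsample_fixed<uint16_t>(src, dst, out_count, factor); break;
	case 4: downsample_fixed<uint32_t>(src, dst, out_count, factor); break;
	case 8: downsample_fixed<uint64_t>(src, dst, out_count, factor); break;
	default: downsample_generic(src, dst, out_count, factor, unit_size); break;
	}
}
```

The sketch uses memcpy for the loads and stores so that it does not depend
on the alignment of the sample buffer.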