Audio streaming (part I): A practical take on latency

This series focuses on audio streaming for audio production, which has much stricter quality and latency requirements than streaming songs from your music collection to a wireless speaker. When you stream a song via AirPlay, for example, playback takes about two seconds to begin, which is acceptable for listening to music but not at all for production. There are several topics to cover, so we have split them across several posts. This post covers streaming latency: the time it takes the audio to get to its destination.

The most common question we’ve gotten since Audreio was announced has been “What’s the latency like?” Latency seems to be the Achilles’ heel of audio streaming (particularly wireless streaming), and the answer is generally vague at best. Audio streaming is the core of Audreio, and understanding some of its intricacies can help you get the best performance.

We all agree that the faster the audio gets to the other side, the better (at least for recording or live performance), so I will begin with the things that significantly affect audio streaming performance. Keep in mind that the list below is intended as a practical guide to reducing latency; it is by no means a complete dissection of audio streaming and its relationship to latency.

Sender’s buffer size: the sending device’s audio buffer is the first place the audio data is held. The buffer has to fill up before its contents are handed down to Audreio. For example, if the buffer size is set to 1024 samples, it takes approximately 23 ms (at 44.1 kHz) before Audreio gets the first chunk of data.
Receiver’s buffer size: the host asks for audio data one buffer at a time, so it can take up to one full buffer’s worth of time before the host asks for data that has already arrived. There are different ways the streaming engine can minimize the effect of the receiver’s buffer size on latency, yet it is still an important factor to consider.
Transmission jitter (and a few other things): if the time it took for the data to get from the source to the destination were predictable (and constant), there would be no added latency. The problem is that when this time is unknown, it is necessary to “preload” enough data to account for the longest delay so that audio keeps playing uninterrupted. This is the main problem with audio over a network (which AVB solves with custom hardware), and especially over Wi-Fi. Wireless signals are particularly prone to inconsistent transmission times because of the many factors that cause delays or loss of data. The important thing to understand here is that the higher the transmission jitter, the higher the latency required to keep the audio glitch-free.
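To make the “preload” idea concrete, here is a minimal Python sketch of how one might turn a handful of measured transit times into a number of padding buffers. It is not how Audreio works internally: the function names, the sample transit times, and the definition of jitter as slowest minus fastest transit time are all assumptions made for illustration.

```python
import math

SAMPLE_RATE_HZ = 44_100  # the rate used in this post's examples


def buffer_ms(buffer_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time needed to fill one audio buffer, in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz


def padding_buffers(transit_times_ms, buffer_samples):
    """Buffers to preload so the slowest observed packet still arrives in time.

    Jitter is taken here as (slowest - fastest) transit time. That is a
    simplifying assumption, not how any particular engine measures it.
    """
    jitter_ms = max(transit_times_ms) - min(transit_times_ms)
    return math.ceil(jitter_ms / buffer_ms(buffer_samples))


# Hypothetical transit times (ms) for a handful of packets over Wi-Fi.
observed_ms = [3.1, 4.0, 2.8, 19.5, 3.6, 11.2]

for size in (1024, 256):
    print(size, "samples ->", padding_buffers(observed_ms, size), "padding buffer(s)")
```

With these made-up measurements the jitter is about 17 ms, so a single 1024-sample buffer of padding covers it, while 256-sample buffers need three. Smaller buffers need more of them to absorb the same jitter, but each one is much shorter.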

Let’s explore the effect those parameters have on latency by plugging in some real-world numbers. Let’s say I’m recording audio from my iPad into Logic and the synth on my iPad likes to run at a buffer size of 1024 samples (~23 ms). If we add 4 buffers of “padding” to account for transmission jitter (4 x 23 = 92 ms), latency will add up to about 115 ms, and that’s before accounting for some other “less significant” offenders (which we don’t discuss in this post because they are either negligible or out of the end user’s control). Now, let’s assume that the synth can run with buffers of 256 samples: ~6 ms + 4 x ~6 ms = ~30 ms (this assumes that jitter is lower than 24 ms). In practice, we can get latencies in the mid-30s to mid-40s (in milliseconds) over a good Wi-Fi network, and in the twenties using the device cable.
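The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the simplified model used in this post (one sender buffer plus a fixed number of padding buffers at 44.1 kHz); the function name `one_way_latency_ms` is made up for illustration and is not part of Audreio.

```python
SAMPLE_RATE_HZ = 44_100


def one_way_latency_ms(buffer_samples, padding_buffers=4, sample_rate_hz=SAMPLE_RATE_HZ):
    """Sender's buffer plus the padding buffers preloaded against jitter."""
    one_buffer_ms = 1000.0 * buffer_samples / sample_rate_hz
    return (1 + padding_buffers) * one_buffer_ms


for size in (1024, 256):
    print(f"{size:>4} samples: {one_way_latency_ms(size):.1f} ms")
```

Without rounding each buffer to a whole millisecond first, the two cases come out at roughly 116 ms and 29 ms, in line with the rounded figures above.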

We can conclude that using small buffer sizes and minimizing transmission jitter will help achieve low latencies. We recommend using buffer sizes no larger than 256 samples on both the receiver and the sender. Will smaller buffer sizes reduce latency even further? They would, although the gains become less significant and come at the expense of a lot of processing power. In Part II of this series we will talk about specific ways to improve streaming performance (i.e., minimize transmission jitter), which translates into cleaner audio at low latencies.
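One way to see why the gains shrink is to hold the network jitter fixed and let the number of padding buffers grow as the buffers get smaller. The Python sketch below does that under stated assumptions: a hypothetical 20 ms of jitter, padding rounded up to whole buffers, and audio callbacks per second used as a rough proxy for processing cost. None of these names or numbers come from Audreio itself.

```python
import math

SAMPLE_RATE_HZ = 44_100
ASSUMED_JITTER_MS = 20.0  # hypothetical network jitter, for illustration only


def estimate(buffer_samples, jitter_ms=ASSUMED_JITTER_MS, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (latency in ms, audio callbacks per second) for one buffer size."""
    one_buffer_ms = 1000.0 * buffer_samples / sample_rate_hz
    padding = math.ceil(jitter_ms / one_buffer_ms)  # whole buffers needed to cover jitter
    latency_ms = (1 + padding) * one_buffer_ms
    callbacks_per_s = sample_rate_hz / buffer_samples
    return latency_ms, callbacks_per_s


for size in (256, 128, 64, 32):
    latency_ms, callbacks = estimate(size)
    print(f"{size:>3} samples: ~{latency_ms:.0f} ms, ~{callbacks:.0f} callbacks/s")
```

In this model, going from 256 to 128 samples saves about 6 ms, while going from 128 to 64 saves less than 2 ms, and every halving doubles how often the audio callback has to run.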

Written by Jorge Castellanos : @piticfericit