[music-dsp] who else needs a fractional delay.
rossb-lists at audiomulch.com
Sun Nov 21 06:10:22 EST 2010
robert bristow-johnson wrote:
> On Nov 20, 2010, at 4:46 PM, Ross Bencina wrote:
>> I'm implementing a low-latency audio-over-wi-fi system with UDP
>> transport. The packet period is somewhere between 5 and 30ms. I'm doing
>> clock-recovery on the client to keep the buffering in sync. Since it's a
>> low latency system I can't afford to have more than the minimum required
>> buffering - so getting the playout rate correct is important.
>> I've already implemented a prototype/simulation of most of the
>> mechanisms. I'm using a PI controller for the servo mechanism with a
>> feedforward path for (smoothed) playback rate (so the servo only really
>> needs to deal with correcting offset errors). I haven't tuned it yet, but
>> the simulation results look OK. I found the wikipedia article on PID
>> controllers pretty helpful: http://en.wikipedia.org/wiki/PID_controller
>> Would you recommend something other than a PI controller for this?
> i don't think you want any D, but you probably want some P and I.
Yeah, that's why I said "I'm using a PI controller"
> one thing to remember, because this becomes the rate input to an NCO
> (essentially the output pointer address, which has a fractional
> component), there is an inherent integrator in your "plant" (using
> control systems lingo :). the plant and the controller are essentially
> in series, so the integrator in the plant teams up with the PI making it
> I and I^2, instead. maybe you *do* want a D. i dunno. but i'm pretty
> sure you don't need an I, because your controller P already is your I.
Interesting, I hadn't thought of that, thanks. So if P is already my I,
what's my P? or is that why you're saying I might want D...
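A toy simulation makes the "integrator in the plant" point easier to see. This is only a sketch under assumptions: the function name, the gains and the rate values are all made up for illustration, and the plant is reduced to a bare pointer accumulator with no measurement noise.

```python
def simulate(kp, ki, feedforward, true_rate=1.0001, initial_error=5.0, steps=20000):
    """Return the final pointer error (set point minus output pointer), in samples.

    All parameter values are illustrative assumptions, not tuned numbers.
    """
    error = initial_error
    integral = 0.0
    for _ in range(steps):
        integral += error
        # Controller output: the stride fed to the NCO.
        stride = feedforward + kp * error + ki * integral
        # Plant: the output pointer accumulates the stride (the inherent
        # integrator), while the set point advances by the true rate.
        error += true_rate - stride
    return error


if __name__ == "__main__":
    print("P only, exact feedforward:", simulate(0.01, 0.0, 1.0001))
    print("P only, feedforward off:  ", simulate(0.01, 0.0, 1.0))
    print("PI, feedforward off:      ", simulate(0.01, 1e-4, 1.0))
```

In this model a P term alone removes an initial offset because the pointer accumulator does the integrating. If the feedforward rate is slightly wrong, P alone settles at a constant offset of (rate error)/kp, and an I term is what pulls that residual to zero.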
>> From my point of view the more difficult thing is recovering a stable
>> wordclock from a jittery packet stream -- and getting this to start up
>> quickly enough to be useful. In the past I've used an Ordinary Least
>> Squares regression on packet timestamps to estimate the incoming sample
>> rate and intercept (time offset).
> whoa! i wonder how that is incorporated in this?
> i'm only thinking about how a difference signal between two pointers in a
The problem is that there can be a huge amount of jitter on the audio
packets. Let's say the packets have a 29ms period -- the jitter caused by
the network (especially with Multicast traffic over WiFi) can be up to
50ms... so asking the servo mechanism to smooth out all of that jitter is
asking quite a lot... especially since a P(I)(D) mechanism has no inherent
model of system or measurement error. You could heavily damp the system and
I suppose it might stabilise eventually but I need more stability and faster
convergence than that.
So I have two phases:
1. The network packets come in and I timestamp them and run some kind of
robust clock recovery procedure (OLS, Kalman, whatever) to determine the
incoming sample rate and phase/offset.
2. Take this stabilised rate and offset information and feed it to the PI
controller to adjust the playout rate.
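As a rough illustration of step 1, here is a minimal ordinary least squares sketch. It assumes each packet is tagged with the cumulative index of its first sample and a local arrival time in seconds; the function and variable names are hypothetical, and a real implementation would likely run over a sliding window of recent packets.

```python
def ols_clock_fit(sample_counts, arrival_times):
    """Fit arrival_time = offset + sample_count / rate by least squares.

    sample_counts: cumulative sample index of each packet (assumed input).
    arrival_times: local timestamp of each packet, in seconds.
    Returns (estimated_rate_hz, offset_seconds).
    """
    n = len(sample_counts)
    mean_x = sum(sample_counts) / n
    mean_y = sum(arrival_times) / n
    # Centre the data before summing to keep the arithmetic well conditioned.
    sxx = sum((x - mean_x) ** 2 for x in sample_counts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sample_counts, arrival_times))
    seconds_per_sample = sxy / sxx
    offset = mean_y - seconds_per_sample * mean_x
    return 1.0 / seconds_per_sample, offset
```

Plain OLS treats early and late packets symmetrically, so network delay that only ever makes packets late will bias the intercept. That is one reason to estimate the offset separately from the rate.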
I might be able to do it all in one step with some kind of fancy controller
that I have no idea about right now, but my intuition is that it's best to
recover a stable incoming sample clock first and feed this into a PI
controller (ie a mostly feedforward structure) rather than having a feedback
system act on such massively jittery phase information.
I have read quite a few papers on approaches to this (mostly in clock
recovery for IP TV, but also some audio streaming applications) and everyone
seems to have their own favourite method -- with little consensus. I found
people proposing new methods in the literature as late as 2009.
> there is an input pointer that increments each time you get an input
> sample (you might be getting them in asynchronous bursts of samples).
> and there is an output pointer that advances by a given stride that has
> both integer and fractional components. the value of that stride is the
> reciprocal of the output/input sample rate ratio, r. at least it is,
> when it settles down to a reasonably constant value.
Yep. Arrival of input samples in this system has a jitter of ~50ms.
> so imagine this circular buffer with samples popping in with a pointer
> whose increment stride is 1. and an output pointer that increments by
> something that settles down to 1/r. that is the signal that comes out of
> your controller.
> what goes into your controller is the difference between the output
> pointer (the pointer that has an integer and fractional value) and a set
> point for that pointer. the set point is at some fixed delay behind the
> input pointer, which might be at the opposite side of the circular buffer
> (if you want an equal amount of elbow room, but you might not if you want
> low latency).
That's what I'm doing. I feed samples into the structure using the incoming
sample rate I derive at step (1) above.
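The pointer arrangement described above can be sketched as follows. The class name and the use of linear interpolation are assumptions made to keep the example short; a real fractional delay would probably use a better interpolator (polyphase FIR, Lagrange, allpass and so on).

```python
import math


class FractionalDelayBuffer:
    """Circular buffer with an integer input pointer and a fractional output pointer."""

    def __init__(self, size):
        self.size = size
        self.buf = [0.0] * size
        self.written = 0      # input pointer: total samples written, stride 1
        self.read_pos = 0.0   # output pointer: integer plus fractional part

    def write(self, samples):
        """Append a burst of input samples (e.g. one packet)."""
        for s in samples:
            self.buf[self.written % self.size] = s
            self.written += 1

    def read(self, stride):
        """Return one linearly interpolated sample, then advance by stride (about 1/r)."""
        i = math.floor(self.read_pos)
        frac = self.read_pos - i
        a = self.buf[i % self.size]
        b = self.buf[(i + 1) % self.size]
        self.read_pos += stride
        return a + frac * (b - a)
```

The stride passed to `read` is what the controller produces; this sketch leaves out underrun and overrun checks between the two pointers.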
> do you get time-stamps for the samples going out? does it come from an
> asynchronous interrupt source or can you increment some number for it?
I timestamp a UDP packet when it arrives. The packet contains ~29ms of audio
data @44.1k in the current implementation... but the packet size might
> when the input and output sample rates are *almost* the same, there can
> be a problem of delay slippage of a single sample when the controller
> knows it's off. so you have to compute a signal from the difference of
> timestamp of the sample (or packet) now going out and the computed
> timestamp of effectively where the input pointer is. even though it
> always increments by 1, you need a fractional signal that represents a
> smoothly incrementing pointer value that happens to cross an integer
> sample boundary every time an input sample comes in. this is estimated
> from the N most recent input samples (or packets). this might be the
> regression problem you mention. i dunno.
Yes, I think that's the regression problem I mention. I need to continuously
compute the relative word-clock phase of the incoming signal.
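One way to picture that continuously computed phase, assuming a rate and offset have already been recovered from the packet timestamps: evaluate the fitted clock at the current local time to get a smoothly incrementing, fractional input pointer, then subtract the target delay and the output pointer to get the controller's error input. The names and the linear clock model are assumptions for illustration.

```python
def input_pointer_at(t, rate_hz, offset_s):
    """Estimated fractional input sample position at local time t (seconds).

    Assumes the linear clock model: arrival_time = offset_s + index / rate_hz.
    """
    return (t - offset_s) * rate_hz


def pointer_error(t, output_pos, rate_hz, offset_s, target_delay):
    """Set point (target_delay samples behind the input pointer) minus output pointer."""
    set_point = input_pointer_at(t, rate_hz, offset_s) - target_delay
    return set_point - output_pos
```

For example, with a recovered rate of 44100 Hz and offset of 1.0 s, the input pointer at t = 1.5 s is 22050 samples. With a target delay of 1000 samples and the output pointer at 21000, the error is 50 samples.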
>> This time I have a mechanism for time offset based on the assumption
>> that the "most on time" packets represent the best time offset, but the
>> rate estimator is still a bit of a mystery... I have a Kalman filter
>> version that works about as well as the OLS rate detector and is a
>> little cheaper -- lately I've been reading up on "robust regression"
>> methods (LMS and TLS) -- they're pretty costly but I'm hoping they'll
>> allow me to lock on to the word clock more quickly.
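The "most on time" idea for the offset can be sketched like this, assuming a rate estimate is already available and that network delay only ever makes packets late. The function name is hypothetical, and a practical version would take the minimum over a sliding window so that the estimate can follow drift.

```python
def min_delay_offset(sample_counts, arrival_times, rate_hz):
    """Estimate the time offset from the least-delayed packets.

    Each packet implies an offset of arrival_time - sample_count / rate_hz.
    Network delay inflates that value, so the smallest one is taken as the
    best estimate (an assumption that holds only if delay is never negative).
    """
    return min(t - n / rate_hz for n, t in zip(sample_counts, arrival_times))
```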
> i haven't done a Kalman filter problem since grad school. i dunno how to
> do it or use it anymore. i've been comparing this problem to a more
> hardware-like ASRC where, from reading the DSP chip's clock register
> (that increments at the machine instruction rate) when an input sample
> arrives (and putting it into a buffer) and when the output sample goes
> out. that, and the difference of the output pointer and the set point
> (that increments with the input pointer and that fractional portion that
> is computed). there is a simple way to anticipate what the fractional
> portion is from the difference of the most recent two input timestamps
> and the difference of the output timestamp and the most recent input
> timestamp. but i can see that with more of the input timestamps (than
> just two), you can get a better guess.
The difference between what you describe, and the problem I have is that I
have only one timestamp every 29ms, not one timestamp per sample. And my
timestamp can be wrong by up to 50ms. So I need to do the regression (or
similar) across a larger number of timestamps to try to recover something
remotely approximating a word clock (constant time increment per incoming
> but i haven't tried it because i was thinking that the simple method
> would be good enough if the two asynced sample streams settled down. if
> they settle down, then your increment rate on the output pointer should
> become constant and about 1/r. that's what your controller has to do.
Once I've done the wordclock recovery, then I can do that bit. That's my
step 2 above.
Thanks. It's a big help to have the opportunity to try to explain it. And
although I'm still hunting for the ideal robust regression/state estimation
method to do the clock recovery with, I think I understand what I want the
result to look like a bit better than the (P)(I)(D) controller bit.