[music-dsp] who else needs a fractional delay.

Ross Bencina rossb-lists at audiomulch.com
Tue Nov 23 21:54:50 EST 2010


robert bristow-johnson wrote:
> On Nov 23, 2010, at 7:35 PM, Ross Bencina wrote:
>>
>> ...  there's lots of timing jitter to contend with. Some current 
>> research on network time synchronisation involves timestamping IP 
>> packets in the driver as soon as they arrive at the interface. I  don't 
>> have that option.
>
> but you can timestamp it when you get it, right?

Yep.

> consider, for the  moment, that (assuming the clocks were the same), the 
> timestamp you  have is a nice constant difference from the hypothetical 
> driver  timestamp: would the mean sampling rate of the audio be directly 
> proportional to the mean packet rate?

The sampling rate of the incoming audio, yes. Directly proportional to the 
mean packet rate, over some long period.

> at least after the system  settles down after someone turns a knob and 
> changes the sample rate  ratio (well, that would be determined by the mean 
> output sample rate  divided by the mean input sample rate)?

Correct. No one turns a knob; there's just crystal tolerances and thermal 
drift.

> but if, let's say because of  a change in the network environment and the 
> packet rate  on one side  or the other had to be decreased, thus changing 
> the packet rate ratio  which, if the packet sizes remain unchanged, 
> changes the sample rate  ratio, is your asynchronous sample rate converter 
> controlled by clocks  that are essentially a smoothed out version of the 
> advancing  timestamps of both input and output packets (as your system 
> assigns  the timestamps)?

Yes. That's exactly right. My ASRC is controlled by clocks that are smoothed 
out versions of the advancing timestamps.

Previously you seemed to raise the question of whether the clock 
smoothing could be incorporated into the PID loop. I don't believe this is 
the best approach, due to the nature of the jitter noise in the timestamps, 
so I have been working on various smoothing algorithms (today's favourite 
amounts to a trimmed mean on the packet inter-arrival times to compute the 
incoming packet period).
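
A minimal sketch of the trimmed-mean idea, for illustration only. The 
function name, the 25% trim fraction and the example timestamps are 
assumptions, not taken from the actual implementation: sort the 
inter-arrival intervals, drop the smallest and largest fraction, and average 
what is left.

```python
def trimmed_mean_period(arrival_times, trim_fraction=0.25):
    """Estimate the packet period (seconds) from arrival timestamps.

    Hypothetical helper: trim_fraction of the intervals is discarded at
    each end of the sorted list before averaging, so a few badly delayed
    (or bunched-up) packets do not pull the estimate around.
    """
    intervals = sorted(b - a for a, b in zip(arrival_times, arrival_times[1:]))
    if not intervals:
        raise ValueError("need at least two timestamps")
    k = int(len(intervals) * trim_fraction)
    # Fall back to the untrimmed list if trimming would remove everything.
    kept = intervals[k:len(intervals) - k] or intervals
    return sum(kept) / len(kept)


# Nominal 10 ms packets; the fourth packet arrives 8 ms late, so one
# interval is stretched to 18 ms and the next is squeezed to 2 ms.
times = [0.000, 0.010, 0.020, 0.038, 0.040, 0.050, 0.060, 0.070, 0.080]
period = trimmed_mean_period(times)
```

In this made-up example the stretched and squeezed intervals are the ones 
trimmed away, so the estimate stays at the nominal 10 ms period.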


> sorry for the naive and wordy question.  i'm just trying to relate  this 
> to the hardware ASRC, and i still think it has to become an  analogous 
> problem.

It is analogous for sure. It's just that the packet timestamp jitter is such 
a big issue that it needs to be dealt with directly. Either using a much 
more complex controller than simple PID -- something that uses stochastic 
estimation theory (and to be honest, this is pushing it for my level of math 
and the available time) or by smoothing the clocks using something that's 
more robust than a simple mean before they hit the PID.

Over the *long term* PID would do the job. But I need this thing to 
stabilise in seconds; that's why I need robust estimation rather than 
simple long-term averaging.


>> The audio is captured from an analog stream in real time, it is 
>> associated with other synchronised media (ie film, video etc) so the  end 
>> to end system latency needs to be below the perceptual A/V 
>> synchronisation threshold.
>
>
> i might suggest that you bring this to the comp.dsp people.  i dunno  if 
> comp.realtime or comp.embedded or whatever are still active.  one  thing 
> that occurs to me is that there are limits and tradeoffs  regarding jitter 
> in the data and latency.

It's a good suggestion. If I get stuck again I may well do that. Right now 
I need to deliver a working prototype of all this on Friday so the research 
phase is over -- perhaps I should have asked there sooner, but I did a pretty 
extensive literature review that gave me a lot of ideas to work with.


> it seems to me that with 50  ms of jitter, you would need to have a 
> minimum latency of 50 ms.  but,  there is obviously something that i 
> continue to fail to understand  about the problem.

You're right. If there is 50 ms of jitter you need at least 50 ms of 
buffering to mask it. I have that.

But if the source and playback sample rates are different (they are, at 
least by 1 or 2 Hz at 44.1 kHz) then we need the playout rate to be corrected 
(by a PID controller). The PID will need time to stabilise (assuming we 
don't want it to sound like shit) and the PID will need to be pretty heavily 
damped to achieve that goal. And while the PID is stabilising, the source 
and corrected playout rates will not be the same, so there is a risk of 
buffer overrun or underrun if I don't add _extra_ buffering beyond the 50 ms 
that masks jitter. That was the 5 ms number I mentioned: extra buffering to 
avoid underrun/overrun due to corrected-rate mismatch during PID 
stabilisation.
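
To make the headroom argument concrete, here is a rough sketch that 
integrates the difference between the source rate and the corrected playout 
rate over time and reports the largest buffer-level excursion. The function 
name, the 1 s update step and the made-up convergence curve (error halving 
every step) are all assumptions for illustration; a real controller's 
transient will look different.

```python
def peak_buffer_excursion(source_rate_hz, playout_rates_hz, step_s):
    """Largest buffer-level deviation, in seconds of audio.

    playout_rates_hz holds the corrected playout rate for each step of
    length step_s. The buffer fills when the source runs faster than the
    playout and drains when it runs slower.
    """
    level = 0.0  # samples, relative to the starting fill level
    peak = 0.0
    for rate in playout_rates_hz:
        level += (source_rate_hz - rate) * step_s
        peak = max(peak, abs(level))
    return peak / source_rate_hz


# Hypothetical case: source is 2 Hz fast, and the corrected playout rate
# closes half of the remaining error every second.
source = 44102.0
rates = [source - 2.0 * 0.5 ** n for n in range(20)]
excursion_s = peak_buffer_excursion(source, rates, step_s=1.0)
headroom_ok = excursion_s < 0.005  # compare against 5 ms of extra buffering
```

The extra buffering has to cover whatever this excursion turns out to be for 
the real loop; a more heavily damped loop corrects more slowly and so lets 
the error accumulate for longer.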

Ross.
