[music-dsp] who else needs a fractional delay.
rossb-lists at audiomulch.com
Tue Nov 23 21:54:50 EST 2010
robert bristow-johnson wrote:
> On Nov 23, 2010, at 7:35 PM, Ross Bencina wrote:
>> ... there's lots of timing jitter to contend with. Some current
>> research on network time synchronisation involves timestamping IP
>> packets in the driver as soon as they arrive at the interface. I don't
>> have that option.
> but you can timestamp it when you get it, right?
> consider, for the moment, that (assuming the clocks were the same), the
> timestamp you have is a nice constant difference from the hypothetical
> driver timestamp: would the mean sampling rate of the audio be directly
> proportional to the mean packet rate?
The sampling rate of the incoming audio, yes. Directly proportional to the
mean packet rate, over some long period.
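To make the proportionality concrete, here's a minimal sketch of that rate estimate (the 512-sample packet size and the timestamp values are illustrative, not from the thread):

```python
# Sketch: estimating the incoming sample rate from the mean packet rate.
# Assumes fixed-size packets; samples_per_packet and the timestamps below
# are made-up illustrative values.

def estimate_sample_rate(timestamps, samples_per_packet):
    """Mean incoming sample rate over a window of packet timestamps."""
    span = timestamps[-1] - timestamps[0]          # seconds covered
    intervals = len(timestamps) - 1                # inter-arrival intervals
    mean_packet_rate = intervals / span            # packets per second
    return mean_packet_rate * samples_per_packet   # samples per second

# 101 timestamps for packets of 512 samples arriving every ~11.6 ms
ts = [i * (512 / 44100.0) for i in range(101)]
print(round(estimate_sample_rate(ts, 512)))  # -> 44100
```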
> at least after the system settles down after someone turns a knob and
> changes the sample rate ratio (well, that would be determined by the mean
> output sample rate divided by the mean input sample rate)?
Correct. No one turns a knob; there's just crystal tolerances and thermal drift.
> but if, let's say because of a change in the network environment and the
> packet rate on one side or the other had to be decreased, thus changing
> the packet rate ratio which, if the packet sizes remain unchanged,
> changes the sample rate ratio, is your asynchronous sample rate converter
> controlled by clocks that are essentially a smoothed out version of the
> advancing timestamps of both input and output packets (as your system
> assigns the timestamps)?
Yes, that's exactly right: my ASRC is controlled by clocks that are smoothed-out
versions of the advancing timestamps.
Earlier you raised the question of whether the clock smoothing could be
incorporated into the PID loop. I don't believe that's the best approach, due
to the nature of the jitter noise in the timestamps.. so I have been working
on various smoothing algorithms (today's favourite amounts to a trimmed mean
of the packet inter-arrival times to compute the incoming packet period).
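A minimal sketch of that trimmed-mean smoothing, assuming a 20% trim per tail (the trim fraction and the arrival times are my assumptions, not Ross's actual parameters):

```python
# Sketch of trimmed-mean smoothing: estimate the incoming packet period
# from inter-arrival times, discarding the extremes at both tails so a
# few badly jittered packets don't skew the estimate.

def trimmed_mean_period(arrival_times, trim=0.2):
    """Robust packet-period estimate from a window of arrival timestamps."""
    deltas = sorted(b - a for a, b in zip(arrival_times, arrival_times[1:]))
    k = int(len(deltas) * trim)                 # samples to drop per tail
    kept = deltas[k:len(deltas) - k] or deltas  # guard very small windows
    return sum(kept) / len(kept)

# Nominal 10 ms period, with one packet delayed by 50 ms:
arrivals = [0.00, 0.01, 0.02, 0.03, 0.04,
            0.10, 0.11, 0.12, 0.13, 0.14, 0.15]
print(trimmed_mean_period(arrivals))  # ~0.01 despite the outlier
```

The plain mean of the same window is 0.015, i.e. 50% off; the trimmed mean rejects the spike entirely, which is why it settles in seconds rather than needing a long averaging window.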
> sorry for the naive and wordy question. i'm just trying to relate this
> to the hardware ASRC, and i still think it has to become an analogous
It is analogous for sure. It's just that the packet timestamp jitter is such
a big issue that it needs to be dealt with directly: either by using a much
more complex controller than a simple PID -- something that uses stochastic
estimation theory (and to be honest, that's pushing it for my level of math
and the available time) -- or by smoothing the clocks with something more
robust than a simple mean before they hit the PID.
Over the *long term* PID would do the job. But I need this thing to
stabilise in seconds.. that's why I need robust estimation rather than
simple long term averaging.
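For readers following along, a minimal PID sketch of the kind of rate-correction loop being discussed: the error would be something like the deviation of the (smoothed) buffer fill from its target, and the output nudges the playout-rate ratio. The gains and error value here are placeholders, not tuned numbers from the actual system:

```python
# Minimal PID controller sketch for a playout-rate correction loop.
# Gains are illustrative placeholders; a real loop would be heavily
# damped, as described above.

class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)

# Error: smoothed buffer fill is 120 samples over target.
pid = PID(kp=0.001, ki=0.0001, kd=0.0)
correction = pid.update(error=120.0, dt=0.01)
playout_ratio = 1.0 + correction  # nudge playout rate up slightly
```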
>> The audio is captured from an analog stream in real time, it is
>> associated with other synchronised media (ie film, video etc) so the end
>> to end system latency needs to be below the perceptual A/V
>> synchronisation threshold.
> i might suggest that you bring this to the comp.dsp people. i dunno if
> comp.realtime or comp.embedded or whatever are still active. one thing
> that occurs to me is that there are limits and tradeoffs regarding jitter
> in the data and latency.
It's a good suggestion.. if I get stuck again I may well do that. Right now
I need to deliver a working prototype of all this on Friday, so the research
phase is over -- perhaps I should have asked there sooner, but I did a pretty
extensive literature review that gave me a lot of ideas to work with.
> it seems to me that with 50 ms of jitter, you would need to have a
> minimum latency of 50 ms. but, there is obviously something that i
> continue to fail to understand about the problem.
You're right. If there is 50ms of jitter you at least need 50ms of buffering
to mask it. I have that.
But if the source and playback sample rates are different (they are, at
least by 1 or 2 Hz at 44.1kHz) then we need the playout rate to be corrected
(by a PID controller). The PID will need time to stabilise (assuming we
don't want it to sound like shit), and the PID will need to be pretty heavily
damped to achieve that goal. During the time the PID is stabilising, the
source rate and the corrected playout rate will not be the same, so there is
a risk of buffer overrun or underrun if I don't add _extra_ buffering beyond
the 50ms that masks jitter -- that was the 5ms number I mentioned -- extra
buffering to avoid underrun/overrun due to corrected-rate mismatch during
PID stabilisation.
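The sizing arithmetic for that extra headroom can be sketched in a couple of lines; the mismatch and settling-time figures below are illustrative, not the actual numbers behind the 5ms:

```python
# Back-of-envelope check of the extra buffering needed while the PID
# settles: until the loop locks, the playout clock drifts relative to
# the source by the rate mismatch, and the buffer must absorb that
# drift. Values here are illustrative only.

def extra_buffer_ms(rate_mismatch_hz, settle_time_s, sample_rate_hz):
    """Worst-case buffer drift (ms) accumulated during PID stabilisation."""
    drift_samples = rate_mismatch_hz * settle_time_s
    return 1000.0 * drift_samples / sample_rate_hz

# e.g. a 2 Hz mismatch at 44100 Hz, taking 5 s to settle:
print(extra_buffer_ms(2.0, 5.0, 44100.0))  # ~0.23 ms of drift
```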