Declipping (Re: sound source separation (RE: [music-dsp] complicated editors))
Dave Gamble
signalzerodb at yahoo.co.uk
Mon Feb 21 10:09:06 EST 2005
> In an earlier posting, Joshua Scholar wrote:
> A couple of weeks ago I tried to write a declipper that attempted to
> use an all-pole filter to bridge the clipped areas. I started with a
> generalization of linear prediction that allows gaps in the input...
> but no matter what kludgy algorithm I wrote to move the poles around,
> the results were bad. Extrapolating filters are hard!
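For reference, the quoted idea (fit an all-pole / linear-prediction model to the good samples, then run it forward across the gap) can be sketched roughly as below. This is a hedged illustration, not Joshua's actual code: the function names, AR order, and the toy sinusoid are all assumptions for the example.

```python
# Sketch of linear-prediction declipping: fit an all-pole (AR) model to
# unclipped samples before a gap, then extrapolate across the gap.
import numpy as np

def fit_ar(x, order):
    """Least-squares AR coefficients: x[n] ~ sum_k a[k] * x[n-1-k]."""
    rows = [x[n - order:n][::-1] for n in range(order, len(x))]
    A = np.array(rows)
    b = x[order:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a

def extrapolate(context, a, count):
    """Continue the signal 'count' samples past 'context' via the AR model."""
    out = list(context)
    for _ in range(count):
        out.append(np.dot(a, out[-len(a):][::-1]))
    return np.array(out[len(context):])

# Toy test: a clean sinusoid with 5 samples missing (as if clipped away).
t = np.arange(200)
x = np.sin(2 * np.pi * t / 25)
a = fit_ar(x[:100], order=8)
pred = extrapolate(x[:100], a, 5)   # estimate of x[100:105]
```

A pure sinusoid is exactly predictable by a second-order recurrence, so this toy case works; as the quote says, real clipped audio is far less forgiving.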
>
Hmm. I wonder, what -is- the optimal method for declipping (say) a
single clipped sample...
[Assume positively clipped]
As I see it, you're looking for a single positive constant which, when
added to your clipped sample, will remove the clipping. So we have a 1D
search space in which to search for this value... The question is: what
is the error function?
I imagine one could spend a little time arguing that you might want the
short-term Fourier magnitude spectrum (STFM) of the audio containing
the clip to be as similar as possible to the spectra of nearby frames.
If this -is- the case, then you will have an error function viz:
err = \sum_{n=0}^{N} | |nearby(n)| - |spiky(n)| |
You could rewrite this as a least-squares type of thing:
err = \sum_{n=0}^{N} (nearby(n) - spiky(n))^2
and then you get the usual:
err = \sum_{n=0}^{N} (nearby(n)^2 + spiky(n)^2 - 2 nearby(n) spiky(n))
You could then work from an STFT in which the spiky sample is the first
one: the DFT of a unit impulse at sample zero is constant across bins,
so the unknown correction x contributes the same amount to every term.
Solving for the minimum in this case becomes trivial. Let spiky(n) be
the STFT of everything, but with the spiky sample set to zero.
err(x) = \sum_{n=0}^{N} (nearby(n) - spiky(n) - x)^2
And lookit! With t(n) = nearby(n) - spiky(n), it rearranges:
err(x) = \sum_{n=0}^{N} (t(n)^2 + x^2 - 2 x t(n))
differentiate with respect to x:
err'(x) = \sum_{n=0}^{N} (2x - 2 t(n))
The minimum is where err' is zero, but since x is constrained to be
positive we can't guarantee we can get there. However, it's a trivial
function to optimise: a sum of quadratics with a 1D search space.
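Numerically, the derivation above can be checked in a few lines: err(x) = \sum_n (t(n) - x)^2 is a 1D quadratic whose unconstrained minimiser is the mean of t, after which the positivity constraint is just a clamp. A hedged sketch (the variable names and random t are illustrative, not from real audio):

```python
# Check that err(x) = sum_n (t[n] - x)^2 is minimised at x = mean(t),
# matching err'(x) = sum_n (2x - 2 t[n]) = 0 from the derivation above.
import numpy as np

rng = np.random.default_rng(0)
t = rng.normal(size=32)          # stand-in for nearby(n) - spiky(n)

def err(x):
    return np.sum((t - x) ** 2)

x_star = np.mean(t)              # root of err'(x)

# The sample was positively clipped, so the correction must be positive;
# clamping the unconstrained minimiser is the crudest feasible projection.
x_feasible = max(x_star, 0.0)
```

Since the quadratic is convex, the clamp is in fact the exact constrained minimiser for this 1D case.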
Likewise, if we're a little evil, we can generalise out to regenerating
a few samples, either by knowing each sample's contribution to the
transform or by running through the whole equation.
However, all this stems from the assumption that we want to make the
STFMs match up, which has no reason to be true other than "it's a fair
starting point, sometimes".
Fundamentally, of course, this problem has no exact solution: we're
dealing with missing data, and there's no way to recover information
that clipping has destroyed.
We might make some argument that the clipped sample generates a
harmonic pattern, and we could search and destroy that. We could find
some metric for the (perhaps) 1/n harmonics present in the signal, and
then minimise that metric. Again, a 1D search space, but perhaps a lot
messier this time :)
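One hedged way to sketch that "search and destroy" idea: hard clipping of a sinusoid mainly adds odd harmonics, so we could grid-search the 1D correction that minimises energy at those bins. Everything here (the fundamental bin, window length, clip level, number of harmonics) is an assumption for illustration, not a recommendation:

```python
# 1-D search for the correction to one clipped sample that minimises
# energy at the odd harmonics clipping tends to generate.
import numpy as np

def odd_harmonic_energy(frame, k0, n_harm=5):
    """Sum of |DFT| at odd multiples (3rd, 5th, ...) of fundamental bin k0."""
    spec = np.abs(np.fft.rfft(frame))
    bins = [k0 * h for h in range(3, 3 + 2 * n_harm, 2)]
    return sum(spec[b] for b in bins if b < len(spec))

# Toy frame: a sinusoid at bin k0, hard-clipped at +/-0.8.
N, k0 = 256, 4
frame = np.clip(np.sin(2 * np.pi * k0 * np.arange(N) / N), -0.8, 0.8)
clip_idx = np.flatnonzero(frame >= 0.8)[0]   # first positively clipped sample

def metric(x):
    f = frame.copy()
    f[clip_idx] += x                         # candidate positive correction
    return odd_harmonic_energy(f, k0)

xs = np.linspace(0.0, 0.5, 101)              # 1-D search space
best = min(xs, key=metric)
```

In a real declipper the fundamental wouldn't be known in advance, which is where the "messier" part comes in.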
I guess maybe this is what the neural network based approaches do.
What is going on in the real world with this stuff? Are there any
general assumptions that people make?
All the best!
Dave.