Declipping (Re: sound source separation ( RE: [music-dsp] complicated editors ))

Dave Gamble signalzerodb at
Mon Feb 21 10:09:06 EST 2005

> In an earlier posting, Joshua Scholar wrote:
> A couple of weeks ago I tried to write a declipper that attempted to
> use an all-pole filter to bridge the clipped areas.  I started with a
> generalization of linear prediction that allows gaps in the input...
> but no matter what kludgy algorithm I wrote to move the poles around,
> the results were bad.  Extrapolating filters are hard!
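For what it's worth, the all-pole bridging idea can be sketched in a few lines. Everything below is an assumption for illustration (a toy two-sinusoid signal, model order 8, a covariance-method fit via least squares), not a reconstruction of that implementation:

```python
import numpy as np

N, order, gap = 400, 8, 5
n = np.arange(N)
sig = np.sin(0.07 * n) + 0.3 * np.sin(0.19 * n)   # toy signal (assumption)

ctx = sig[: N // 2]            # unclipped context before the "clipped" gap

# Covariance-method LPC: predict s[t] from the previous `order` samples.
rows = np.array([ctx[t - order : t][::-1] for t in range(order, len(ctx))])
a, *_ = np.linalg.lstsq(rows, ctx[order:], rcond=None)

# Run the all-pole model forward to bridge the gap.
hist = list(ctx[-order:])
pred = []
for _ in range(gap):
    nxt = float(np.dot(a, hist[::-1][:order]))
    pred.append(nxt)
    hist.append(nxt)
```

On a clean, stationary toy signal this extrapolates almost exactly; the hard part described above is that real audio isn't stationary, so the pole estimates wander.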
Hmm. I wonder, what -is- the optimal method for declipping (say) a 
single clipped sample...
[Assume positively clipped]

As I see it, you're looking for a single positive constant which, when
added to your clipped sample, will remove the clipping. So we have a
1D search space in which to search for this value... The question is:
what is the error function?
I imagine one could spend a little time arguing that you want the
short-term Fourier magnitude spectrum of the audio containing the clip
to be as similar as possible to other STFM spectra nearby.
If this -is- the case, then you will have an error function viz:
err = \sum_{n=0}^N | |nearby(n)| - |spiky(n)| |
You could rewrite this as a least-squares type of thing:
err = \sum_{n=0}^N (nearby(n) - spiky(n))^2
and then you get the usual:
err = \sum_{n=0}^N (nearby(n)^2 + spiky(n)^2 - 2 nearby(n) spiky(n))

You could then work with an STFT window in which the spiky sample is
the first one: a sample at index 0 contributes e^{-2\pi i k \cdot 0/N} = 1
to every bin, so adding x to it adds the same constant x to every bin,
and solving for the minimum becomes trivial.
Let spiky(n) be the STFT of everything, but with the spiky sample set
to zero.
err(x) = \sum_{n=0}^N (nearby(n) - spiky(n) - x)^2
And lookit! With t(n) = nearby(n) - spiky(n) it rearranges:
err(x) = \sum_{n=0}^N (t(n)^2 + x^2 - 2xt(n))
differentiate for x:
err'(x) = \sum_{n=0}^N (2x - 2t(n))
The minimum is where err' is zero, i.e. x = \frac{1}{N+1} \sum_{n=0}^N t(n),
though since we need x positive (and big enough to actually undo the
clip) we can't guarantee we can get there. Still, it's a trivial
function to optimise: a sum of quadratics, with a 1D search space.
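As a sanity check, the closed-form case can be sketched like this. The `nearby` spectrum is just a random stand-in (an assumption), and since STFT bins are complex the minimiser comes out as the mean of the real part of t(n):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64

frame = rng.standard_normal(N)                 # frame with the spiky sample at index 0
nearby = np.fft.rfft(rng.standard_normal(N))   # stand-in "nearby" spectrum (assumption)

# spiky(n): STFT of everything, with the spiky (first) sample set to zero.
spiky = np.fft.rfft(np.concatenate(([0.0], frame[1:])))

# A sample at index 0 contributes x * exp(-2j*pi*k*0/N) = x to every bin,
# so minimising err(x) = sum_n |nearby(n) - spiky(n) - x|^2 over real x
# gives the mean of the real part of t(n) = nearby(n) - spiky(n).
t = nearby - spiky
x_opt = t.real.mean()

def err(x):
    return float(np.sum(np.abs(t - x) ** 2))
```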

Likewise, if we're a little evil, we can generalise to regenerating a
few samples, either by knowing each sample's contribution to the
transform or by running through the whole equation.
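A sketch of that multi-sample generalisation, with hypothetical `missing` positions and another stand-in target spectrum: each missing sample contributes linearly to every bin, so it reduces to ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
missing = [3, 4, 5]          # positions of the samples to regenerate (assumption)

frame = rng.standard_normal(N)
target = np.fft.rfft(rng.standard_normal(N))   # stand-in "nearby" spectrum

gapped = frame.copy()
gapped[missing] = 0.0
spiky = np.fft.rfft(gapped)

# Sample value x_j at position p_j contributes x_j * exp(-2j*pi*k*p_j/N)
# to bin k, so the correction is linear in x.
k = np.arange(len(spiky))
F = np.exp(-2j * np.pi * np.outer(k, missing) / N)   # bins x missing samples

# Minimise ||(target - spiky) - F x||^2 over real x by stacking the
# real and imaginary parts of the complex system.
A = np.vstack([F.real, F.imag])
b = np.concatenate([(target - spiky).real, (target - spiky).imag])
x, *_ = np.linalg.lstsq(A, b, rcond=None)
```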

However, all this stems from the assumption that we want to make some
STFM match up, which has no reason to be true other than "it's a fair
starting point, sometimes".

Fundamentally, of course, this problem has no solution, since we're 
dealing with missing data, and there's no way to recover information 
which is lost (implied by clipping).

We might make some argument that there is a harmonic pattern generated 
by the clipped sample, and we could search and destroy that.
We could find some metric for (perhaps) 1/x harmonics present in the
signal, and then minimise that metric. Again a 1D search space, but
perhaps a lot messier this time :)
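A toy version of that search-and-destroy idea, with an invented debris metric (energy above the first few bins) and a single artificially clipped sample; both the metric and the signal are assumptions:

```python
import numpy as np

N = 256
n = np.arange(N)
clean = np.sin(2 * np.pi * 5 * n / N)   # pure tone, exactly 5 cycles
clipped = clean.copy()
idx = int(np.argmax(clean))
clipped[idx] = 0.9                      # artificially "clip" the peak sample

# Invented metric: energy left above the first few bins, standing in
# for "harmonics generated by the clipped sample".
def debris(frame, keep_bins=8):
    spec = np.abs(np.fft.rfft(frame))
    return float(np.sum(spec[keep_bins:] ** 2))

def metric(x):
    trial = clipped.copy()
    trial[idx] += x
    return debris(trial)

# 1D grid search over the positive correction added to the clipped sample.
xs = np.linspace(0.0, 0.5, 501)
best = float(xs[np.argmin([metric(x) for x in xs])])
```

On this toy the search lands on the correction that restores the original peak, because the metric is exactly quadratic in x; on real audio the landscape would be messier, as noted above.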
I guess maybe this is what the neural network based approaches do.

What is going on in the real world with this stuff? Are there any 
general assumptions that people make?

All the best!

