[music-dsp] Algorithms for finding seamless loops in audio

Element Green jgreen at users.sourceforge.net
Mon Nov 29 20:50:15 EST 2010


On Thu, Nov 25, 2010 at 9:33 PM, robert bristow-johnson
<rbj at audioimagination.com> wrote:
>
> depending on how big your "window" is, i think a better term for this is
> *cross-correlation* not autocorrelation.  it's a single stream of audio so
> in a sense of the word, it *is* autocorrelation, but what i normally think
> of, with that semantic is something where the lag is no bigger or not much
> bigger than the analysis window of either loop-end region of the audio and
> the loop-begin.
>
> if the loop points are separated by a much longer time (number of samples)
> than the size (in samples) of the two slices of audio being correlated, it's
> really cross-correlation.  and you might find poor correlation given all
> lags that you're looking at.  in fact, doing cross-correlation from one part
> of the tone or sound to another part that has a rapid change in amplitude
> envelope might fool your correlation into thinking there is a good match
> when there really isn't (because the amplitude is increasing, then the
> cross-correlation increases, but not necessarily because of a good match).
>
> so, instead of either cross or autocorrelation, you might want to consider
> AMDF between the loop end and potential candidates to loop back to.  instead
> of looking for a maximum, you're looking for a minimum and a very low
> minimum means a good match (or a bad match during a very low signal level).

Looking at the equation here for AMDF:
http://mi.eng.cam.ac.uk/~ajr/SpeechAnalysis/node72.html

It seems like the algorithm I came up with independently is very
similar.  The absolute value of the difference of the sample points is
taken as with AMDF.  Prior to summing the values together though, I'm
multiplying by the window I described before (with a peak in the
center where the loop point is), giving samples closer to the loop
point more weight.

In practice this seems to work quite well and I'm going to leave it as
is for now.  It seems reasonably fast and straightforward.
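
A minimal sketch of the weighted difference described above, not the
actual editor code.  The triangular window shape, the normalization by
the sum of the weights, and the names triangular_window and
weighted_amdf are all assumptions made for illustration; the post only
says the window peaks at the loop point.

```python
import math


def triangular_window(half_width):
    """Weights that peak at the centre (the loop point) and taper to the edges.

    The triangular shape is an assumption; any centre-peaked window would do.
    """
    n = 2 * half_width + 1
    return [1.0 - abs(i - half_width) / (half_width + 1) for i in range(n)]


def weighted_amdf(samples, loop_start, loop_end, half_width):
    """Weighted mean absolute difference around two candidate loop points.

    Lower is better; 0.0 means the two windows are identical.
    The caller must keep both windows inside the sample buffer.
    """
    window = triangular_window(half_width)
    total = 0.0
    for k, w in enumerate(window):
        offset = k - half_width
        diff = samples[loop_start + offset] - samples[loop_end + offset]
        total += w * abs(diff)
    return total / sum(window)


# Example: a sine with a period of 100 samples.
sine = [math.sin(2 * math.pi * i / 100) for i in range(1000)]
good = weighted_amdf(sine, 200, 500, 32)  # whole number of periods apart
bad = weighted_amdf(sine, 200, 550, 32)   # half a period out of phase
```

Here good comes out near zero and bad is much larger.  A near-zero
score can also come from a very quiet passage, as noted in the quoted
text, so the score is probably best read alongside the signal level.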

>
> find good loop points, then crossfade.
>
> another thing about cross fading is that there is something you can do to
> adapt a little to better or poor loop points.  if the loop points (and the
> window surrounding them) match well, then you're doing a crossfade between
> coherent audio and a constant voltage crossfade is indicated (when the
> crossfade is half done, both the fade out and fade in envelopes are at 50%).
>  if the loop points are not well matched (but it's the best loop points your
> correlation function can find), then you want to do a crossfade that is
> closer to a constant power crossfade where both fade in and fade out
> envelopes are at 70.7% at the midpoint of the crossfade.  there is a way to
> define the optimal crossfade function for any correlation between 0 (when
> it's like crossfading white noise to white noise) to 100% (like crossfading
> a perfectly periodic waveform to a similarly appearing portion of the
> waveform at loop start).
>
> does any of this make any sense?
>

I'm not sure I'm following you.  From what I can understand, it sounds
like you are saying that the degree to which the two loop point signal
windows match could be used to select different crossfade envelope
curves, for a perceptually better crossfade.  I hadn't given this much
thought and just assumed a linear crossfade (0-100%) would be the way
to do it (that comes from a limited DSP background, mind you).  I am
intrigued by this idea though.  Any tips on how to generate the
envelope functions, and what sort of equation could be used for
selecting the optimal envelope based on the signal correlation?
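
One possible way to get envelopes with the midpoint values mentioned in
the quoted text (50% for well-matched audio, 70.7% for uncorrelated
audio) is sketched below.  It is an assumption, not necessarily the
function rbj has in mind: it starts from a linear crossfade and rescales
both gains so that the mix of two equal-power signals with correlation r
keeps constant power.  The names normalized_correlation and
crossfade_gains are hypothetical.

```python
import math


def normalized_correlation(a, b):
    """Correlation of two equal-length windows, scaled to the range -1..1."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def crossfade_gains(t, r):
    """Return (fade_out, fade_in) gains at position t in [0, 1].

    r is the correlation between the two regions, clamped to [0, 1].
    r = 1 gives a linear (constant-voltage) crossfade and
    r = 0 gives a constant-power crossfade.
    """
    r = min(1.0, max(0.0, r))
    fade_out = 1.0 - t
    fade_in = t
    # Power of fade_out*x + fade_in*y for unit-power x and y with correlation r.
    power = fade_out ** 2 + fade_in ** 2 + 2.0 * r * fade_out * fade_in
    norm = math.sqrt(power)
    return fade_out / norm, fade_in / norm


mid_coherent = crossfade_gains(0.5, 1.0)      # both gains 0.5
mid_uncorrelated = crossfade_gains(0.5, 0.0)  # both gains about 0.707
```

Negative correlations are clamped to zero here because the rescaling
would blow up as r approaches -1 at the midpoint; that clamp is a
design choice for the sketch rather than anything stated in the thread.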

> can i ask what the application is? (i may have missed it, but i'll look at
> earlier posts.)  if it's looping for sound/instrument samples, this is an
> analysis thing that is not real-time and we can consider finding the best
> loop-begin points for a large variety of possible loop-end points.  then
> pick the pair that looks  best, given whatever your measure of good is.  but
> in a (time-domain) real-time pitch shifter, having so many choices may not
> be available to you.  you might find yourself in a situation where your
> loop-end is pretty well defined, you have to find a place to splice to and
> take the best that you can get from that.
>

It's a sample/instrument editor, so it's all non-realtime.
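
Since nothing here is real-time, the exhaustive search suggested in the
quoted text is affordable.  A rough sketch, assuming a plain unweighted
mean absolute difference as the "measure of good" and hypothetical names
mean_abs_diff and best_loop_pair; any other score function with the same
arguments could be passed in instead.

```python
import math


def mean_abs_diff(samples, a, b, half_width):
    """Unweighted mean absolute difference of the windows around a and b."""
    offsets = range(-half_width, half_width + 1)
    total = sum(abs(samples[a + k] - samples[b + k]) for k in offsets)
    return total / (2 * half_width + 1)


def best_loop_pair(samples, start_candidates, end_candidates,
                   half_width, min_length, score=mean_abs_diff):
    """Try every (start, end) pair and return (score, start, end) for the best.

    Pairs shorter than min_length, or whose windows would run off the
    buffer, are skipped.  Returns None if no pair qualifies.
    """
    best = None
    for end in end_candidates:
        for start in start_candidates:
            if end - start < min_length:
                continue
            if start - half_width < 0 or end + half_width >= len(samples):
                continue
            s = score(samples, start, end, half_width)
            if best is None or s < best[0]:
                best = (s, start, end)
    return best


# Example: for a sine with a 100-sample period and a start fixed at 200,
# the best end in 460..539 is the one a whole number of periods away.
sine = [math.sin(2 * math.pi * i / 100) for i in range(1000)]
result = best_loop_pair(sine, [200], range(460, 540), 32, 50)
```

The cost grows with the number of start candidates times the number of
end candidates times the window size, which is fine offline but is the
reason a real-time pitch shifter cannot afford this many choices.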

> --
>
> r b-j                  rbj at audioimagination.com
>
> "Imagination is more important than knowledge."
>

Thanks for the helpful info!
Element Green

