[music-dsp] AMDF shortcuts for SOLA time stretching?
James Chandler Jr
jchandjr at bellsouth.net
Tue Apr 22 03:06:03 EDT 2008
I don't have expertise in the AMDF but have spent quite a bit of time coding
time-domain stretching for a couple of years. So here is what I've found for
what it is worth.
The original objective was beat-map stretching of pre-recorded drum files, with
fast playback efficiency. So I did a pre-analysis step, finding the broadband
beats and saving them to a beatmap file associated with each drum file.
The playback pointer would move 1 sample per sample through the source file, and
when the float playback pointer reached the next audio beat in the beatmap,
it would crossfade-jump to the new beat location and keep going. It sounded
pretty good over a fairly wide tempo-stretch range.
There was some fiddling with exactly how to 'jump backwards' if slowing the
playback rate, to minimize 'echo' artifacts when repeating sections of audio.
A later refinement for melodic tracks was to add multiband beatmap detection--
Analyze the file for broadband, bass, mid, and treble beats, then merge-prune
all the beats into a single saved record. IOW, if a treble beat was detected 'in
the vicinity' of a broadband beat, the treble beat would get pruned. So the
result was the broadband beatmap supplemented by 'unique' bass, mid, and treble
beats which were new detections missed by the broadband beat detection.
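The merge-prune step might look something like this sketch -- the 'vicinity'
threshold is a made-up number, not the one actually used:

```python
def merge_prune(broadband, band_beats, vicinity=2048):
    """Merge per-band beat lists (e.g. [bass, mid, treble]) into the
    broadband list, dropping any band beat that falls within `vicinity`
    samples of a beat already in the merged list."""
    merged = sorted(broadband)
    for beats in band_beats:
        for b in beats:
            if all(abs(b - m) > vicinity for m in merged):
                merged.append(b)   # a 'unique' detection -- keep it
    return sorted(merged)
```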
Also added a 'silence' beat type. A 'silence' beat would be inserted in sections
of audio too quiet to worry about. This helped avoid excessive 'echo' artifacts
when stretching slower and a drum hit or music note decays into brief silence.
The cross-fade repeat would use audio bounded ahead of the 'silence' beat,
keeping repeated echoes from getting splattered into silent sections, where
they are very exposed and noticeable.
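Detecting where to insert those 'silence' beats could be as simple as an RMS
scan -- the frame size, threshold, and BeatType code below are illustrative
guesses:

```python
import numpy as np

SILENCE = 5  # hypothetical BeatType code for a 'silence' beat

def find_silence_beats(audio, frame=1024, thresh=1e-3):
    """Scan the file in frames and mark the start of each quiet run
    with a 'silence' beat, stored as a (BeatType, sample) pair."""
    beats = []
    in_silence = False
    for start in range(0, len(audio) - frame, frame):
        rms = np.sqrt(np.mean(audio[start:start + frame] ** 2))
        quiet = rms < thresh
        if quiet and not in_silence:
            beats.append((SILENCE, start))
        in_silence = quiet
    return beats
```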
The array of beats would hold the BeatType and the sample location of each
beat-- 5 BeatTypes at that level of development.
Those beats were pruned so the density was at most about 1 beat per 40 ms. Too
many beats cause too many jump-crossfades, which makes for too phasey a sound.
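The density cap could be a simple greedy pass over the sorted beat positions --
a sketch, assuming 44.1 kHz and keeping the earlier of two beats that crowd
each other (the real code may weigh beat types differently):

```python
def prune_density(beats, sr=44100, min_gap_ms=40):
    """Keep a beat only if it falls at least `min_gap_ms` after the
    last kept beat, capping density at roughly 1 beat per 40 ms."""
    min_gap = int(sr * min_gap_ms / 1000)
    kept = []
    for pos in sorted(beats):
        if not kept or pos - kept[-1] >= min_gap:
            kept.append(pos)
    return kept
```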
To optimize melodic stretching, the next step was to add two new 'short loop'
beat types, numbered 6 and 7 in the array structure. The short loop Begin and
End beat types would define a good single waveform in the audio. They were
'shoehorned' into the beatmap file structure, so everything would be in the same
file.
Building these big beatmaps including the Good Short Loops is very
time-intensive. If I ever get done fiddling with the code, I will optimize
it, but even with optimization it probably won't become realtime. But the idea
is to quickly real-time stretch pre-recorded material, so the intensive beatmap
detection doesn't really have to run fast, since the user never has to tolerate
the wait. He gets the beatmap files along with the canned audio files.
Sampler programs have traditionally used zero-crossings to look for good short
loops, and there are probably good reasons. I spent a while fiddling with short
loops on melodic material in Audition, and you can make a good loop from
anywhere in the audio (in stable locations between transients). It doesn't have
to be a zero-cross or a peak, as long as the short-loop length is correct.
Since I was already generating a 5-type beatmap, this was an asset in deciding
where and how to look for short loops with AMDF.
You can pretty well 'bank on' the likelihood that pitch is too unstable to be
rewarding to look for short loops within about 10 or 20 ms after a note or chord
'starts up'-- A 'note-on' or 'strum' or whatever is happening. And it is a waste
of time looking for short loops in silent sections.
So I wrote the short-loop finding to be guided by the beatmap structure already
extracted. It doesn't look for short-loops in silent sections. It will loop thru
the beatmap. On each non-silent audio beat, it looks for good short loops from
about 20 ms AFTER the audio beat, up to about 20 ms BEFORE the next audio beat.
So it only looks for short loops in areas that have good odds of containing
good short loops.
Say the distance between two beats is 160 ms; ignoring the first and last
20 ms, you have 140 ms to look through. It scans and finds all peaks in that
140 ms range. If the max autocorrelation lag is set to 12 ms and the min lag to
4 ms (or whatever is appropriate to the track)-- It builds a temp array
of all audio peaks in the range, then does autocorrelation of all peaks against
all other peaks in the range, as long as the time difference between two peaks
falls within 4 to 12 ms.
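A sketch of that peak-pair search -- the peak threshold, window length, and
normalized-correlation scoring are my assumptions, not the original code:

```python
import numpy as np

def find_peaks(x, thresh=0.05):
    """Indices of local maxima above an (illustrative) amplitude threshold."""
    return [i for i in range(1, len(x) - 1)
            if x[i] > thresh and x[i] >= x[i - 1] and x[i] > x[i + 1]]

def score_peak_pairs(audio, start, end, sr=44100,
                     min_ms=4.0, max_ms=12.0, win=256):
    """Correlate every pair of peaks in audio[start:end] whose spacing
    falls in the candidate-period range; return (score, p1, p2) triples
    with positions given in whole-file samples."""
    lo = int(sr * min_ms / 1000)
    hi = int(sr * max_ms / 1000)
    seg = audio[start:end]
    peaks = find_peaks(seg)
    pairs = []
    for i, p1 in enumerate(peaks):
        for p2 in peaks[i + 1:]:
            lag = p2 - p1
            if lag < lo:
                continue
            if lag > hi:        # peaks are sorted, so no later pair fits
                break
            if p2 + win > len(seg):
                continue
            # Normalized correlation of a window at each peak:
            a = seg[p1:p1 + win]
            b = seg[p2:p2 + win]
            denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
            if denom > 0:
                pairs.append((np.dot(a, b) / denom, start + p1, start + p2))
    return pairs
```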
I reject low autocorrelation scores, then filter through to find the highest
remaining scores, resulting in pretty good short-loop begin-end pairs which do
not overlap each other in time.
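The filtering could be a greedy pass over the scored pairs -- the confidence
threshold here is a made-up number:

```python
def pick_loops(pairs, min_score=0.95):
    """Greedily keep the highest-scoring (score, begin, end) pairs that
    clear a confidence threshold and don't overlap in time."""
    chosen = []
    for score, p1, p2 in sorted(pairs, reverse=True):
        if score < min_score:
            break               # pairs are sorted, rest score lower
        if all(p2 < q1 or p1 > q2 for _, q1, q2 in chosen):
            chosen.append((score, p1, p2))
    return sorted(chosen, key=lambda t: t[1])
```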
For such a technique, cross-correlating only between found wave peaks seems to
work as well as cross-correlating at zero-crossings or some other feature, and
finding all the peaks in a region as autocorrelation anchor points seemed a bit
more convenient to do.
So the code is very picky about only accepting high-confidence short loops. For
instance, take a track with full-strum acoustic guitar-- The waveform repeat
period on a full chord is quite long. It has to be long enough to accommodate
multiple different-pitch notes finishing a composite waveform.
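A rough way to see why: a sum of tones only repeats at the GCD of their
frequencies, so the composite period of a chord is much longer than any single
note's period. An integer-Hz sketch (illustrative, not from the original code):

```python
from math import gcd
from functools import reduce

def composite_period_ms(freqs_hz):
    """Repeat period (ms) of a sum of integer-Hz tones: the composite
    waveform only repeats at the GCD of the frequencies."""
    return 1000.0 / reduce(gcd, freqs_hz)
```

For an A chord fragment of 110, 165, and 220 Hz, the GCD is 55 Hz, giving a
composite period of about 18 ms, versus about 9 ms for the 110 Hz note alone.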
If you mis-identify a short loop on a chord track, it will sound very
un-musical, not like a chord at all. So it is best to miss all but the most
high-confidence short loops, to avoid awful inharmonic transient garbage. In
areas where short loops are not well identified, it defaults to the higher
layer of beatmap stretching, which has a different set of artifacts. But the
beatmap stretch artifacts do not sound as bad as inharmonic hash from a bad
short loop on top of a full chord.
The end of a possibly tedious message...