[music-dsp] AMDF shortcuts for SOLA time stretching?

James Chandler Jr jchandjr at bellsouth.net
Tue Apr 22 03:06:03 EDT 2008


I don't have expertise in the AMDF but have spent quite a bit of time coding 
time-domain stretching for a couple of years. So here is what I've found for 
what it is worth.

The original objective was beat-map stretching of pre-recorded drum files, with
efficient fast playback. So I did a pre-analysis step, finding the broadband
beats and saving them to a beatmap file associated with each drum file.

The playback pointer would move through the source file at 1 source sample per
output sample, and when the float playback pointer reached the next audio beat
in the beatmap, it would crossfade-jump to the new beat location and keep going.
It sounded pretty good over a fairly wide tempo-stretch range.
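
A rough Python sketch of the crossfade-jump idea, for the speed-up case only.
This is one reading of the description above, not the actual code: the name
stretch_faster, the linear crossfade, and the rule "play each inter-beat segment
for its length divided by the speed" are all assumptions for illustration.

```python
def stretch_faster(source, beats, speed, fade):
    """Speed-up sketch: play the start of each inter-beat segment, then
    crossfade-jump to the next beat.

    source: list of float samples
    beats:  sorted sample positions; first is the start, last is the end
    speed:  tempo factor, must be >= 1.0 in this sketch
    fade:   crossfade length in samples (linear fade, an assumption)
    """
    if speed < 1.0:
        raise ValueError("this sketch only handles speed >= 1.0")
    out = []
    prev_tail = None  # source index where the previous segment stopped playing
    for begin, end in zip(beats, beats[1:]):
        keep = max(1, int((end - begin) / speed))  # samples played from this segment
        for k in range(keep):
            sample = source[begin + k]
            # During the first `fade` samples, mix in the continuation of the
            # previous segment, fading it out while the new beat fades in.
            if prev_tail is not None and k < fade and prev_tail + k < len(source):
                w = (k + 1) / (fade + 1)  # fade-in weight for the new segment
                sample = w * sample + (1.0 - w) * source[prev_tail + k]
            out.append(sample)
        prev_tail = begin + keep
    return out
```

Slowing down needs the 'jump backwards' handling, which this sketch does not
attempt.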

There was some fiddling with exactly how to 'jump backwards' if slowing the 
playback rate, to minimize 'echo' artifacts when repeating sections of audio.

A later refinement for melodic tracks was to add multiband beatmap detection:
analyze the file for broadband, bass, mid, and treble beats, then merge-prune
all the beats into a single saved record. IOW, if a treble beat was detected 'in
the vicinity' of a broadband beat, the treble beat would get pruned. So the
result was the broadband beatmap supplemented by 'unique' bass, mid, and treble
beats which were new detections missed by the broadband beat detection.
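
The merge-prune step could look something like this in Python. It is only a
sketch: the function name merge_prune, the band labels, and the 'vicinity'
parameter in samples are made up for illustration, and it only prunes band beats
against broadband beats (whether the bands also pruned each other is not stated
above).

```python
def merge_prune(broadband, band_beats, vicinity):
    """Merge broadband beats with per-band beats, dropping band beats that
    fall within `vicinity` samples of any broadband beat.

    broadband:  list of sample positions
    band_beats: list of (sample_position, band_name) tuples
    Returns a position-sorted list of (sample_position, label) tuples.
    """
    merged = [(pos, "broadband") for pos in broadband]
    for pos, band in band_beats:
        near_broadband = any(abs(pos - b) <= vicinity for b in broadband)
        if not near_broadband:
            merged.append((pos, band))  # a 'unique' beat the broadband pass missed
    merged.sort()
    return merged
```

For example, with broadband beats at 1000 and 5000 and a vicinity of 100
samples, a treble beat at 1010 gets pruned while a bass beat at 3000 is kept.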

Also added a 'silence' beat type. A 'silence' beat would be inserted in sections
where the audio amplitude is too quiet to worry about. This helped avoid
excessive 'echo' artifacts when stretching slower and a drum or music note ends
in a brief silence. It would do the cross-fade repeat on audio bounded ahead of
the 'silence' beat, so repeated echoes don't get splattered into silent sections
where they are very exposed and noticeable.

Each entry in the array of beats would have the BeatType and the sample location
of the beat. There were 5 BeatTypes at that level of development.

Those kinds of beats were pruned so the density of beats was a max of about
1 beat per 40 ms. Too many beats cause too many jump-crossfades, which makes
too much of a phasey sound.
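
One simple way to enforce a limit like that, sketched in Python. The greedy
keep-the-earlier-beat rule and the name prune_density are assumptions; the
actual pruning may well have preferred stronger beats over earlier ones.

```python
def prune_density(beat_positions, sample_rate, min_gap_ms=40.0):
    """Drop beats so that kept beats are at least `min_gap_ms` apart.

    Greedy: walk the beats in time order and keep a beat only if it is far
    enough from the last kept beat.
    """
    min_gap = int(sample_rate * min_gap_ms / 1000.0)  # gap in samples
    kept = []
    for pos in sorted(beat_positions):
        if not kept or pos - kept[-1] >= min_gap:
            kept.append(pos)
    return kept
```

At 44.1 kHz, 40 ms works out to 1764 samples, so beats at 0, 1000, 1764, 2000
and 4000 would be thinned to 0, 1764 and 4000.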

To optimize melodic stretching, the next step was to add two new 'short loop' 
beat types, numbered 6 and 7 in the array structure. The short loop Begin and 
End beat types would define a good single waveform in the audio. They were 
'shoehorned' into the beatmap file structure, so everything would be in the same 
array.
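
For illustration, a beatmap record along these lines could be sketched in Python
as below. Only the two fields (BeatType and sample location) and the loop types
being numbered 6 and 7 come from the description above; the ordering of types 1
thru 5, the class names, and the loop_pairs helper are guesses.

```python
from dataclasses import dataclass
from enum import IntEnum


class BeatType(IntEnum):
    # The ordering of values 1-5 is a guess; only 6 and 7 are stated.
    BROADBAND = 1
    BASS = 2
    MID = 3
    TREBLE = 4
    SILENCE = 5
    LOOP_BEGIN = 6
    LOOP_END = 7


@dataclass(frozen=True, order=True)
class Beat:
    position: int        # sample location in the source file
    beat_type: BeatType


def loop_pairs(beatmap):
    """Pull (begin, end) short-loop pairs out of a mixed beatmap array."""
    pairs = []
    begin = None
    for beat in sorted(beatmap):
        if beat.beat_type == BeatType.LOOP_BEGIN:
            begin = beat.position
        elif beat.beat_type == BeatType.LOOP_END and begin is not None:
            pairs.append((begin, beat.position))
            begin = None
    return pairs
```

Keeping the loop markers in the same array means the playback code walks one
sorted list and can tell from the type what to do at each position.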

Building these big beatmaps including the Good Short Loops is very 
time-intensive. If I ever get done with fiddling with the code, will optimize 
it, but even with optimization it probably won't become realtime. But the idea 
is to quickly real-time stretch pre-recorded material, so the intensive beatmap 
detection doesn't really have to run fast, since the user never has to tolerate 
the wait. He gets the beatmap files along with the canned audio files.

Sampler programs have traditionally used zero-crossings to look for good short 
loops, and there are probably good reasons. I spent awhile fiddling with short 
loops on melodic material in Audition, and you can make a good loop from 
anywhere in the audio (in stable locations between transients). It doesn't have 
to be a zero-cross or a peak, as long as the short-loop length is correct.

Since I was already generating a 5-type beatmap, this was an asset in deciding 
where and how to look for short loops with AMDF.

You can pretty well 'bank on' the likelihood that pitch is too unstable to be 
rewarding to look for short loops within about 10 or 20 ms after a note or chord 
'starts up'-- A 'note-on' or 'strum' or whatever is happening. And it is a waste 
of time looking for short loops in silent sections.

So I wrote the short-loop finding to be guided by the beatmap structure already 
extracted. It doesn't look for short-loops in silent sections. It will loop thru 
the beatmap. On each non-silent audio beat, it looks for good short loops from 
about 20 ms AFTER the audio beat, up to about 20 ms BEFORE the next audio beat. 
So it only looks for short loops in areas that have good odds of having good 
short loops.
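
A minimal Python sketch of picking those search regions. The name
loop_search_windows and the (position, is_silence) tuple layout are made up for
the example, and it treats the next beat of any type as the end boundary, which
is an assumption.

```python
def loop_search_windows(beats, sample_rate, guard_ms=20.0):
    """Return (start, end) sample ranges worth searching for short loops.

    beats: position-sorted list of (sample_position, is_silence) tuples.
    Skips silence beats, and skips `guard_ms` after each beat and before
    the next one, where pitch is likely to be unstable.
    """
    guard = int(sample_rate * guard_ms / 1000.0)
    windows = []
    for (pos, is_silence), (next_pos, _) in zip(beats, beats[1:]):
        if is_silence:
            continue  # nothing worth looping in silence
        start, end = pos + guard, next_pos - guard
        if end > start:  # beats closer than 2 * guard leave no usable region
            windows.append((start, end))
    return windows
```

The last beat in the list has no following beat, so this sketch does not search
after it.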

Say the distance between two beats is 160 ms. Ignoring the first and last 20 ms
leaves 120 ms to look thru. It scans and finds all peaks in the 120 ms range.
If the max autocorr lag is set for 12 ms and the min for 4 ms (or whatever is
appropriate to the track), it builds up a temp array of all audio peaks in the
range, then does autocorr of all peaks against all other peaks in the range, as
long as the time difference between two peaks falls within 4 to 12 ms.
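
Sketched in Python, the peak-to-peak comparison might look like the following.
The function names are hypothetical, 'peak' is taken to mean a positive local
maximum, and the score is a normalized correlation over one loop length; none of
those details are confirmed above, so treat this as one plausible version.

```python
import math


def find_peaks(samples, start, end):
    """Indices of local maxima of samples within [start, end)."""
    lo, hi = max(start, 1), min(end, len(samples) - 1)
    return [i for i in range(lo, hi)
            if samples[i - 1] < samples[i] >= samples[i + 1]]


def candidate_pairs(peaks, min_lag, max_lag):
    """All (a, b) peak pairs with min_lag <= b - a <= max_lag (in samples)."""
    pairs = []
    for idx, a in enumerate(peaks):
        for b in peaks[idx + 1:]:
            lag = b - a
            if lag > max_lag:
                break  # peaks are sorted, so later ones are even further away
            if lag >= min_lag:
                pairs.append((a, b))
    return pairs


def loop_score(samples, a, b):
    """Normalized correlation of samples[a:b] against the next b - a samples.

    Close to 1.0 when the audio repeats with period b - a. Returns 0.0 if
    there is not enough audio after b to compare.
    """
    lag = b - a
    if lag <= 0 or b + lag > len(samples):
        return 0.0
    x, y = samples[a:b], samples[b:b + lag]
    num = sum(p * q for p, q in zip(x, y))
    den = math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))
    return num / den if den > 0.0 else 0.0
```

At 44.1 kHz the 4 to 12 ms limits come to roughly 176 to 529 samples for
min_lag and max_lag.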

I reject low autocorr scores, then filter thru and find the highest remaining 
autocorr scores to result in pretty good short-loop begin-end pairs which do not 
overlap each other in time.
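
The reject-then-pick step can be sketched as a greedy selection in Python.
Highest-score-first is just one reading of the filtering described above, and
select_loops and min_score are hypothetical names.

```python
def select_loops(scored, min_score):
    """Pick non-overlapping loops, best scores first.

    scored: list of (score, begin, end) tuples, positions in samples.
    Returns a position-sorted list of (begin, end) pairs.
    """
    chosen = []
    for score, begin, end in sorted(scored, reverse=True):
        if score < min_score:
            break  # everything after this scores lower still
        # Keep the loop only if it does not overlap any loop already chosen.
        if all(end <= b or begin >= e for b, e in chosen):
            chosen.append((begin, end))
    return sorted(chosen)
```

Loops that merely touch (one ends where the next begins) are not counted as
overlapping here.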

For such a technique, only cross-correlating between found wave peaks seems to 
work as well as cross-correlating zero-crossings or some other feature. Finding 
all the peaks in a region, for autocorr anchor points, seemed a bit more 
convenient to do.

So the code is very picky about only finding good-confidence short loops. For
instance, take a track with full-strum acoustic guitar. The waveform repeat
period on a full chord is quite long. It has to be long enough to accommodate
multiple different-pitch notes to finish a composite waveform.

If you mis-identify a short loop on a chord track, it will sound very
un-musical, not like a chord at all. So it is best to miss all but the most
high-confidence short loops to avoid awful inharmonic transient garbage. In
areas where short loops are not well identified, it falls back to the higher
layer of beatmap stretching, which has a different set of artifacts. But the
beatmap stretch artifacts do not sound as bad as inharmonic hash from a bad
short loop on top of a full chord.

The end of a possibly tedious message...

jcjr 


