[music-dsp] Effective Waveform Visualisation

robert bristow-johnson rbj at audioimagination.com
Fri Feb 18 17:16:34 EST 2005

on 02/18/2005 14:55, James Chandler Jr at jchandjr at bellsouth.net wrote:
> I haven't kept up with the stereo editor market lately. Maybe most of em do
> that nowadays. But in the past, when zoomed way in, one might see a jagged
> "bar-graph histogram" or equally unrealistic "connect the dots with straight
> lines" when zoomed way in.

i can see (and have thunked a little bit about this in the past) why
interpolation is useful for display when zooming in.  this is one good case
for using windowed-sinc interpolation (Lagrange or Hermite can be sorta
thought of as special cases of this) because it is more important in the
interpolated display of audio that the interpolated function go precisely
through the sample points that define it.

designing an interpolator using Parks-McClellan (MATLAB's "remez") or
least-squares (MATLAB's "firls") may very well be better for the actual
interpolation of audio (for sample-rate conversion, resampling,
pitch-shifting, fractional sample delay) because the low-pass filters might
be better (smaller ripples or more reduced images) than the windowed-sinc
interpolator (even if a good window, like Kaiser, is used).  but then we
accept that, even for a nice integer ratio (say, 2 to 1) upsampler, when the
output sample time lands precisely on an input sample time, the output
sample will not be precisely the same as the input sample.

but for windowed sinc, when the output sample time lands precisely on an
input sample time, the output sample *will* be precisely the same as the
input sample.  that's because the windowed sinc() function will have value
of 1 for the sample that you landed right on top of, but the neighboring
samples (that normally contribute to the interpolation) will have sinc()
values of 0 attached to them.
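
a minimal sketch of that property, assuming NumPy and a Kaiser window (the names `windowed_sinc_kernel` and `interpolate`, and the `half_width` and `beta` values, are made up for illustration and not from any particular editor). evaluating the interpolator exactly at the input sample times should give back the input samples, up to floating-point rounding:

```python
import numpy as np

def windowed_sinc_kernel(t, half_width=8, beta=8.0):
    """Kaiser-windowed sinc evaluated at offsets t (in input-sample units)."""
    t = np.asarray(t, dtype=float)
    # Kaiser window written as a continuous function of t
    arg = np.clip(1.0 - (t / half_width) ** 2, 0.0, None)
    window = np.i0(beta * np.sqrt(arg)) / np.i0(beta)
    window = np.where(np.abs(t) < half_width, window, 0.0)
    # np.sinc(t) is sin(pi t)/(pi t): 1 at t = 0, (numerically) 0 at other integers
    return np.sinc(t) * window

def interpolate(x, t, half_width=8, beta=8.0):
    """Evaluate the interpolated signal at (possibly fractional) sample times t."""
    x = np.asarray(x, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    n = np.arange(len(x))
    # each output value is a kernel-weighted sum of the input samples
    weights = windowed_sinc_kernel(t[:, None] - n[None, :], half_width, beta)
    return weights @ x

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
on_grid = interpolate(x, np.arange(64))          # lands on the input sample times
between = interpolate(x, np.arange(63) + 0.5)    # halfway between input samples
print(np.max(np.abs(on_grid - x)))               # tiny (rounding error only)
```

the window shape only changes what happens *between* the samples; the zeros of the sinc() at the nonzero integers are what pin the curve to the sample points.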

the same is true for Lagrange and Hermite interpolation, and since they are
also linear operators on the input samples, you can construct a sinc-like
interpolation function (a sorta impulse response that you convolve the
zero-padded input with) for them and look at the low-pass spectrum and see
how good an interpolator it is (from the POV of stop-band attenuation, which
is what kills the images, and of pass-band ripple).  another thing you can
do with that interpolation function is divide it by the raw sinc() function
(you'll have to deal with the 0/0 singularities by assuming continuity) and
what would be left is the apparent "window".  that's why i say that Lagrange
and Hermite interpolation can be thought of as "windowed sinc"
interpolation.  it's just that the window wasn't decided beforehand and
applied to a sinc() function but, instead, the effective window is
determined after the fact.
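
as a rough illustration of that division, here is a sketch using 4-point (cubic) Lagrange interpolation as the example case. the closed-form kernel below is the standard piecewise cubic for that interpolator; the function name `lagrange4_kernel` and the grid are assumptions for the example. instead of taking limits at the 0/0 points, the grid is simply chosen so it never lands on an integer:

```python
import numpy as np

def lagrange4_kernel(t):
    """Impulse response of 4-point (cubic) Lagrange interpolation, t in sample units."""
    a = np.abs(np.asarray(t, dtype=float))
    inner = (1 - a * a) * (2 - a) / 2          # 0 <= |t| < 1
    outer = (1 - a) * (2 - a) * (3 - a) / 6    # 1 <= |t| < 2
    return np.where(a < 1, inner, np.where(a < 2, outer, 0.0))

# the kernel is 1 at t = 0 and 0 at the other integers, so it passes
# through the sample points just like a windowed sinc does
print(lagrange4_kernel(np.array([-2, -1, 0, 1, 2])))

# grid offset by half a step so it misses the integers (avoids 0/0)
t = (np.arange(400) + 0.5) / 100.0 - 2.0
effective_window = lagrange4_kernel(t) / np.sinc(t)
```

plotting `effective_window` against `t` shows the after-the-fact "window"; taking an FFT of a finely sampled `lagrange4_kernel` would show the corresponding low-pass response.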

so i think that some sorta windowed-sinc or polynomial interpolation is the
best for display.  i would still say that the least confusing display would
be the same as before (except applied to the upsampled audio used only for
waveform visualization).

1.  so you upsample with a windowed-sinc by a factor of 16 or 32 or
whatever.  you need only apply this interpolation to the audio that's
displayed in the window (that *could* be a good chunk so this might not
always be a good idea).

2.  then, given a zoom ratio, for all of the upsampled samples that map to a
single pixel in the display, determine the maximum value and the minimum
value in that small segment of audio.

3.  move your drawing pen to the x-coordinate of the pixel for that segment
of audio and draw a vertical line from the y-coordinate of the pixel
corresponding to the min value to the pixel corresponding to the max value.

that is a rule that would be the same whether you've zoomed in to the
closest view or zoomed out to look at the whole song.

4.  when the user selects a segment of the display where there is more than
1 pixel per (original) audio sample, your display groups all of the pixels
that map to a single sample together and shows all those pixels as
highlighted for each selected sample.  then when the user cuts and pastes
some real audio, the correct number of pixels (of width) are highlighted for
editing an integer number of samples.
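
the four steps above could be sketched roughly as follows. this is only an illustration under assumptions: NumPy, a Kaiser-windowed sinc, a boolean array standing in for the screen, and hypothetical names (`upsample`, `min_max_columns`, `draw_columns`, `selection_to_pixels`) that don't come from any real editor:

```python
import numpy as np

def upsample(x, factor, half_width=8, beta=8.0):
    """Step 1: windowed-sinc upsampling by an integer factor."""
    k = np.arange(-half_width * factor, half_width * factor + 1)
    h = np.sinc(k / factor) * np.kaiser(len(k), beta)
    z = np.zeros(len(x) * factor)
    z[::factor] = x                      # insert zeros between input samples
    full = np.convolve(z, h)
    start = half_width * factor          # compensate for the kernel's delay
    return full[start:start + len(z)]

def min_max_columns(y, n_pixels):
    """Step 2: min and max of the samples that map to each pixel column."""
    edges = np.linspace(0, len(y), n_pixels + 1).astype(int)
    spans = [(a, max(b, a + 1)) for a, b in zip(edges[:-1], edges[1:])]
    lo = np.array([y[a:b].min() for a, b in spans])
    hi = np.array([y[a:b].max() for a, b in spans])
    return lo, hi

def draw_columns(lo, hi, height, full_scale=1.0):
    """Step 3: one vertical line per column, from the min row to the max row."""
    def to_row(v):
        # row 0 is the top of the display, so +full_scale maps to row 0
        row = np.round((1 - v / full_scale) * (height - 1) / 2)
        return np.clip(row, 0, height - 1).astype(int)
    img = np.zeros((height, len(lo)), dtype=bool)
    for col, (top, bottom) in enumerate(zip(to_row(hi), to_row(lo))):
        img[top:bottom + 1, col] = True
    return img

def selection_to_pixels(first_sample, last_sample, pixels_per_sample, view_start=0):
    """Step 4: pixel span [x0, x1) covering whole original samples, inclusive."""
    x0 = (first_sample - view_start) * pixels_per_sample
    x1 = (last_sample + 1 - view_start) * pixels_per_sample
    return x0, x1

x = np.sin(2 * np.pi * 0.2 * np.arange(64))
y = upsample(x, 16)
lo, hi = min_max_columns(y, 256)         # 4 pixels per original sample
img = draw_columns(lo, hi, height=101)
```

the same `min_max_columns` and `draw_columns` calls work unchanged on the raw (non-upsampled) audio when zoomed out, which is the point of having one drawing rule for every zoom level.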

IMO, if your zoom ratio was limited to some selection of integer samples per
pixel (1/1, 2/1, 4/1, 8/1, 16/1, etc.) and a few views of sub-integer
samples per pixel (1/2, 1/4, 1/8, 1/16, i can't see why anyone would want to
zoom in closer than that), i would change the above rule to apply this
windowed-sinc interpolation *only* to those few views of sub-integer samples
per pixel.  i'll bet that's what Pro Tools does (not plugging Pro Tools,
just citing a reference).  if your zoom ratio can be anything, then, i am
not sure what the best rule would be.

here is a somewhat different topic:  when one *does* cut and paste, should
the editor do some automatic cross-fading so that there are no clicks or
pops in the splice?  (perhaps some existing editors do this, can anyone give
us a rundown if they know?)  it could even do an automatic cross-correlation
so that some parameter that adjusts for "constant voltage" vs. "constant
power" in the crossfade could be optimally adjusted for that splice.
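
one possible reading of that idea, offered only as a sketch and not as what any existing editor does: measure the normalized correlation r of the two segments over the splice overlap, then scale a linear crossfade so the gains satisfy a^2 + b^2 + 2*r*a*b = 1. that reduces to a + b = 1 (constant voltage) when r = 1 and to a^2 + b^2 = 1 (constant power) when r = 0. the function names, the lag-zero correlation, and the clipping of r to [0, 1] are all assumptions made for this example:

```python
import numpy as np

def crossfade_gains(r, n):
    """Fade-out and fade-in gains for an n-sample overlap, given correlation r."""
    r = float(np.clip(r, 0.0, 1.0))      # assumption: treat negative r as 0
    t = (np.arange(n) + 0.5) / n
    g_out, g_in = 1.0 - t, t             # start from a linear crossfade
    norm = np.sqrt(g_out**2 + g_in**2 + 2.0 * r * g_out * g_in)
    return g_out / norm, g_in / norm

def crossfade(outgoing, incoming):
    """Crossfade two equal-length overlapping segments; returns (audio, r)."""
    a = np.asarray(outgoing, dtype=float)
    b = np.asarray(incoming, dtype=float)
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    r = np.dot(a, b) / denom if denom > 0 else 0.0   # correlation at lag zero
    g_out, g_in = crossfade_gains(r, len(a))
    return g_out * a + g_in * b, float(np.clip(r, 0.0, 1.0))

s = np.sin(2 * np.pi * 5 * np.arange(100) / 100)
c = np.cos(2 * np.pi * 5 * np.arange(100) / 100)
same, r_same = crossfade(s, s)           # identical material: r = 1
mixed, r_mixed = crossfade(s, c)         # quadrature material: r near 0
```

splicing a signal onto itself leaves it unchanged (r = 1), while the sine/cosine pair gets an equal-power fade; real material would land somewhere in between.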

hmmmm.  maybe i should write a paper about this.  (it's an idea....)


r b-j                  rbj at audioimagination.com

"Imagination is more important than knowledge."
