[music-dsp] Wondering if this has been tried yet . . .

Sampo Syreeni decoy at iki.fi
Mon Jun 10 19:43:05 EDT 2002


On Mon, 10 Jun 2002, Keith Handy wrote:

>if you continuously analyze the full audio spectrum of a stereo signal,
>couldn't you break the signal down into all its frequency components and
>figure out where they're all coming from in the stereo field?

Not really. If you think about how harmonic decompositions are defined,
you'll see that any additive components can radically shift the phase of
any constituent component. Most physical sound sources will have a
continuous, anything-but-discrete spectrum. Under these conditions, you
cannot use a harmonic decomposition to achieve much spatial localisation,
no matter what.

In highly specialized circumstances (e.g. purely periodic, infinitely
long-playing sound sources plus irrational proportions among the
periodicities) you're of course able to use this reasoning to reconstruct
the original sound sources, with directional information. With real,
physically originated signals, I think not. Here, crude approximations or
statistical separations relying on the stationarity of the sound sources
will have to do.
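
As a toy illustration of the statistical route (entirely my own sketch;
FastICA from scikit-learn is just one convenient tool, and the panning
matrix is made up), here is a stationary, instantaneously mixed two-source
case where it does work:

import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic, stationary sources and a made-up 2x2 panning matrix.
fs = 8000
t = np.arange(fs) / fs
s1 = np.sign(np.sin(2 * np.pi * 220 * t))   # square-ish source
s2 = np.sin(2 * np.pi * 333 * t)            # sine source
S = np.c_[s1, s2]

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])                  # hypothetical intensity panning
X = S @ A.T                                 # the "stereo recording" (L, R)

# FastICA recovers the sources up to scale and ordering -- but only
# because the mix is instantaneous and the sources stationary. Reverb,
# movement and transients all break these assumptions.
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)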

>And in doing so, couldn't you conceiveably identify and isolate the
>"center material" (or material panned to any given location) in a stereo
>mix, and filter out anything that is known to be panned elsewhere (in
>the same way that noise can be filtered out using a noise profile)?

Only if you can rely on the assumption that centrally panned material has
no lateral components (like the ones arising from stereo reverb), and that
non-centrally panned material has no central components (like the central
part of a reverb tail from non-central sources). That is, however, a very
rare situation. It *does* work as a simplifying assumption, but it doesn't
generally lead to clean results, except in the most sterile of studio
recordings.
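
To make the simplifying assumption concrete, a minimal sketch (my own, in
Python with scipy; the window length and similarity threshold are
arbitrary choices): per STFT bin, keep energy only where the two channels
agree in level and phase, and call that the centre.

import numpy as np
from scipy.signal import stft, istft

def extract_centre(left, right, fs, thresh=0.8, nper=2048):
    f, t, L = stft(left, fs, nperseg=nper)
    _, _, R = stft(right, fs, nperseg=nper)
    # Normalised similarity per bin: 1 when L == R, near 0 when the
    # channels are uncorrelated, -1 when they are in antiphase.
    sim = 2 * np.real(L * np.conj(R)) / (np.abs(L) ** 2 + np.abs(R) ** 2 + 1e-12)
    mask = sim > thresh                     # crude "this bin is centre material" decision
    _, centre = istft(0.5 * (L + R) * mask, fs, nperseg=nper)
    return centre

Reverb tails and opposite-panned material sharing a bin will punch holes
in the mask, which is exactly the dirtiness described above.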

>(Or maybe one would use resynthesis instead of filtering, or even some
>"intelligent" combination of both.)

Resynthesis, too, most commonly refers to algorithms derived from linear
filtering and/or summation. There's no huge difference among the ways to
do analysis-resynthesis, because high-order additive and subtractive
synthesis are both complete systems, able to mimic whatever sound you
might wish.
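
As a quick illustration of that completeness (my own sketch again,
parameters arbitrary): an STFT with a suitable window and hop reconstructs
its input essentially exactly, so the decomposition by itself buys you
nothing; anything interesting has to happen in how the coefficients are
modified in between.

import numpy as np
from scipy.signal import stft, istft

fs = 44100
x = np.random.randn(fs)                    # any signal will do
f, t, X = stft(x, fs, nperseg=1024)        # analysis
# ... any separation would have to happen here, on X ...
_, y = istft(X, fs, nperseg=1024)          # resynthesis
print(np.max(np.abs(x - y[:len(x)])))      # reconstruction to rounding error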

>I'm thinking a kind of "inverted Karaoke" effect where instead of just
>dumbly cancelling out one location, you actually could isolate a
>position in the field -- and if this were possible, it follows that you
>could break a stereo field up into *several* independent tracks, though
>I realize there are probably a lot of good reasons why this wouldn't
>work spectacularly well.

I think you should look up the theory of directional cancellation in the
context of microwave and/or cellular transceivers; "phased array" is an
excellent pair of search terms for this kind of thing. Basically, if you
do not know anything about the original signal, even tens of (ingeniously
spaced) receivers will do surprisingly little good. In order to get proper
separation, you will need an accurate source model. In microwave apps,
that comes from the fact that the bandwidth-to-carrier ratio is tiny. In
musical ones, you will have to rely on long-term correlations between the
stereo pair. Hence, statistical separation and a whole lot of processing
overhead, even when the sources are highly stationary (which they rarely
are).
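
A cartoon of the phased-array point (geometry and signals invented for the
example): with only two receivers, a delay-and-subtract beamformer places
exactly one null, and everything else leaks through.

import numpy as np

fs = 16000
t = np.arange(fs) / fs
src_a = np.sin(2 * np.pi * 440 * t)        # the source we want to cancel
src_b = 0.3 * np.random.randn(fs)          # everything else

d = 5                                      # inter-sensor delay of src_a, in samples
left  = src_a + src_b
right = np.roll(src_a, d) + np.roll(src_b, 2)   # src_b arrives with a different delay

# Steer a null at src_a: re-align its delay, then subtract.
nulled = right - np.roll(left, d)
# src_a is gone; src_b survives, but comb-filtered rather than isolated.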

>Maybe you would have to analyze phase relationships between the channel,
>in case there's energy in the same band coming from two instruments
>panned opposite each other.  Or maybe it would be a matter of analyzing
>(L-R) and using that profile to filter (L+R).

Basically, working on L-R / L+R is strictly equivalent to working on the
FFTs of the two channels themselves; it's the same data, only rotated to a
specific basis. It surely is possible to save on some processing overhead
if you know what is coming in, but in the general case, the two approaches
are equivalent.
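
To spell out the rotation (my own note; the 1/sqrt(2) scaling just makes
the matrix orthonormal and hence its own inverse):

import numpy as np

# M = (L+R)/sqrt(2), S = (L-R)/sqrt(2): a 45-degree rotation of the L/R plane.
lr_to_ms = np.array([[1,  1],
                     [1, -1]]) / np.sqrt(2)

lr = np.random.randn(2, 1000)              # a fake stereo signal, channels as rows
ms = lr_to_ms @ lr                         # into mid/side
back = lr_to_ms.T @ ms                     # and back; nothing gained, nothing lost
assert np.allclose(back, lr)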

However, with typical studio-composed sound material, you would expect the
central components to dominate (thanks to intensity panning vs. delay
mixing). Source separation of this kind is much easier to do. Still, you
shouldn't confuse this fact with the trouble we get in the general case.

>I'm thinking this is one of those things that would be heavily flawed,
>in the same way that time stretching is flawed -- but I'd be interested
>to hear what would come out regardless.

Aye. I seem to remember that the Circle Surround encoding crowd claims to
do precisely this. I'm not quite convinced, and I've never heard an actual
demonstration, but it might just be doable.

>I'm also sure I'm not the first to think of it, as I never am, but I
>figure some of you on this list would know more about the logistics of
>practical implementation.

Few of us ever come up with anything original. Of course. ;) But the basic
idea is sound, no matter its limitations.

>It's beyond my skill level to actually attempt this, but it's been a fun
>thing to think about.

Mine, too. Nevertheless, I can make a bunch of down-to-earth predictions.
I'd say any system like this will have trouble with:

1) Sound sources with heavy stereophonic reverb tails
2) Sound sources which aren't still, but move around
3) Transients, given that these smear the spectrum quite a lot even with
   physical instruments

My pessimism towards frequency-domain source separation schemes naturally
has to do primarily with the third point, given that I'm a sworn fan of
anything with a beat in it. ;)

Sampo Syreeni, aka decoy - mailto:decoy at iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2


dupswapdrop -- the music-dsp mailing list and website: subscription info,
FAQ, source code archive, list archive, book reviews, dsp links
http://shoko.calarts.edu/musicdsp/



