[music-dsp] Mixing and dynamic range

Sampo Syreeni decoy at iki.fi
Sun Oct 5 10:05:00 EDT 2003

On 2003-10-04, Steve Dekorte uttered:

>Yes, I tried that but the resulting sound is reduced in volume.

Indeed. This is a fundamental fact and there's no way around it. If you
have two arbitrary full-scale signals, you cannot in general represent
their sum without adding an extra bit. You have to scale both sounds down
by half, or the sum can clip. Moreover, when you add n channels, you'll
have to divide by n or add lb(n) = log2(n) bits, so the more you mix, the
more difficult it all becomes.
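
To make the arithmetic above concrete, here is a minimal Python sketch,
assuming signed 16-bit samples. The function names are made up for
illustration and are not from any particular library.

```python
FULL_SCALE_MAX = 32767    # largest positive signed 16-bit sample
FULL_SCALE_MIN = -32768   # most negative signed 16-bit sample

def extra_bits_needed(n):
    """Whole extra bits needed to hold the sum of n full-scale signals.

    Equals ceil(log2(n)), computed with integers to avoid float rounding.
    """
    return (n - 1).bit_length()

def mix_wide(samples):
    """Sum into a wider accumulator (Python ints do not overflow)."""
    return sum(samples)

def mix_scaled(samples):
    """Stay at the original width by dividing the sum by n."""
    return sum(samples) // len(samples)

def clip16(x):
    """What happens if the wide sum is forced back into 16 bits."""
    return max(FULL_SCALE_MIN, min(FULL_SCALE_MAX, x))

two_loud = [FULL_SCALE_MAX, FULL_SCALE_MAX]
wide = mix_wide(two_loud)        # needs 17 bits
clipped = clip16(wide)           # information lost
scaled = mix_scaled(two_loud)    # fits, but each input is 6 dB quieter
```

Dividing by n is the safe choice only in the worst case; with real
material it usually throws away more level than necessary.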

In practice we either work in a higher resolution or divide by a number
between 1 and n. IIRC, n statistically independent signals of similar
level tend to sum to no more than sqrt(n) times the RMS of a single
signal, under fairly general conditions, so that's
sometimes used. But especially with heavily processed and synthesized
sounds those conditions do not always hold, and clipping can occur.
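
A rough way to check the sqrt(n) figure numerically, assuming uniform
white noise as a stand-in for "statistically independent signals" (real
program material may behave differently, as noted above). The helper
names are hypothetical.

```python
import math
import random

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def mix(signals):
    """Sample-by-sample sum of equally long signals, no scaling."""
    return [sum(frame) for frame in zip(*signals)]

rng = random.Random(0)   # fixed seed so the run is repeatable
n, length = 4, 20000

# n independent noise signals versus n copies of the same signal.
independent = [[rng.uniform(-1.0, 1.0) for _ in range(length)]
               for _ in range(n)]
identical = [independent[0]] * n

# RMS of the mix relative to the RMS of one input.
ratio_independent = rms(mix(independent)) / rms(independent[0])
ratio_identical = rms(mix(identical)) / rms(identical[0])
```

With these assumptions ratio_independent should land near sqrt(4) = 2,
while ratio_identical is 4, the fully correlated worst case. Note that
this says nothing about peaks: a mix scaled by 1/sqrt(n) can still clip
on individual samples.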

IMHO, the real problem with mixing has little to do with the technology,
and a lot to do with people's attitudes. I mean, you can't reliably and
accurately mix two n-bit signals into an n-bit result, so we have to ask,
why would people want to fight this fact? I see two answers.

First, ever since the earliest audio transports we've had to deal with
limited dynamic range. There has always been a tradeoff between dynamic
headroom and noise. That applies to CD's as well -- 16 bits isn't enough
to cover the dynamic range of human hearing. So people have grown to treat
dynamic range as a scarcity. When we mix sounds, we always try to start
with full-range signals, and always aim at a full-range one at identical
bit width. But nowadays that's just stupid. 24 bits *does* cover what we
can hear, and it's cheap enough to go to higher intermediate and final

Really the only factual basis for the full-scale fetish is the fact that
our mic/A/D chains remain noisy. But that should always be compensated for
by varying microphone gain and placing the extra accuracy further down the
sample. Not by making all signals full-scale. People need to understand
that, in today's high resolution DSP environments, it's perfectly
permissible to use an input chain accurate to 14-18 bits to produce 24-bit
signals with lots of dynamic headroom. Those can then be mixed at will,
without attenuation or fear of overflow.
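
As a back-of-the-envelope sketch of that headroom, assuming signed
integer samples and an input whose peaks occupy the low 16 bits of a
24-bit word (the 16 is just one point in the 14-18 bit range mentioned
above, and the function names are invented for illustration):

```python
import math

def headroom_bits(container_bits, signal_bits):
    """Unused top bits when a signal sits in the low end of the word."""
    return container_bits - signal_bits

def max_channels_without_overflow(container_bits, signal_bits):
    """Worst-case number of such signals that can be summed safely."""
    return 2 ** headroom_bits(container_bits, signal_bits)

def headroom_db(container_bits, signal_bits):
    """The same headroom expressed in decibels (about 6.02 dB per bit)."""
    return 20.0 * math.log10(2 ** headroom_bits(container_bits, signal_bits))

def fits(value, bits):
    """True if value is representable as a signed integer of that width."""
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1

channels = max_channels_without_overflow(24, 16)   # 256
worst_case_sum = channels * -32768                 # all inputs at negative peak
```

Under these assumptions 256 worst-case inputs still fit in 24 bits, and
one more does not.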

Secondly, relativism isn't too good when it comes to dynamics. Most people
have grown to expect that there's always a volume knob, somewhere, and
that it's alright to touch it. This means that almost all music is
produced in relative amplitude, not absolute. This is bad because it
tempts people to treat volume and dynamic range as arbitrary quantities
which can be varied at will. For instance, when we archive an acoustic
guitar track, we rarely record the absolute amplitude of the take in any
way. Consequently, when we mix it, we often unintentionally end up with an
unrealistic guitar sound -- acoustic instruments respond very differently
depending on how loud they're played. Loudness maximization and the
resulting zero dynamic range are another problem enabled by relative
amplitude.

So what does this have to do with mixing? A lot, because when we get used
to relative amplitude, this contributes to the full-scale fetish. I mean,
if amplitudes are going to be tampered with anyway, why *not* make the
signals full-scale, and make do with fewer bits? Why not even compress
them a bit, since that lessens the audible noise when listening to the
full-scale signal...

I advocate a mixing style where we aim at a) enough S/N on the input side
so that amplification of some 6-10dB won't bring in audible noise, b)
enough headroom to accommodate enough of the extra bits produced by a
typical processing chain, c) signal chains which are calibrated in
absolute amplitude as far as possible and d) uniform transport and storage
bit widths wide enough to cover the dynamic range of human hearing,
accounting for a) and b). I believe all audio production should first aim
at an absolute, ideal level of reproduction, and only then consider things
like compression for limited range transports, mastering, maximization,
typical user responses, and the like.

When we do it this way, we'll always have enough headroom to mix arbitrary
numbers of signals -- if we don't, we'll already have forced the resulting
SPL's beyond the pain threshold. We'll have significantly fewer noise
problems as well, because we're avoiding them early on. And we also tend
to treat amplification and dynamic processing as first rate effects, which
is what they really are -- an amplified guitar isn't the same thing as a
guitar played louder, and premature compression usually amounts to a
wholesale slaughter of dynamics. To hear the difference, go to a good
movie theatre (cinematic audio works in absolute amplitude), pick a high
profile movie with a composed soundtrack and an all-digital production
team, and enjoy. You'll hear sound that is far more open, less fatiguing
and better balanced than your run-of-the-mill pop album. The same goes for
the very best pop and classical albums as well, as you'll see from the
recommendations of the top mastering engineers and radio technicians, but
there relative amplitude is still a bit of a problem.

Sampo Syreeni, aka decoy - mailto:decoy at iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
