[music-dsp] Low bitrate audio compression
t.hogers at home.nl
Wed Jan 1 17:20:01 EST 2003
This is probably a dumb idea but...
For voice encoding, would it be possible to detect the "base" frequency, the amplitude + frequency of the first 3 formants, and whether a noise component is present?
Assuming the formants do not change too suddenly, you could make do with fewer than 70 updates of the information per second and interpolate in between.
The frequency information sits in a limited range and can be quite coarse; maybe part of it could be coded as offsets.
Say one byte for the "base" frequency, then one byte each for the frequencies of the formants.
Add another 3 bytes for the volume of the formants and steal one bit for noise on/off.
That is 7 bytes per frame => @ 70 frames a second that is only 490 bytes/second, with room left to nibble off more.
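To make the 7-byte frame concrete, here is a minimal Python sketch of one way to pack and unpack it. The layout follows the description above, but the details are assumptions for illustration only: the noise bit is stolen from the top of the "base" frequency byte, the quantization is linear, and the frequency ranges (PITCH_RANGE, FORMANT_RANGES) are made-up values, not measured ones.

```python
FRAME_RATE = 70  # frames per second, as suggested above

# Hypothetical quantization ranges in Hz; nothing above specifies them.
PITCH_RANGE = (50.0, 400.0)
FORMANT_RANGES = [(200.0, 1000.0), (600.0, 3000.0), (1500.0, 4000.0)]


def quantize(value, lo, hi, levels):
    """Clamp value to [lo, hi] and map it to an integer code in [0, levels - 1]."""
    value = min(max(value, lo), hi)
    return round((value - lo) / (hi - lo) * (levels - 1))


def dequantize(code, lo, hi, levels):
    """Inverse of quantize, up to the quantization error."""
    return lo + code / (levels - 1) * (hi - lo)


def pack_frame(pitch_hz, formant_hz, formant_vol, noisy):
    # Byte 0: noise flag in the top bit (assumed), 7-bit pitch code below it.
    first = (0x80 if noisy else 0x00) | quantize(pitch_hz, *PITCH_RANGE, 128)
    # Bytes 1-3: formant frequencies, bytes 4-6: formant volumes (0.0 to 1.0).
    freqs = [quantize(f, lo, hi, 256)
             for f, (lo, hi) in zip(formant_hz, FORMANT_RANGES)]
    vols = [quantize(v, 0.0, 1.0, 256) for v in formant_vol]
    return bytes([first] + freqs + vols)


def unpack_frame(frame):
    noisy = bool(frame[0] & 0x80)
    pitch_hz = dequantize(frame[0] & 0x7F, *PITCH_RANGE, 128)
    formant_hz = [dequantize(c, lo, hi, 256)
                  for c, (lo, hi) in zip(frame[1:4], FORMANT_RANGES)]
    formant_vol = [dequantize(c, 0.0, 1.0, 256) for c in frame[4:7]]
    return pitch_hz, formant_hz, formant_vol, noisy


frame = pack_frame(120.0, [700.0, 1200.0, 2600.0], [1.0, 0.5, 0.25], noisy=False)
bytes_per_second = len(frame) * FRAME_RATE  # 7 * 70
```

Coding the formants as offsets from the previous frame, as mentioned above, would be one way to nibble off more; this sketch does not attempt that.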
For decoding, a square/saw + noise oscillator, 3 parallel BP filters and an LP in series would do.
An overall volume level could be derived from the formant volume data.
The result would probably sound quite "vocoded" (think ELO "Mr. Blue Sky" for reference) but understandable.
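A rough sketch of such a decoder in plain Python, under several assumptions that are not part of the idea as stated: an 8 kHz output rate, a sawtooth source with white noise added when the noise bit is set, simple two-pole resonators standing in for the BP filters (fixed 100 Hz bandwidth), a one-pole LP, and linear interpolation of the parameters between frames. The frame dictionaries and their keys are hypothetical.

```python
import math
import random

SAMPLE_RATE = 8000    # Hz, assumed
BANDWIDTH_HZ = 100.0  # assumed fixed formant bandwidth
LP_COEFF = 0.5        # assumed one-pole low-pass coefficient


class Resonator:
    """Two-pole resonator used here as a simple band-pass filter."""

    def __init__(self):
        self.y1 = 0.0
        self.y2 = 0.0

    def process(self, x, freq_hz):
        r = math.exp(-math.pi * BANDWIDTH_HZ / SAMPLE_RATE)  # pole radius < 1
        theta = 2.0 * math.pi * freq_hz / SAMPLE_RATE        # pole angle
        y = (1.0 - r) * x + 2.0 * r * math.cos(theta) * self.y1 - r * r * self.y2
        self.y2, self.y1 = self.y1, y
        return y


def lerp(a, b, t):
    return a + (b - a) * t


def synthesize(frames, frame_rate=70, seed=0):
    rng = random.Random(seed)
    # Integer division: 114 samples per frame, so playback runs slightly fast.
    samples_per_frame = SAMPLE_RATE // frame_rate
    filters = [Resonator() for _ in range(3)]
    phase, lp, out = 0.0, 0.0, []
    for i, cur in enumerate(frames):
        nxt = frames[min(i + 1, len(frames) - 1)]  # last frame holds its values
        for n in range(samples_per_frame):
            t = n / samples_per_frame
            phase = (phase + lerp(cur["pitch"], nxt["pitch"], t) / SAMPLE_RATE) % 1.0
            source = 2.0 * phase - 1.0               # sawtooth in [-1, 1)
            if cur["noisy"]:
                source += rng.uniform(-1.0, 1.0)     # add the noise component
            mix = 0.0
            for k, filt in enumerate(filters):       # 3 parallel band-pass filters
                freq = lerp(cur["formants"][k], nxt["formants"][k], t)
                vol = lerp(cur["volumes"][k], nxt["volumes"][k], t)
                mix += vol * filt.process(source, freq)
            lp += LP_COEFF * (mix - lp)              # low-pass in series
            out.append(lp)
    return out


frames = [
    {"pitch": 120.0, "formants": [700.0, 1200.0, 2600.0],
     "volumes": [1.0, 0.5, 0.25], "noisy": False},
    {"pitch": 130.0, "formants": [300.0, 2300.0, 3000.0],
     "volumes": [1.0, 0.4, 0.2], "noisy": True},
]
samples = synthesize(frames)
```

The output is a list of floats; scaling to 16-bit and writing a WAV file (for example with the standard `wave` module) is left out to keep the sketch short.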
----- Original Message -----
From: Ted David
To: music-dsp at aulos.calarts.edu
Sent: Wednesday, January 01, 2003 9:42 AM
Subject: Re: [music-dsp] Low bitrate audio compression
Look at the website http://www.dvsinc.com/. They use a technique called Advanced Multi-Band Excitation (AMBE), which is a multiband vector quantization technique that achieves very low rates (e.g., 2.0, 2.4, ..., 4.8kbps -- selectable). Their technology is patented, and I don't know what the licensing fees are. They sample 64kbps PCM voice and process it in 20ms frames. Initial latency for the first frame is 60ms, but 20ms thereafter. They have coded it for TI and Motorola DSP chips. Their voice quality is good enough that the FAA has selected it for use in the new Air Traffic Control digital communications acquisition. Tests have shown it to produce intelligible speech with BERs of up to 10%.
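For a sense of scale, the arithmetic behind those figures can be sketched in a few lines of Python. Only the rates named explicitly in the message are used; the helper name `bits_per_frame` is made up for illustration.

```python
PCM_RATE_BPS = 64_000                 # input PCM rate quoted in the message
FRAME_MS = 20                         # frame length quoted in the message
CODED_RATES_BPS = [2000, 2400, 4800]  # rates named explicitly in the message


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of bits available to describe one frame at the given bit rate."""
    return rate_bps * frame_ms // 1000


budget = {rate: bits_per_frame(rate) for rate in CODED_RATES_BPS}
pcm_bits = bits_per_frame(PCM_RATE_BPS)               # bits of PCM per frame
ratios = {rate: PCM_RATE_BPS / rate for rate in CODED_RATES_BPS}
```

At 2.4kbps that leaves 48 bits to describe each 20ms frame, against 1280 bits of PCM input.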
Yaakov Stein wrote:
As usual, I only saw this thread after almost everything that can be said, has been.
Broadcom (a major cablemodem/DSL/VoIP chip supplier)
recently announced that they are now using a new compression technology
they call BroadVoice, mainly in order to get around IPR issues.
(They claim that they went back to techniques that
were patented in the past, but all the patents have expired.)
It compresses music to 32Kbps and speech to 16Kbps,
and runs at about 10 MIPS with low delay (8 ms frame size).
I have only seen one article on it in the open press,
and since I am more interested in the below 4 Kbps range
I haven't looked into it further.
Regarding variable speed playback,
when this is a design goal the best coders are those
based on sinusoidal transform (STC).
You could easily do an 8 Kbps STC,
but I am not sure about the IPR issues.
One technique that is frequently overlooked
is the sub-band coder, which was popular
for a brief period between ADPCM at 32Kbps
(which sounds terrible at 16Kbps)
and the advent of the CELP coders.
It will give you a nice 16Kbps,
but won't help with the variable speed playback
(although it is possibly low enough on MIPS
that you could afford to do a SOLA afterwards).
I do not know of IPR issues here -
this stuff has been around for years (over 17).
Last note - 16-20 Kbps is so easy to attain for voice
that you could probably make something up.
If the variable speed is an issue, try doing something
pitch synchronous, e.g. high order pitch sync LPC
with downsampled or ADPCM'ed residual.
Jonathan (Y) Stein author at dspcsp.com
dupswapdrop -- the music-dsp mailing list and website: subscription info, FAQ, source code archive, list archive, book reviews, dsp links http://shoko.calarts.edu/musicdsp/