[music-dsp] a question about reducing calculation-complexity in
gogins at pipeline.com
Mon Dec 12 09:50:42 EST 2005
Thanks for the information, this is very useful.
In my context, since I am not designing a compiler, I need to know what is best to do assuming that unit generators are written in C++ or C and wired together using an existing language such as Lua. That language in turn would be written in C or C++.
It does sound as though laying out unit generator member variables in order of use should speed things up -- a bit.
From: James McCartney <asynth at io.com>
Sent: Dec 12, 2005 3:54 AM
To: music-dsp <music-dsp at shoko.calarts.edu>
Subject: Re: [music-dsp] a question about reducing calculation-complexity in a modular synthesizer
On Dec 8, 2005, at 10:07 AM, Michael Gogins wrote:
> I concur. I have done explicit tests and found block computation
> roughly 5 times faster.
> The tests implemented a simple FM instrument using all C++ code,
> sample by sample. I then compared the same instrument coded in
> Csound sample by sample, and block by block. My C++ was (slightly)
> faster than Csound sample by sample, 5 x slower than Csound block
> by block.
I think it is more complicated than this. The following are basically
the same things I said in a talk several years ago at Dartmouth when
I was doing code generation.
If you have one unit generator, then there is no difference between
processing one sample or processing a block -- it is the same thing.
It is just a loop around a single bit of code. If your synth program
generates and compiles C code, then you can have a loop around a
single bit of code that consists of several unit generators. At what
point does it become slower to loop around one big piece of code than
to split up that piece of code into several pieces and loop around
each one? The optimal amount of work to do in one loop cycle is
larger than your typical unit generator, so block processing of
single unit generators is not optimal.
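
To make the two loop structures concrete, here is a minimal C++ sketch. The Phasor and OnePole unit generators, the render_* functions, and the nearly_equal helper are all hypothetical names invented for this illustration; they are not taken from SuperCollider or Csound. The block version runs one loop per unit generator and passes the intermediate signal through a buffer, while the fused version runs a single loop around both and keeps the intermediate in a local variable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical unit generators, for illustration only.
struct Phasor {                 // ramp in [0, 1)
    float phase = 0.0f;
    float inc   = 0.01f;
    float tick() {
        float out = phase;
        phase += inc;
        if (phase >= 1.0f) phase -= 1.0f;
        return out;
    }
};

struct OnePole {                // y[n] = y[n-1] + a * (x[n] - y[n-1])
    float y = 0.0f;
    float a = 0.1f;
    float tick(float x) { y += a * (x - y); return y; }
};

// Block style: one loop per unit generator, intermediate signal in a buffer.
std::vector<float> render_block(std::size_t n) {
    Phasor ph;
    OnePole lp;
    std::vector<float> tmp(n), out(n);
    for (std::size_t i = 0; i < n; ++i) tmp[i] = ph.tick();
    for (std::size_t i = 0; i < n; ++i) out[i] = lp.tick(tmp[i]);
    return out;
}

// Fused style: one loop around both unit generators, intermediate in a local.
std::vector<float> render_fused(std::size_t n) {
    Phasor ph;
    OnePole lp;
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        float x = ph.tick();
        out[i] = lp.tick(x);
    }
    return out;
}

// Small check that two renderings agree to within a tolerance.
bool nearly_equal(const std::vector<float>& a, const std::vector<float>& b, float tol) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        float d = a[i] - b[i];
        if (d > tol || d < -tol) return false;
    }
    return true;
}
```

Both functions compute the same signal; which one runs faster depends on how much work ends up inside each loop, which is the question raised above. This sketch makes no timing claim.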
With block processing you have the unit generator state in registers
and the intermediate signals in memory buffers. With single sample
processing, the intermediate signals are in registers and the state
variables are in memory. If you arrange your unit generator state
variables in memory in the order that they will be used then you can
stream over your state variables in a cache-friendly way, similar to
how you would stream over intermediate signal buffers. If a unit
generator has more inputs and outputs than it does state variables,
then single sample processing can be arranged to be faster than block
processing. Most unit generators have more state than inputs and
outputs.

I think single sample code can be made faster than it is typically
written. I did some tests a few years ago using machine generated C
code. Block processing was usually faster, but not by 5 times, and it
depended on what was being done and how things were divided up.
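
One way to picture single sample processing with state laid out in order of use is the C++ sketch below. It is only an illustration under stated assumptions: the four-float state layout, the phasor_tick and onepole_tick functions, and render_single_sample are hypothetical and do not reproduce the machine-generated code mentioned above. Each unit generator reads its state from the front of a shared array and advances the pointer, so one sample step walks through memory front to back.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All state lives in one contiguous array, in the order it is used.
// Layout assumed here: [phasor.phase, phasor.inc, onepole.y, onepole.a]

inline float phasor_tick(float*& s) {
    float out = s[0];
    s[0] += s[1];
    if (s[0] >= 1.0f) s[0] -= 1.0f;
    s += 2;                      // step past this unit generator's state
    return out;
}

inline float onepole_tick(float*& s, float x) {
    s[0] += s[1] * (x - s[0]);
    float y = s[0];
    s += 2;                      // step past this unit generator's state
    return y;
}

std::vector<float> render_single_sample(std::vector<float>& state, std::size_t n) {
    assert(state.size() == 4);   // matches the layout assumed above
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        float* s = state.data(); // restart at the front for every sample
        float x = phasor_tick(s);    // intermediate signal stays in a local
        out[i] = onepole_tick(s, x);
    }
    return out;
}
```

With state = {0, 0.25, 0, 0.5}, three samples give 0, 0.125 and 0.3125, and the state array is left holding the updated phase and filter memory.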
SuperCollider uses block processing. It has a buffer coloring
algorithm to minimize space used for intermediate buffers, and it
arranges all unit generator state variables into a single block of
memory in the order that they are used. I abandoned the single sample
approach not because of running speed, but because the turnaround
time for compilation killed interactivity, which is crucial when
designing new sounds.
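
The idea of reusing intermediate buffers can be sketched as a greedy assignment over a linear schedule of unit generators. This is not SuperCollider's actual algorithm, whose details are not given here. It is a simplified illustration, and the Signal struct and assign_buffers function are hypothetical names. Each intermediate signal is described by the step that writes it and the last step that reads it, and a buffer is handed out again once its previous signal has been fully consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one intermediate signal in a linear schedule.
struct Signal {
    std::size_t producer;       // index of the unit generator that writes it
    std::size_t last_consumer;  // index of the last unit generator that reads it
};

// Returns one buffer index per signal and stores the number of distinct
// buffers in num_buffers. Signals must be sorted by producer index.
std::vector<std::size_t> assign_buffers(const std::vector<Signal>& signals,
                                        std::size_t& num_buffers) {
    std::vector<std::size_t> assignment(signals.size());
    std::vector<std::size_t> busy_until;  // per buffer: last step that reads it
    for (std::size_t i = 0; i < signals.size(); ++i) {
        const Signal& sig = signals[i];
        assert(sig.last_consumer >= sig.producer);
        assert(i == 0 || signals[i - 1].producer <= sig.producer);
        // Conservative rule: a buffer is free only if its last reader ran
        // strictly before this producer, so no unit generator ever writes
        // into a buffer it is still reading from.
        std::size_t b = 0;
        while (b < busy_until.size() && busy_until[b] >= sig.producer) ++b;
        if (b == busy_until.size()) busy_until.push_back(0);
        busy_until[b] = sig.last_consumer;
        assignment[i] = b;
    }
    num_buffers = busy_until.size();
    return assignment;
}
```

For a chain of four unit generators with signals {0,1}, {1,2}, {2,3}, this assigns buffers 0, 1, 0, so two buffers serve three intermediate signals.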
--- james mccartney