[music-dsp] a question about reducing calculation-complexity in a modular synthesizer

Michael Gogins gogins at pipeline.com
Mon Dec 12 09:50:42 EST 2005

Thanks for the information, this is very useful.

In my context, since I am not designing a compiler, I need to know what is best to do assuming that unit generators are written in C++ or C and wired together using an existing language such as Lua. That language in turn would be written in C or C++.

It does sound as though laying out unit generator member variables in order of use should speed things up -- a bit.


-----Original Message-----
From: James McCartney <asynth at io.com>
Sent: Dec 12, 2005 3:54 AM
To: music-dsp <music-dsp at shoko.calarts.edu>
Subject: Re: [music-dsp] a question about reducing calculation-complexity in a modular synthesizer

On Dec 8, 2005, at 10:07 AM, Michael Gogins wrote:

> I concur. I have done explicit tests and found block computation  
> roughly 5 times faster.
> The tests implemented a simple FM instrument using all C++ code,  
> sample by sample. I then compared the same instrument coded in  
> Csound sample by sample, and block by block. My C++ was (slightly)  
> faster than Csound sample by sample, 5 x slower than Csound block  
> by block.

I think it is more complicated than this. The following are basically  
the same things I said in a talk several years ago at Dartmouth when  
I was doing code generation.
If you have one unit generator, then there is no difference between  
processing one sample or processing a block -- it is the same thing.  
It is just a loop around a single bit of code. If your synth program  
generates and compiles C code, then you can have a loop around a  
single bit of code that consists of several unit generators. At what  
point does it become slower to loop around one big piece of code than  
to split up that piece of code into several pieces and loop around  
each one? The optimal amount of work to do in one loop cycle is  
larger than your typical unit generator, so block processing of  
single unit generators is not optimal.
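
A minimal sketch of the two loop structures compared above, assuming a
two-operator FM patch. SineOsc, fm_block and fm_fused are hypothetical names
chosen for illustration, and the frequencies are arbitrary; none of this is
taken from Csound, SuperCollider, or the generated code mentioned in the
message. Both functions compute the same signal and differ only in where the
intermediate modulator signal lives.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical phase-accumulator sine oscillator (illustrative only).
struct SineOsc {
    double phase = 0.0;  // normalized phase in [0, 1)

    // Produce one sample at frequency `freq` (Hz) for sample rate `sr` (Hz).
    double tick(double freq, double sr) {
        assert(sr > 0.0);
        double out = std::sin(6.283185307179586 * phase);
        phase += freq / sr;
        phase -= std::floor(phase);  // wrap back into [0, 1)
        return out;
    }
};

// Block style: one loop per unit generator. The intermediate signal
// (the carrier frequency) is written to and read back from a buffer.
std::vector<double> fm_block(std::size_t n, double sr) {
    SineOsc mod, car;
    std::vector<double> freqbuf(n), out(n);
    for (std::size_t i = 0; i < n; ++i)
        freqbuf[i] = 440.0 + 100.0 * mod.tick(110.0, sr);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = car.tick(freqbuf[i], sr);
    return out;
}

// Fused style: one loop around both unit generators. The intermediate
// signal is a local variable that can stay in a register.
std::vector<double> fm_fused(std::size_t n, double sr) {
    SineOsc mod, car;
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        double freq = 440.0 + 100.0 * mod.tick(110.0, sr);
        out[i] = car.tick(freq, sr);
    }
    return out;
}
```

Which version runs faster depends on the compiler, the block size, and how
much work each unit generator does, so the sketch is only a starting point
for measurement.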

With block processing you have the unit generator state in registers  
and the intermediate signals in memory buffers. With single sample  
processing, the intermediate signals are in registers and the state  
variables are in memory. If you arrange your unit generator state  
variables in memory in the order that they will be used then you can  
stream over your state variables in a cache friendly way similarly to  
how you would stream over intermediate signal buffers. If a unit  
generator has more inputs and outputs than it does state variables,  
then single sample processing can be arranged to be faster than block  
processing. Most unit generators have more state than inputs and  
outputs, though.
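
One way to picture "state variables in memory in the order that they will be
used" is the sketch below. It assumes a chain of one-pole lowpass filters
processed one sample at a time; StateArena and OnePoleChain are hypothetical
names, and the two-slot layout per stage is an assumption made for the
example, not a description of any existing synthesizer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical arena: all unit generator state lives in one contiguous
// array, allocated in the order the unit generators will run.
struct StateArena {
    std::vector<double> data;

    // Reserve `count` zero-initialized slots and return their offset.
    std::size_t alloc(std::size_t count) {
        std::size_t off = data.size();
        data.resize(off + count, 0.0);
        return off;
    }
};

// Chain of one-pole lowpass stages: y = y_prev + coeff * (x - y_prev).
// Each stage owns two adjacent slots: [coeff, y_prev].
struct OnePoleChain {
    StateArena arena;
    std::size_t stages = 0;

    void add_stage(double coeff) {
        assert(coeff > 0.0 && coeff <= 1.0);
        std::size_t off = arena.alloc(2);
        arena.data[off] = coeff;    // slot 0: filter coefficient
        arena.data[off + 1] = 0.0;  // slot 1: previous output
        ++stages;
    }

    // Single sample processing: the signal stays in the local `x`,
    // while the pointer `s` streams forward over the state array.
    double tick(double x) {
        double* s = arena.data.data();
        for (std::size_t k = 0; k < stages; ++k) {
            double y = s[1] + s[0] * (x - s[1]);
            s[1] = y;  // store updated state
            x = y;     // feed the next stage
            s += 2;    // advance to the next stage's state
        }
        return x;
    }
};
```

Because the stages are visited in allocation order, each call to tick walks
the state array front to back once, which is the same access pattern as
streaming over a signal buffer.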

I think single sample code can be made faster than it is typically  
written. I did some tests a few years ago using machine generated C  
code. Block processing was usually faster, but not by 5 times, and it  
depended on what was being done and how things were divided up.

SuperCollider uses block processing. It has a buffer coloring
algorithm to minimize the space used for intermediate buffers, and it
arranges all unit generator state variables into a single block of
memory in the order that they are used. I abandoned the single sample
approach not because of running speed, but because the turnaround
time for compilation killed interactivity, which is crucial when
designing new sounds.
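
The idea of reusing intermediate buffers can be sketched as below. This is a
simple greedy scheme written for illustration, assuming the unit generators
are already in execution order and each produces one output; it is not
SuperCollider's actual algorithm, and Node and assign_buffers are
hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical graph node. `inputs` holds indices of earlier nodes whose
// outputs this node reads (nodes are assumed to be in execution order).
struct Node {
    std::vector<std::size_t> inputs;
};

// Assign an intermediate buffer index to each node's output, reusing a
// buffer once its last reader has run. `buffers_used` receives the number
// of distinct buffers needed.
std::vector<std::size_t> assign_buffers(const std::vector<Node>& graph,
                                        std::size_t& buffers_used) {
    const std::size_t n = graph.size();

    // last_use[i] is the index of the last node that reads node i's output
    // (or i itself if nothing reads it, e.g. a final output).
    std::vector<std::size_t> last_use(n);
    for (std::size_t i = 0; i < n; ++i) last_use[i] = i;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t in : graph[i].inputs) {
            assert(in < i);    // inputs must come from earlier nodes
            last_use[in] = i;  // i only increases, so the last write wins
        }
    }

    std::vector<std::size_t> buf(n);
    std::vector<std::size_t> free_list;
    buffers_used = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Pick the output buffer first, so a node never writes into a
        // buffer it is still reading from.
        if (!free_list.empty()) {
            buf[i] = free_list.back();
            free_list.pop_back();
        } else {
            buf[i] = buffers_used++;
        }
        // Release input buffers whose last reader is this node.
        for (std::size_t in : graph[i].inputs) {
            if (last_use[in] == i) {
                free_list.push_back(buf[in]);
                last_use[in] = n;  // sentinel: already released
            }
        }
    }
    return buf;
}
```

For a straight chain of four unit generators this scheme needs only two
buffers, alternating between them, instead of one buffer per output.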

--- james mccartney
