[music-dsp] C++ performance

Thomas Strathmann thomas at pdp7.org
Wed Oct 27 13:52:14 EDT 2010

On 10/27/10 17:34 , robert bristow-johnson wrote:
>> 1. Get to know your compiler and library and write "common sense" code
>> that the compiler knows how to deal with in a (near-)optimal way.
> this i agree with.
>> 4. To reiterate: Don't think you're cleverer than the people who wrote
>> the compiler and the runtime and don't try to port idioms from
>> other languages in the hope of making your code run faster.
> i don't think i agree with this.
> here is an example (a simple one-pole LPF with noise shaping):
> //
> // transfer function: H(z) = b0/(1 - (a1+1)*z^(-1))
> //
> //
> void myProject::LPF(LPFBlock* this_filter, long* input)
> {
> register long* output_ptr = &(this_filter->output[0]);
> register long b0 = this_filter->b0; // feedforward coefficient Q8.24
> register long a1 = this_filter->a1; // feedback coefficient Q8.24
> register long long y1 = this_filter->y1; // previous output Q16.48,
> // roundoff noise state in lower 24 bits
> register long output_sample = (long)(y1>>24); // now is previous output
> // sample, y[n-1]
> for (register int i=CHUNK_SIZE; i>0; i--)
> {
> y1 += (long long)b0 * (long long)(*input++); // (y[n-1] + b0*x[n]) * 2^24
> y1 += (long long)a1 * (long long)output_sample; // (y[n-1] + b0*x[n] +
> // a1*y[n-1]) * 2^24
> output_sample = (long)(y1>>24); // this truncation has simple roundoff
> // noise shaping
> *output_ptr++ = output_sample;
> }
> this_filter->y1 = y1; // save state, including roundoff error state
> }
> the use of register variables, the use of simple arithmetic lines with
> "+=" in the loop is there to make sure the compiler has no doubt what i
> want it to do. the fact that i have "i" count down to zero rather than
> up to CHUNK_SIZE is to prevent the compiler from loading CHUNK_SIZE and
> subtracting, it only needs to compare to zero.

The register qualifier is just a hint; the compiler is not obliged to
keep the variable in a CPU register. Likewise, the comparison need not
be implemented as a subtraction if your processor has more comparison
instructions than just a test for zero. gcc 4.2.1 for 32-bit Intel loads
either 0 (in your case) or CHUNK_SIZE-1 (in the other case) and does the
appropriate comparison, so in both cases you get a load. Bottom line:
you still have to look at the generated code and/or profile.
No amount of trickery with qualifiers and restructuring of your code is
guaranteed to change anything. But it does not hurt to give the compiler
some advice. It is always free to ignore it and will often enough do so.
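
If you want to check this for your own setup, a minimal sketch along
these lines may help. The names sum_up, sum_down and kChunkSize are made
up for illustration (kChunkSize stands in for CHUNK_SIZE). Compile it
with something like "g++ -O2 -S" and compare the two loops in the
assembly output. What you see depends on compiler, version, options and
target, so treat it as an experiment, not as a statement of what any
particular compiler does.

```cpp
// Hypothetical chunk size, standing in for CHUNK_SIZE from the quoted code.
constexpr int kChunkSize = 64;

// Counts up from 0 to kChunkSize - 1.
long sum_up(const long* in)
{
    long acc = 0;
    for (int i = 0; i < kChunkSize; i++)
        acc += in[i];
    return acc;
}

// Counts down to zero, as in the quoted filter loop.
long sum_down(const long* in)
{
    long acc = 0;
    for (int i = kChunkSize; i > 0; i--)
        acc += *in++;
    return acc;
}
```

Both functions compute the same sum, so any difference in the generated
code comes from the loop form alone. If the two loops come out the same
(or differ only trivially), the count-down idiom bought you nothing on
that compiler.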

> sometimes you *should* spell it out for the compiler. i only wish the
> promotion to (long long) was unnecessary. i wish that in C and C++, it
> was automatically understood that the type of the result of multiplying
> two N-bit numbers was a type with a 2N-bit number if a 2N-bit type is
> available. that is one flaw of C or C++.
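
To make the promotion issue concrete: C and C++ do the multiplication in
the (converted) type of the operands, not in a type chosen to hold the
full result. The sketch below assumes a platform where int is 32 bits
wide; mul_narrow and mul_wide are illustrative names, not anything from
the quoted code. The narrow version uses unsigned operands only because
signed overflow would be undefined behaviour.

```cpp
#include <cstdint>

// 32-bit x 32-bit evaluated in 32 bits: the upper half of the mathematical
// product is discarded (unsigned arithmetic wraps modulo 2^32).
// Assumes int is 32 bits, so no promotion to a wider type takes place.
std::uint32_t mul_narrow(std::uint32_t a, std::uint32_t b)
{
    return a * b;
}

// Converting one operand to 64 bits first makes the whole multiplication
// happen in 64 bits; the other operand is converted implicitly.
std::int64_t mul_wide(std::int32_t a, std::int32_t b)
{
    return static_cast<std::int64_t>(a) * b;
}
```

With a = b = 2^24 (1.0 in Q8.24), mul_wide returns 2^48 while mul_narrow
returns 0. Whether the cast costs anything is again a question for the
generated code: a compiler may well turn it into a single widening
multiply, but you have to look to be sure.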

Spelling it out for the compiler means using idioms appropriate for your
language, compiler, and runtime (point 1). But you can easily overdo it,
especially if you merely think that you know better than the compiler.
Point 4 does not apply if you actually know how the compiler translates
certain code sequences. It is dangerous, though, to assume that because
compiler X for language Y in version Z with options O for target T
produces such and such code, it must emit the same code in another setting.

