Floating point processing

Chris Townsend ctownsend at arboretum.com
Wed Jul 15 15:34:13 EDT 1998

```I have some questions regarding the Intel Architecture FPU, and how it
compares to other FPUs like the PowerPC's.  I have read that the IA FPU
uses purely 80 bit registers for floating point calculations.  This means
that prior to performing any floating point operations single and double
precision data will be converted to 80 bits (i.e. long double), and then
back to single or double precision when the operation is finished.  Ideally
all intermediate data will stay at 80 bits of precision, but it seems
inevitable that some intermediate results need to be stored back in memory
and probably at single or double precision.

Assuming I understand this correctly, how can one code C++ so that accuracy
is maximized by maintaining the most precision possible?  For example, it
seems that in a case like this 80 bits of precision might be maintained up
until the final assignment to out:
float out, in1, in2, in3, in4;
out = in1 * in2 * in3 * in4;

But in a case like this I'd suspect that truncation back to single
precision occurs after each operation:
out = in1;
out *= in2;
out *= in3;
out *= in4;

Does anyone know what IA-based compilers (e.g. MSVC) actually do in these
situations, since I'm purely speculating about what is happening?

Another thing I found is that converting DSP code from single to double
precision more than doubled the CPU utilization (at least on a Pentium II).
This seems somewhat strange to me, since all operations are performed at
80 bits.  Of course double precision will require twice as much data to be
moved around in memory, but I'm not sure that this could completely account
for the large change in CPU utilization.  Any thoughts?

Also, 80 bit register usage seems to imply that there is no additional
overhead between operations that combine float and double precision data
types since everything must be converted to long double anyway.  Does
anyone know if this is actually the case?  If there is little or no
overhead in float to double conversions, one could efficiently code a
filter with a double precision accumulator and single precision data and
coefficients.  If on the other hand float to double conversions are slow
then it could be more efficient to make the entire filter double precision.

I'm also wondering how this all compares to other FPUs (e.g. PowerPC).

Thanks,
Chris

--------------------------------
Chris Townsend - DSP Engineer
Arboretum Systems, Inc.
http://www.arboretum.com
--------------------------------

```
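For concreteness, the filter idea raised above (a double precision accumulator with single precision data and coefficients) might be sketched as follows, next to an all-float version for comparison. This is only a minimal sketch: the names `dot_float_acc`, `dot_double_acc` and `eval_method` are hypothetical, and nothing here says what any particular compiler actually emits. The standard `FLT_EVAL_METHOD` macro from `<cfloat>` is one portable way to ask the compiler how it evaluates intermediate results.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

// Reports how the compiler evaluates floating-point intermediates:
// 0 = in the operand's own type, 1 = as double, 2 = as long double,
// negative = indeterminate. (Hypothetical helper name.)
inline int eval_method() { return FLT_EVAL_METHOD; }

// One filter output sample with a single-precision accumulator.
// Whether acc is rounded to float on every pass depends on the
// compiler and its settings; that is the open question in the post.
float dot_float_acc(const std::vector<float>& x, const std::vector<float>& h) {
    assert(x.size() == h.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i)
        acc += x[i] * h[i];
    return acc;
}

// Same computation, but each product is formed in double and summed
// into a double accumulator, so the running sum is never narrower
// than double even when it is stored back to memory.
double dot_double_acc(const std::vector<float>& x, const std::vector<float>& h) {
    assert(x.size() == h.size());
    double acc = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        acc += static_cast<double>(x[i]) * h[i];
    return acc;
}
```

Timing both functions over a realistic buffer, and inspecting the generated assembly, would be one way to see whether mixing float and double costs anything on a given compiler and CPU.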