[dorkbotpdx-blabber] another LED array question

Paul Stoffregen paul at pjrc.com
Tue Dec 16 21:30:07 EST 2008


> ... If we hook them up as three chains of five
> chips each, and arrange things so that not all sub-chains need to get
> updated all the time, it's possible to do things quicker, since you're
> driving shorter chains.

The SPI port is incredibly fast compared to moving bits around in code.  
It's very hard to beat SPI's speed with one giant long shift chain, even 
if you break the chain into several shorter ones.

If you write this in C using the "normal" way of accessing I/O pins, with 
looping code, it will be much, much slower than SPI (where such overhead 
tends to happen while the SPI hardware is doing the work of shifting the 
bits).

Even in assembly, single bit I/O writes take 2 cycles for fixed data, 
and best case is 4 cycles for variable data (the AVR does not have any 
single-bit move instruction, only fixed-value set and clear).  If you 
want to change a single bit without disturbing the other 7, the fastest 
way is SBI and CBI, and those instructions take 2 cycles.  If you want 
to actually "move a bit", at least one conditional skip or branch is 
needed, possibly 2 if you don't already know the bit's value (or didn't 
spend 2 cycles setting it to a known value).  No matter what you do, if 
your code manipulates single pins, it will be a LOT slower.  The only 
way to change a pin in 1 cycle is to write to all 8 on the same port.
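
To make the per-bit cost concrete, here is a rough sketch of the "normal" C 
approach. The pin assignments (data on bit 0, clock on bit 1) and the 
function name are made up for illustration, and the port is passed in as a 
pointer so the code also compiles on a PC; on a real AVR you would use a 
register such as PORTB from <avr/io.h> instead.

```c
#include <assert.h>
#include <stdint.h>

#define DATA_MASK  0x01u  /* hypothetical wiring: data on bit 0  */
#define CLOCK_MASK 0x02u  /* hypothetical wiring: clock on bit 1 */

/* Shift one byte out MSB first, touching only the data and clock pins.
   'port' stands in for an AVR output register. */
void shift_out_bitbang(volatile uint8_t *port, uint8_t value)
{
    assert(port != 0);
    for (uint8_t i = 0; i < 8; i++) {
        /* "Moving a bit": there is no single-bit move, so the value has
           to be tested and then the pin either set or cleared. */
        if (value & 0x80u)
            *port |= DATA_MASK;
        else
            *port &= (uint8_t)~DATA_MASK;

        /* Fixed data: clock high, then low. With a fixed register and a
           constant mask, a compiler can usually turn each of these into
           a single SBI or CBI (2 cycles each). */
        *port |= CLOCK_MASK;
        *port &= (uint8_t)~CLOCK_MASK;

        value = (uint8_t)(value << 1);
    }
}
```

Every bit costs a test, a conditional set or clear, two clock writes, and 
the loop overhead, which is why this ends up so far behind the SPI hardware.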

A really aggressive assembly optimization (that could beat SPI's speed) 
would be to store the data in a format where the bits are already in the 
positions they need to be on the port pins, and a 0 is stored in the bit 
that connects to the clock pin.  Then you could read the byte from RAM 
(2 cycles), write it to all 8 pins on the port (1 cycle), set the clock 
bit (1 cycle), and write it to the port again (1 cycle).  Reading from 
the RAM can auto-increment the pointer, and of course you'll copy this 
code many times so there's no looping overhead.  That's only 5 cycles!  
With 3 shift register chains, that ends up being 1.67 cycles per bit, 
which is slightly faster than using SPI (effectively 2.25 cycles per 
bit) with all the shift registers in one giant long chain.  With 7 
chains, you'd actually be putting out 1.4 bits per cycle (3.15 times 
faster than the SPI port).... assuming you can arrange to store your 
bits in this funny 7 bit format and not factoring in all the CPU time to 
accomplish that.
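
As a rough sketch of that pre-arranged format, the C below packs up to 7 
chains into one "port image" byte per bit time and then plays the images 
out. The wiring (chains on bits 0-6, clock on bit 7) and all the function 
names are assumptions for illustration. The playback loop only shows the 
order of operations; the real thing would be unrolled assembly, with the 
cycle counts in the comments taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLOCK_MASK 0x80u  /* assumed wiring: clock on bit 7      */
#define MAX_CHAINS 7      /* chains use bits 0..6 of the port    */

/* Build one port image per bit time. out[] must hold bytes_per_chain * 8
   bytes. Bit c of out[i] is bit i (MSB first) of chain c; the clock bit
   is always left at 0. */
void pack_chains(const uint8_t *chains[], size_t n_chains,
                 size_t bytes_per_chain, uint8_t *out)
{
    assert(n_chains <= MAX_CHAINS);
    for (size_t bit = 0; bit < bytes_per_chain * 8; bit++) {
        uint8_t image = 0;
        for (size_t c = 0; c < n_chains; c++) {
            if (chains[c][bit / 8] & (0x80u >> (bit % 8)))
                image |= (uint8_t)(1u << c);
        }
        out[bit] = image;
    }
}

/* Play the images out. 'port' stands in for an AVR output register. */
void play_images(const uint8_t *images, size_t n, volatile uint8_t *port)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t v = images[i];              /* read from RAM:  2 cycles */
        *port = v;                          /* write, clock 0: 1 cycle  */
        *port = (uint8_t)(v | CLOCK_MASK);  /* set clock bit + write: 2 */
    }
}

/* 5 cycles per port image, shared by one bit from each chain. */
double cycles_per_bit(unsigned n_chains)
{
    return 5.0 / (double)n_chains;
}
```

cycles_per_bit(3) comes out to about 1.67 and cycles_per_bit(7) to about 
0.71, matching the figures above. The packing step is the CPU time that 
those figures leave out.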

Then again, if you can do crazy assembly optimization, you could 
probably figure out some way to get useful work done while you're waiting 
for the SPI port to shift the bits.





