[dorkbotpdx-blabber] another LED array question
Paul Stoffregen
paul at pjrc.com
Tue Dec 16 21:30:07 EST 2008
> ... If we hook them us as three chains of five
> chips each, and arrange things so that not all sub-chains need to get
> updated all the time, it's possible to do things quicker, since you're
> driving shorter chains.
The SPI port is incredibly fast compared to moving bits around in code.
It's very hard to beat the speed of SPI driving one giant long shift
chain, even if you break the chain into several shorter ones.
If you write this in C using the "normal" way of accessing I/O pins, and
looping code, it will be much, much slower than SPI (where such overhead
tends to happen while the SPI is doing the work of shifting the bits).
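
To make the "normal" C approach concrete, here is a rough sketch of a bit-banged shift-out loop. It is written to run on a PC, so fake_port, sim_shift_reg and pin_writes are hypothetical stand-ins (on a real AVR the port would be a register such as PORTB), and the pin assignments are assumptions, not anything from the original setup.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: data on bit 0 and clock on bit 1 of the same port. */
#define DATA_BIT  0
#define CLOCK_BIT 1

/* Hypothetical stand-ins so the sketch runs on a PC. On a real AVR,
   fake_port would be a port register such as PORTB. */
static uint8_t fake_port;       /* simulated output port */
static uint8_t sim_shift_reg;   /* simulated 8-bit shift register */
static unsigned pin_writes;     /* number of single-pin writes performed */

/* Set one pin. A rising clock edge shifts the data pin into the register. */
static void pin_set(uint8_t bit)
{
    assert(bit < 8);
    uint8_t before = fake_port;
    fake_port |= (uint8_t)(1u << bit);
    pin_writes++;
    if (bit == CLOCK_BIT && !(before & (1u << CLOCK_BIT)))
        sim_shift_reg = (uint8_t)((sim_shift_reg << 1) |
                                  ((fake_port >> DATA_BIT) & 1u));
}

/* Clear one pin. */
static void pin_clear(uint8_t bit)
{
    assert(bit < 8);
    fake_port &= (uint8_t)~(1u << bit);
    pin_writes++;
}

/* Shift one byte out MSB first, one pin write at a time. */
void shift_out_bitbang(uint8_t value)
{
    for (uint8_t i = 0; i < 8; i++) {
        if (value & 0x80u)          /* variable data: needs a test per bit */
            pin_set(DATA_BIT);
        else
            pin_clear(DATA_BIT);
        pin_set(CLOCK_BIT);         /* rising edge clocks the bit in */
        pin_clear(CLOCK_BIT);
        value = (uint8_t)(value << 1);
    }
}
```

Every bit costs three separate pin writes plus the test, the shift and the loop overhead, and none of that overlaps with anything else the way it does when the SPI hardware is doing the shifting.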
Even in assembly, single bit I/O writes take 2 cycles for fixed data,
and best case is 4 cycles for variable data (the AVR does not have any
single-bit move instruction, only fixed-value set and clear). If you
want to change a single bit without disturbing the other 7, the fastest
way is SBI and CBI, and those instructions take 2 cycles. If you want
to actually "move a bit", at least one conditional skip or branch is
needed, possibly 2 if you don't already know the bit's value (or didn't
spend 2 cycles setting it to a known value). No matter what you do, if
your code manipulates single pins, it will be a LOT slower. The only
way to change a pin in 1 cycle is to write to all 8 on the same port.
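
As a back-of-envelope check on those cycle counts, the sketch below adds them up for one bit-banged bit. It assumes fully unrolled assembly with one best-case variable write for the data pin followed by SBI and CBI to pulse the clock; that sequence is an assumption for illustration. The SPI figure uses the effective 2.25 cycles per bit quoted elsewhere in this message.

```c
#include <assert.h>
#include <limits.h>

/* Cycle figures quoted in the post for AVR assembly. */
enum {
    CYCLES_FIXED_PIN_WRITE    = 2,  /* SBI or CBI */
    CYCLES_VARIABLE_PIN_WRITE = 4,  /* best case, needs a skip or branch */
    SPI_CYCLES_PER_BYTE       = 18  /* 8 bits x 2.25 effective cycles per bit */
};

/* Assumption: no loop overhead, one variable write for the data pin,
   then one SBI and one CBI to pulse the clock. */
unsigned bitbang_cycles_per_bit(void)
{
    return CYCLES_VARIABLE_PIN_WRITE + 2 * CYCLES_FIXED_PIN_WRITE;
}

/* Total cycles to bit-bang nbits under the same assumption. */
unsigned bitbang_cycles(unsigned nbits)
{
    assert(nbits <= UINT_MAX / bitbang_cycles_per_bit()); /* no overflow */
    return nbits * bitbang_cycles_per_bit();
}
```

Under these assumptions a byte takes 64 cycles by single-pin writes against 18 for SPI, and that is before any looping or C overhead is counted.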
A really aggressive assembly optimization (that could beat SPI's speed)
would be to store the data in a format where the bits are already in the
positions they need to be on the port pins, and a 0 is stored in the bit
that connects to the clock pin. Then you could read the byte from RAM
(2 cycles), write it to all 8 pins on the port (1 cycle), set the clock
bit (1 cycle), and write it to the port again (1 cycle). Reading from
the RAM can auto-increment the pointer, and of course you'll copy this
code many times so there's no looping overhead. That's only 5 cycles!
With 3 shift register chains, that ends up being 1.67 cycles per bit,
which is slightly faster than using SPI (effectively 2.25 cycles per
bit) with all the shift registers in one giant long chain. With 7
chains, you'd actually be putting out 1.4 bits per cycle (3.15 times
faster than the SPI port).... assuming you can arrange to store your
bits in this funny 7 bit format and not factoring in all the CPU time to
accomplish that.
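
The "funny format" could look something like the sketch below, which runs on a PC rather than an AVR. The function names, the choice of bit 7 for the clock and bits 0 to 6 for the chains, and the simulated port are all assumptions for illustration. pack_port_ready does the pre-arranging, and emit_step models the two port writes of the 5-cycle step (on the AVR, roughly a load with post-increment, OUT, ORI, OUT).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLOCK_MASK 0x80u   /* assumption: clock on port bit 7 */
#define MAX_CHAINS 7       /* assumption: chains on port bits 0..6 */

static volatile uint8_t fake_port;  /* hypothetical stand-in for the port */

/* Convert per-chain byte streams (MSB first) into port-ready bytes.
   chains[c] holds nbytes bytes for chain c; out needs nbytes * 8 bytes.
   Each output byte carries one bit per chain, with the clock bit left 0. */
void pack_port_ready(const uint8_t *const chains[], size_t nchains,
                     size_t nbytes, uint8_t *out)
{
    assert(nchains >= 1 && nchains <= MAX_CHAINS);
    for (size_t i = 0; i < nbytes; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            uint8_t v = 0;
            for (size_t c = 0; c < nchains; c++) {
                if (chains[c][i] & (1u << bit))
                    v |= (uint8_t)(1u << c);
            }
            *out++ = v;                 /* bit 7 (clock) stays 0 */
        }
    }
}

/* Host-side model of one output step: write the byte with the clock low,
   then write it again with the clock bit set. The rising edge shifts one
   bit into every simulated chain register at once. */
void emit_step(uint8_t packed, uint8_t sim_regs[], size_t nchains)
{
    fake_port = packed;                           /* data valid, clock low */
    fake_port = (uint8_t)(packed | CLOCK_MASK);   /* clock high */
    uint8_t port = fake_port;
    for (size_t c = 0; c < nchains; c++)
        sim_regs[c] = (uint8_t)((sim_regs[c] << 1) | ((port >> c) & 1u));
}

/* Cycles per bit times 100 (truncated), at 5 cycles per output step. */
unsigned cycles_per_bit_x100(unsigned nchains)
{
    assert(nchains >= 1 && nchains <= MAX_CHAINS);
    return 500u / nchains;
}
```

The packing loop is the CPU time the estimate above leaves out: it touches every bit of every chain, so it only pays off if the data can be kept in this layout or packed when there is time to spare.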
Then again, if you can do crazy assembly optimization, you could
probably figure out some way to get useful work done when you're waiting
for the SPI port to shift the bits.