28th May 2019, 9:45 am   #3
cmjones01
Nonode
 
Join Date: Oct 2008
Location: Warsaw, Poland and Cambridge, UK
Posts: 2,669
Re: Fun with 6502 Assembler

Quote:
Originally Posted by julie_m
If you are doing bit-packing, you probably need to do several LSR or ASL instructions in a row. The fastest way is to do the bit-shifting right in the accumulator: LSR A and ASL A take only two cycles, as opposed to 5 cycles for zero page, 6 for zero page,X and absolute, or 7 for absolute,X (no other addressing modes are available). If you have something like
Code:
.lsr4
LSR A
.lsr3
LSR A
.lsr2
LSR A
LSR A
RTS
then you can JSR into any of the labels to shift the accumulator contents to the right by 4, 3 or 2 bits.
There's a lovely example of loop unrolling (which is the name of this technique) in the BBC Micro's operating system ROM. I always wondered how it managed to clear the screen so quickly, which involves writing a value to anything between 1 kbyte and 20 kbytes of RAM depending on the screen mode in use. My attempts at doing it in assembly language always came out slower than the operating system's routine.

It turns out that in the OS ROM there's a section which looks like this:
STA &3000,X
STA &3100,X
STA &3200,X
STA &3300,X
...
(continue with one instruction for each 256 bytes)
...
STA &7C00,X
STA &7D00,X
STA &7E00,X
STA &7F00,X

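The largest screen memory (modes 0-2) runs from &3000 to &7FFF, and the smallest (mode 7) from &7C00 to &7FFF. Other modes are somewhere in between. By simply counting X from 0 to 255 and jumping into this table of STA instructions at the entry for the first page of the screen, the OS can write a value to every location from any page boundary between &3000 and &7F00 up to &7FFF. Each pass through the table writes one byte in every 256-byte page, so after 256 passes the whole screen has been filled. Because the STAs are unrolled, the loop overhead is paid once per pass rather than once per byte, so it runs about as fast as the CPU can manage.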
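
As a rough illustration of the jump-in idea (a sketch under assumptions, not the actual OS ROM code), here are just the last four table entries, which would be enough for mode 7. The labels clear_mode7, page7C to page7F and done are made up for the example, and vec is assumed to be two free zero-page bytes (for example &70 and &71) defined in BASIC before assembling. It uses forward label references, so it needs two-pass assembly.

```
\ Hypothetical sketch, not the real OS ROM routine
\ vec is an assumed two-byte zero-page vector, e.g. vec = &70 set in BASIC
.clear_mode7
LDA #page7C MOD 256   \ low byte of the chosen entry point
STA vec
LDA #page7C DIV 256   \ high byte of the chosen entry point
STA vec+1
LDA #0                \ value to write to every screen byte
LDX #0                \ X is the offset within each page
JMP (vec)             \ enter the table at the chosen page
.page7C
STA &7C00,X
.page7D
STA &7D00,X
.page7E
STA &7E00,X
.page7F
STA &7F00,X
INX
BEQ done              \ X wrapped to 0, so all 256 offsets are done
JMP (vec)             \ otherwise re-enter at the same page
.done
RTS
```

Pointing vec at an earlier entry in a longer table would clear a larger screen with the same code. The loop ends with BEQ and JMP rather than a backwards BNE because a full 80-entry table is 240 bytes long, which is beyond the reach of a relative branch. On an NMOS 6502 each pass costs 5 cycles per STA absolute,X plus about 9 cycles for the INX, BEQ and indirect JMP, so with the full table the overhead is well under one cycle per byte. A plain byte-at-a-time loop built from STA (zp),Y, INY and BNE costs around 11 cycles per byte.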

Chris
__________________
What's going on in the workshop? http://martin-jones.com/