UK Vintage Radio Repair and Restoration Discussion Forum

julie_m 26th May 2019 9:11 pm

Fun with 6502 Assembler
 
I've been doing some 6502 assembly language programming, and I thought I would share a few techniques I found in case they come in useful to anyone else. No originality is claimed on any of the following .....

To perform any arbitrary series of instructions exactly twice and then RTS, you can use something like:
Code:

LDX #0
CLC
JSR sub
.sub
LDA m1,X
ADC m2,X
STA m3,X
INX
RTS

Line 1 sets the X register to 0.
Line 2 clears the carry flag.
Line 3 pushes the return address onto the stack, and jumps to (the very next instruction) line 4.
Line 4 is just a label for the beginning of a subroutine.
Lines 5 et seq are where the magic happens.
Line 8 increases X, so we will be using the next bytes up in memory the next time we go round the loop.
Line 9 is the end of the subroutine. The RTS takes us back to the address we stored at line 3, and the program continues from line 4.
Lines 5 onwards do the magic all over again.
When we get to line 9 the second time around, the RTS returns to wherever we came from.

It adds just three bytes to the program size (for the extra JSR instruction) and twelve cycles (six for the JSR and another six for the RTS that takes us back to the top for the second pass) to the execution time.
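
For anyone who wants to convince themselves that the body really runs exactly twice, here is a minimal Python sketch of the control flow. It models only the program counter and the return stack; the `PROGRAM` list, the `run` helper and the `caller_return` marker are illustrative assumptions, not a real emulator. Addresses are list indices rather than bytes (a real JSR pushes the return address minus one, which RTS then corrects).

```python
# Toy model of the "JSR into the next instruction" trick.
# Each list index stands for one instruction; only JSR, RTS and
# a placeholder for the loop body are modelled.
PROGRAM = [
    ("JSR", 1),      # 0: call the very next instruction
    ("BODY", None),  # 1: .sub - stands in for LDA/ADC/STA/INX
    ("RTS", None),   # 2: return to whatever is on top of the stack
]

def run(program, caller_return=99):
    """Execute the toy program and return how many times BODY ran."""
    stack = [caller_return]   # return address left by our own caller
    pc = 0
    body_runs = 0
    while pc != caller_return:
        op, arg = program[pc]
        if op == "JSR":
            stack.append(pc + 1)  # where to resume after the RTS
            pc = arg
        elif op == "RTS":
            pc = stack.pop()
        else:                     # BODY
            body_runs += 1
            pc += 1
    return body_runs

print(run(PROGRAM))
```

The first RTS pops the address pushed by the JSR and lands back on the body; the second pops the caller's address and leaves.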


If you are doing bit-packing, you probably need to do several LSR or ASL instructions in a row. The fastest way is to do the bit-shifting right in the accumulator (LSR A and ASL A only take two cycles; as opposed to 5 cycles for zero page, 6 for zero page, X and absolute or 7 cycles for absolute,X -- no other addressing modes are available). If you have something like
Code:

.lsr4
LSR A
.lsr3
LSR A
.lsr2
LSR A
LSR A
RTS

then you can JSR to any of the labels to shift the accumulator contents to the right by 4, 3 or 2 bits.
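
A small Python sketch of the multi-entry idea, assuming an 8-bit accumulator and the standard NMOS timings (6 cycles for JSR, 6 for RTS, 2 per LSR A). The `SHIFTS_FROM_ENTRY` table and the `jsr` and `cycles` helpers are hypothetical names used only for illustration.

```python
# Each label is an entry point; entering later skips the earlier shifts.
SHIFTS_FROM_ENTRY = {"lsr4": 4, "lsr3": 3, "lsr2": 2}

def jsr(entry, a):
    """Return the accumulator after a JSR to the given entry label."""
    a &= 0xFF
    for _ in range(SHIFTS_FROM_ENTRY[entry]):
        a >>= 1          # LSR A: bit 7 becomes 0, bit 0 falls into carry
    return a

def cycles(entry):
    """Total cost of the call: JSR + the shifts executed + RTS."""
    return 6 + 2 * SHIFTS_FROM_ENTRY[entry] + 6

print(hex(jsr("lsr4", 0xF0)), cycles("lsr4"))
```

The call overhead is twelve cycles on top of the shifts themselves, so this saves bytes rather than time; inline LSR A instructions remain the fastest option.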

G0HZU_JMR 27th May 2019 9:22 pm

Re: Fun with 6502 Assembler
 
Wow that takes me back... I too found out that there are lots of ways to save code space or speed up execution times with these old MCUs.

For example, about 15 years ago I learned a few tricks with the old 680x series MCUs. A custom version of this MCU series was used in some old N.Denso (Japanese) car ECUs from the 1980s. They only had 4k of ROM so they had to be able to cram as much code (and maps) as possible into this tiny space. I learned more from studying their code than from any textbook!

They did a few neat tricks at machine code level where the code would jump to the middle of a (multibyte) instruction so on certain passes through the routine the code did something else because it jumped into the middle of an instruction. This would appear illegal on a disassembler but it saved a few precious bytes. They also did tricks with lookup tables that allowed efficient addressing and their (unconventional) code could derail a disassembler as they did sneaky tricks with the stack to achieve this. Probably the most useful code I learned was how to efficiently read a large lookup table (eg a 21 x 12 table) and include full interpolation between map points. Over several versions of code (for the same car) they got better and better at doing this with fewer and fewer instructions. I use similar routines for lookup table access in modern AVR MCUs and the code is really fast and efficient.

cmjones01 28th May 2019 9:45 am

Re: Fun with 6502 Assembler
 
Quote:

Originally Posted by julie_m (Post 1148380)
If you are doing bit-packing, you probably need to do several LSR or ASL instructions in a row. The fastest way is to do the bit-shifting right in the accumulator (LSR A and ASL A only take two cycles; as opposed to 5 cycles for zero page, 6 for zero page, X and absolute or 7 cycles for absolute,X -- no other addressing modes are available). If you have something like
Code:

.lsr4
LSR A
.lsr3
LSR A
.lsr2
LSR A
LSR A
RTS

then you can JSR to any of the labels to shift the accumulator contents to the right by 4, 3 or 2 bits.

There's a lovely example of loop unrolling (which is the name of this technique) in the BBC Micro's operating system ROM. I always wondered how it managed to clear the screen so quickly, which involves writing a value to anything between 1 kbyte and 20 kbytes of RAM depending on the graphics mode in use. My attempts at doing it in assembly language always came out slower than the operating system could do it.

It turns out that in the OS ROM there's a section which looks like this:
STA &3000,X
STA &3100,X
STA &3200,X
STA &3300,X
...
(continue with one instruction for each 256 bytes)
...
STA &7C00,X
STA &7D00,X
STA &7E00,X
STA &7F00,X

The largest screen memory (modes 0-2) runs from &3000 to &7FFF, and the smallest (mode 7) from &7C00 to &7FFF. Other modes are somewhere in between. By simply counting X from 0 to 255 and jumping into this table of STA instructions at the right point, the OS can write a value to any whole number of 256-byte pages, starting at any page from &3000 to &7F00 and running up to &7FFF. Because the STAs are unrolled, it runs as fast as the CPU can manage.
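
A minimal Python model of that description, not a transcription of the actual OS ROM: the `clear_screen` function and its `start_page` parameter are assumptions made for illustration. The inner loop stands for the unrolled run of STA instructions, and `start_page` stands for the point at which the table is entered.

```python
def clear_screen(mem, start_page, value=0):
    """Model the unrolled STA table: entering at `start_page` skips
    the STA instructions for all lower pages."""
    for x in range(256):                        # X counts 0..255
        for page in range(start_page, 0x80):    # STA &pp00,X up to page &7F
            mem[(page << 8) + x] = value

# Mode 7 style clear: only &7C00-&7FFF is touched.
mem = bytearray([0xAA]) * 0x8000
clear_screen(mem, 0x7C)

# Mode 0-2 style clear: everything from &3000 upwards.
big = bytearray([0xAA]) * 0x8000
clear_screen(big, 0x30)
```

Each pass of the outer loop writes one byte into every page, so the only loop overhead is a single increment and branch per 256-byte column rather than per byte.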

Chris

TonyDuell 28th May 2019 1:09 pm

Re: Fun with 6502 Assembler
 
A trick I remember on the 6809 (beautiful processor!) if you wanted to run the same routine with one of two values in the A register.

The 6809 had 'long branch' instructions which were 3 bytes long and allowed you to branch to anywhere in memory (not just within +/-127 bytes of where you are as with normal branches). It also had a complete set of conditional branch (and long branch) instructions, including branch always (BRA and LBRA) and branch never (BRN and LBRN). The former would do the branch (effectively a relative jump) no matter what the flags were, the latter would never branch.

So you started the routine with an LDA instruction to load the A register with one of the 2 values. Then an LBRN with a 16-bit offset that happened to be an LDA for the other of the 2 values. Then continue the routine.

So if you jumped to the first LDA, the accumulator was loaded with the first value. The LBRN did nothing, and then the routine continued. But if you jumped to one byte after the LBRN opcode, the offset was taken as an LDA instruction with the second value. And of course it was followed by the rest of the routine.

Dave Moll 28th May 2019 3:33 pm

Re: Fun with 6502 Assembler
 
In the 6502 instruction set, conditional branches (various Bxx) are always by a displacement of up to ±127 bytes, whereas the unconditional jump (JMP) is to a two-byte address (i.e. to anywhere within the 64KB memory space).

The 6809's BRN and LBRN are presumably used in the same manner as the no-operation (NOP) of the 6502.

TonyDuell 28th May 2019 3:45 pm

Re: Fun with 6502 Assembler
 
On the 6809 there were unconditional jumps and jump-to-subroutine instructions which took an absolute address.

There were also conditional branches (8-bit displacement, so +/- 127 bytes) and long branches (16-bit displacement, so to anywhere in memory). There were, IIRC, unconditional branch-to-subroutine instructions (so you could write position-independent code: if the main program and its subroutines were all moved to somewhere else in memory, the displacements needed to get to a subroutine were unchanged). And the conditional branches include 'always' and 'never'.

Yes, the BRN and LBRN were NOPs, but they also skipped one or 2 further bytes (the displacement for the branch or long branch that never occurred).

Dave Moll 28th May 2019 3:54 pm

Re: Fun with 6502 Assembler
 
So:

BRN ≡ NOP NOP
LBRN ≡ NOP NOP NOP

TonyDuell 28th May 2019 4:55 pm

Re: Fun with 6502 Assembler
 
Not quite.

The point being that 'NOP' as an instruction has a specific binary value. So NOP NOP specifies both bytes uniquely. And NOP NOP NOP specifies all 3 bytes. But for BRN, the first byte has a specific value (to make it a BRN) but the second byte can be anything. And for an LBRN the second and third bytes can be anything.

That's what makes the trick I mentioned work. You have an LBRN with the following bytes (which would be the displacement if the long branch ever happened) the right pair of bytes for an LDA (immediate) instruction and its operand. If you execute them starting with the 'LBRN' byte then they are ignored (the processor takes them as a displacement for a branch which never occurs). But if you jump to the byte after the LBRN (that is to the first byte of the 'displacement') then of course they are used as an LDA instruction.

julie_m 29th May 2019 6:10 am

Re: Fun with 6502 Assembler
 
On the 6502, there isn't a "Branch Never" instruction (on the early ARM processors, by contrast, every instruction is conditional!), but you can do something like this:
Code:


        .entry1
A9 00    LDA #0
CC      EQUB &CC
        .entry2
A9 FF    LDA #255
        \ rest of stuff
60      RTS

Now if we jump to entry1, once we have placed 0 in the accumulator, the next byte (EQUB &CC) followed by LDA #255 actually looks like CPY &FFA9, which will not affect the accumulator; so after 4 cycles, we carry on with A=0. But if we jump to entry2, we see just the LDA #255 instruction.

This is only one byte shorter than a branch around the "unwanted" instruction, so probably only needed in extreme circumstances.
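
To see the two readings of the same bytes side by side, here is a minimal Python sketch of a three-opcode decoder. The `OPCODES` table, the `trace` helper and the `ENTRY1`/`ENTRY2` offsets are illustrative assumptions; only the three opcodes that appear in the listing above are known to it.

```python
# opcode -> (mnemonic, total instruction length in bytes)
OPCODES = {
    0xA9: ("LDA #", 2),    # load accumulator, immediate
    0xCC: ("CPY abs", 3),  # compare Y with memory, absolute
    0x60: ("RTS", 1),
}

# The bytes from the listing: LDA #0, the stray &CC, LDA #255, RTS.
CODE = bytes([0xA9, 0x00, 0xCC, 0xA9, 0xFF, 0x60])
ENTRY1, ENTRY2 = 0, 3

def trace(code, pc):
    """List the instructions the CPU would see when starting at pc."""
    seen = []
    while True:
        name, length = OPCODES[code[pc]]
        operand = code[pc + 1:pc + length]
        # Operands are stored little-endian, so reverse them for display.
        seen.append((name, operand[::-1].hex().upper()))
        if name == "RTS":
            return seen
        pc += length

print(trace(CODE, ENTRY1))
print(trace(CODE, ENTRY2))
```

From `ENTRY1` the second LDA disappears into the operand of the CPY; from `ENTRY2` it is decoded normally.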

NealCrook 31st May 2019 9:39 pm

Re: Fun with 6502 Assembler
 
Hi Tony,

>> The 6809 had 'long branch' instructions which were 3 bytes long

The long branch instructions are an 0x10 prefix on the branch instructions: so branch is 1 byte op + 1 byte operand (2 bytes total) and the long branch is 2 bytes op + 2 bytes operand (4 bytes total).

There is a common idiom in the 6809 NitrOS-9 code of using $8C to achieve exactly the effect that you describe, though:

Code:

              ldb  #E$MNF      get error code (module not found)
              fcb  $8C          skip 2 bytes

L070B    ldb  #E$BNam      get error code

8c is "CMPX" so it uses the next 2 bytes as an address, and sets the flags appropriately. It saves 1 byte compared with a bra, at a cost of messing with the flags and doing a "random" memory read.

I had always regarded that as quite a nice trick but describing it now, I think about the 6809's memory-mapped I/O and how you could write code that might end up reading at an address that has read side-effects (eg, a UART) and I shudder...

I am happy to admire 6502 coding wonder, but I cannot contribute any of my own,

Neal.

julie_m 31st May 2019 10:18 pm

Re: Fun with 6502 Assembler
 
You have to pick your "wasted" instruction carefully, so as not to trample on anything important. CPY is a ComPare Y instruction, which sets the carry, subtracts the supplied operand from the value in the Y register and discards the difference, but does set the C (carry), N (negative, i.e. bit 7) and Z (zero) flags according to the subtraction; the V (overflow) flag is left unchanged. The LDA #&FF instruction A9 FF looks like an address &FFA9 to the processor, which therefore will attempt to read it; there may be side-effects if some I/O device is mapped there.

Unlike the 680x family, no 6502 instruction occupies more than three bytes; so you can only mask out a one- or two-byte instruction with this technique on that processor.
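
As a rough guide to what the "wasted" CPY does to the flags, here is a minimal Python sketch. The `cpy` function is a hypothetical helper; it assumes the documented behaviour of the 6502 compare instructions, which update N, Z and C only and leave A, X, Y and V alone.

```python
def cpy(y, m):
    """Return the (N, Z, C) flags after comparing Y with memory byte m."""
    diff = (y - m) & 0xFF   # the difference is computed, then thrown away
    n = diff >> 7           # N copies bit 7 of the discarded difference
    z = int(diff == 0)      # Z is set when Y equals the operand
    c = int(y >= m)         # C set means "no borrow", i.e. Y >= operand
    return n, z, c

print(cpy(5, 5), cpy(5, 6), cpy(0x80, 0x01))
```

So any code after the skipped instruction must not depend on N, Z or C having survived from before it.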

Duke_Nukem 1st Jun 2019 7:55 pm

Re: Fun with 6502 Assembler
 
2 Attachment(s)
This thread has made me feel very nostalgic, back to the days of my then shiny new Acorn Atom.

Disassembling the ROM taught me a lot and of course in them days finding subroutines you could make use of would save precious RAM in your own programs.

No printer for me back then, so it was all done by hand ! I've attached an extract, it illustrates how to do division - in those pre-internet pocket-money-wouldn't-cover-a-book days, how else would you learn ? It also illustrates another bit of ROM space saving, note the branch instruction near bottom of page jumps into the middle of an instruction, which happened to be #00 => BRK -> divide by zero error.

I'm sure I have a listing of some games I wrote, will see if I can find it. Back then there seemed to be two main camps, the 6502 brigade vs the Z80 brigade. If I could find my sprite plotting routines, they would illustrate why the 6502, whose 3 little 8 bit registers** initially seem puny compared to the Z80's mighty selection of 8/16 bit registers, was in fact better due to a better instruction set. There were also the unofficial op codes that did two things at once (some of which were actually useful).

Happy days. Set me up such that when I started work in 1984 as a hardware engineer I could also do the programming too (8051 - or 8039 on real bad days).

TTFN,
Jon

** I guess you could argue zero page RAM were another set of registers ...

julie_m 8th Jun 2019 11:44 pm

Re: Fun with 6502 Assembler
 
As part of a program which involves converting decimal numbers to binary, I wrote a dedicated subroutine to multiply by ten, in 16-bit unsigned arithmetic. Ten is known as a sparse number, because its binary representation contains only a few ones (just 2, in fact: 1010). This means we can do our multiplication as follows:
  • Make a copy
  • Double the original number
  • Double it again (now we have n * 4)
  • Add the copy (giving n * 5)
  • Double it one last time.
This will be quicker than a "general-purpose" multiplying routine, which would have to check every bit in the multiplier.
Code:

.times10
LDX #0
JSR cpydn
JSR dbldn
JSR dbldn
LDX #0
CLC
JSR add_dn
.dbldn
ASL decnum
ROL decnum+1
RTS
.cpydn
JSR cpydn_1
.cpydn_1
LDA decnum,X
STA dncpy,X
INX
RTS
.add_dn
JSR add_dn1
.add_dn1
LDA decnum,X
ADC dncpy,X
STA decnum,X
INX
RTS
\TEMPORARY WORKSPACES
.decnum EQUW 0
.dncpy EQUW 0

decnum and decnum + 1 are used to store the decimal number which gets multiplied by 10 in situ, and dncpy and dncpy + 1 are used to store a copy of the original value during the multiplication. A and X get stomped on, and the flags on exit come from the final ROL decnum+1 (so Z is set only if the high byte of the result is zero; a BNE instruction following a JSR here is not guaranteed to branch). The code itself isn't relocatable, as it contains absolute jumps.
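
The same sequence of steps can be checked in a few lines of Python. This is a minimal sketch assuming 16-bit wrap-around on overflow, which is what the ASL/ROL pairs give when the final carry is ignored; `MASK16`, `times10` and the `parse_decimal` usage example are hypothetical names, not part of the routine above.

```python
MASK16 = 0xFFFF

def times10(n):
    """Multiply a 16-bit unsigned value by ten, wrapping on overflow."""
    copy = n & MASK16          # make a copy
    n = (n << 1) & MASK16      # double the original: n * 2
    n = (n << 1) & MASK16      # double it again:     n * 4
    n = (n + copy) & MASK16    # add the copy:        n * 5
    return (n << 1) & MASK16   # double one last time: n * 10

def parse_decimal(digits):
    """Convert a string of decimal digits to binary, one digit at a time."""
    value = 0
    for ch in digits:
        value = (times10(value) + int(ch)) & MASK16
    return value

print(times10(123), parse_decimal("999"))
```

Only two shifts-by-one, one addition and a final shift are needed, against sixteen shift-and-test steps for a general 16-bit multiply.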

Slothie 9th Jun 2019 4:44 pm

Re: Fun with 6502 Assembler
 
Nice. But I would have done this, which doesn't require a temp location for a copy, doesn't change the X or Y registers, and is also relocatable:
Code:

MUL10:        LDA DECNUM+1        ; PUT DECNUM ON STACK
        PHA
        LDA DECNUM
        PHA
        ASL A                ; MULTIPLY BY 2
        STA DECNUM
        ROL DECNUM+1
        ASL DECNUM        ; THEN BY 2 AGAIN
        ROL DECNUM+1
        PLA                ; ADD IN DECNUM SAVED ON STACK
        CLC
        ADC DECNUM
        STA DECNUM
        PLA
        ADC DECNUM+1
        STA DECNUM+1
        ASL DECNUM        ; MULTIPLY BY 2 AGAIN
        ROL DECNUM+1
        RTS


julie_m 9th Jun 2019 5:09 pm

Re: Fun with 6502 Assembler
 
Oh, yes, that one is nicer than mine! Shorter, too: PHA / PLA is one byte. I could actually get away with omitting the CLC, since I happen to know that my decimal number is never going to exceed 3 digits so C will always be 0 from the preceding ROL.

I'm going to have to go through a whole heap of code now, looking for all the places where I could have used the stack instead of a temporary location .....

ViperSan 15th Jun 2019 4:29 pm

Re: Fun with 6502 Assembler
 
A bit off topic ..but relevant I guess.
In the days of writing games for the VIC20...memory was indeed at a premium.
..and to squeeze a quart into a pint pot ..often necessary to improvise.
3K ain't a lot.
I can't remember specifics...my 61 year old grey matter is fast losing the plot.
..but I do remember making compact routines with dual or even triple functionality by setting variables which were often called after setting tables to modify code on the fly ..considered naughty but in retrospect necessary.
So for example a routine to move a pseudo sprite would be the same routine that displayed the score ....or scrolled part of the background.
Another trick I occasionally used was to load code directly from tape into the screen ram area ..then hide it by changing colours...and providing this area of screen ram was never accessed in gameplay ..was safe.
tricks of the trade I guess ..
Enjoy your coding
VS

julie_m 16th Jun 2019 7:54 am

Re: Fun with 6502 Assembler
 
I've avoided self-changing code up to now; not so much because I think it's an inherently bad technique (conceptually, it's little different from the eval() function provided by any modern interpreter), but because I wanted the ability to run the same code from ROM or RAM with only address changes.

To make splitting the BASIC Source Code easier, I have a section full of just EQUB / EQUW / EQUD / EQUS statements with labels which I can include in each section, for my variable storage.

julie_m 21st Jun 2019 3:21 pm

Re: Fun with 6502 Assembler
 
OK, this is something I've found myself having to do.

Part of my code involves selecting one of four rotation angles. Now, I have four separate rotation routines; each one is going to get called multiple times. The actual angle of rotation is bit-packed in a database record, and not fun to retrieve each time the routine is called. So what I am doing is, storing the address of the desired rotation routine in a pair of successive memory locations; then using JMP (indirect) to call the pre-selected one.

That's all well and good; but I also need to know, a little later on, whether the selected rotation is "even" or "odd" in order to draw in another feature which happens to have (at least) order-2 spin symmetry (i.e. it looks the same at 180° as at 0°, and the same at 270° as at 90°).

Now, I could just store an extra byte when I do the selection. But by positioning my code with as much cunning as a fox wot used to be Professor of Cunning at Oxford University but has since moved on, and is now working for the UN at the High Commission of International Cunning Planning, I have managed to arrange for the "even" rotations to begin on an even address, and the "odd" rotations to begin on an odd address. Now I need only examine the LSB of my jump vector:
Code:

.check_rotation
LDA rotv
AND #1
BNE rot_odd
.rot_even
\ instructions
\ ...
RTS
.rot_odd
\ instructions
\ ...
RTS

In BBC BASIC, you can do something like the following to insert an extra NOP if the next instruction would otherwise start on an odd byte:
Code:

]
REM .. force next instruction to start on an EVEN byte..
IF P% AND 1 [ OPT J% : NOP : ]
[ OPT J%

(I'm using J% for my assembler OPTion setting.)

I suppose I could take it right to the next level with even more careful positioning of code, such that the "odd" rotations were at addresses where the highest-order bit of the low byte was set and the "even" rotations were at addresses where this bit was clear. Then I don't even need the AND #1; since we can test bit 7 directly with the BMI or BPL instructions.
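
A minimal Python sketch of the bookkeeping involved, for anyone trying the same thing. All the function names and the example addresses are hypothetical; the sketch assumes only that the jump vector holds the start address of the selected routine.

```python
def pad_to_even(pc):
    """NOP bytes needed so the next instruction starts on an even address."""
    return pc & 1

def pad_to_odd(pc):
    """NOP bytes needed so the next instruction starts on an odd address."""
    return (pc & 1) ^ 1

def is_odd_rotation(vector):
    """Test bit 0 of the vector's low byte (LDA rotv : AND #1 : BNE)."""
    return ((vector & 0xFF) & 1) == 1

def is_odd_rotation_bit7(vector):
    """Variant: test bit 7 of the low byte, as BMI/BPL would after LDA rotv."""
    return (vector & 0x80) != 0

# Hypothetical routine addresses: one "even" rotation, one "odd" one.
print(is_odd_rotation(0x2000), is_odd_rotation(0x2041))
```

The bit 0 version costs at most one padding byte per routine; the bit 7 version saves the AND #1 but could need far more padding, since each routine has to start in the correct half of a 256-byte page.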

I wonder where else this same technique might have been used? There is certainly plenty of 6502 code out there that is based around jump tables, and I can think of several other situations where you might want to know which of two broad groups was selected .....

JohnBHanson 21st Jun 2019 9:27 pm

Re: Fun with 6502 Assembler
 
You can always have modifiable code even if the main code is in rom - just push the opcodes onto the stack and then execute the stack - finally return and get the opcodes off. I have done that on an HS08 which is similar to a 6800 ! and on a 68000!

Anyone want to try on a 6502?

julie_m 22nd Jun 2019 8:38 am

Re: Fun with 6502 Assembler
 
Unfortunately, on the 6502, the only access to the stack pointer is via the X register; so you can't just push opcodes onto the stack and execute them from there.

The stack starts from 01FF and extends down to 0100. At least some of this space, at the upper end, must be writeable. You could directly write a few instructions to the bottom of the stack and execute them from there, but it would not gain you anything over using some of your workspace in RAM for the same purpose. In any case, the most important instructions LDA, STA, ORA, AND, EOR, ADC, SBC and CMP are available in indirect (zp,X) and (zp),Y addressing modes which obviate the need for self-modifying code.

If you need an indirect mode version of an instruction not available in those modes (the bit shifts, for example, and LDX/LDY/STX/STY), then the only way to do it is to write the instruction directly into RAM and execute it from there. But it does not save any clock cycles over the indirect modes where available.


All times are GMT +1.