UK Vintage Radio Repair and Restoration


Vintage Computers Any vintage computer systems, calculators, video games etc., but with an emphasis on 1980s and earlier equipment.

Old 26th May 2019, 9:11 pm   #1
julie_m
Dekatron
 
Join Date: May 2008
Location: Derby, UK.
Posts: 7,306
Default Fun with 6502 Assembler

I've been doing some 6502 assembly language programming, and I thought I would share a few techniques I found in case they come in useful to anyone else. No originality is claimed on any of the following .....

To perform any arbitrary series of instructions exactly twice and then RTS, you can use something like:
Code:
LDX #0
CLC
JSR sub
.sub
LDA m1,X
ADC m2,X
STA m3,X
INX
RTS
Line 1 sets the X register to 0.
Line 2 clears the carry flag.
Line 3 pushes the return address onto the stack, and jumps to (the very next instruction) line 4.
Line 4 is just a label for the beginning of a subroutine.
Lines 5 et seq are where the magic happens.
Line 8 increases X, so we will be using the next bytes up in memory the next time we go round the loop.
Line 9 is the end of the subroutine. The RTS takes us back to the address we stored at line 3, and the program continues from line 4.
Lines 5 onwards do the magic all over again.
When we get to line 9 the second time around, the RTS returns to wherever we came from.

It adds just three bytes to the program size (for the extra JSR instruction) and twelve cycles (six for the JSR and another six for the RTS that takes us back to the top for the second pass) to the execution time.
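For anyone who wants to check the control flow without an emulator, here is a minimal Python sketch of the trick. It is only a model: the three-entry `program` list, the `CALLER` sentinel and the name `run_twice_demo` are made up for illustration, and a real 6502 pushes the return address minus one rather than the address itself.

```python
def run_twice_demo(m1, m2):
    """Model the JSR-into-own-body trick with a toy return stack.

    The 'program' is: JSR sub / .sub body / RTS. Addresses are list
    indices and purely illustrative, not real 6502 addresses.
    """
    CALLER = -1                     # sentinel: where the outer caller resumes
    program = ["JSR sub", "body", "RTS"]
    SUB = 1                         # index of the .sub label
    stack = [CALLER]                # return address left by whoever called us
    x, carry, m3 = 0, 0, []         # LDX #0 : CLC
    pc = 0
    while pc != CALLER:
        op = program[pc]
        if op == "JSR sub":
            stack.append(pc + 1)    # return address is the very next instruction
            pc = SUB
        elif op == "body":
            total = m1[x] + m2[x] + carry   # LDA m1,X : ADC m2,X
            m3.append(total & 0xFF)         # STA m3,X
            carry = total >> 8
            x += 1                          # INX
            pc += 1
        else:                               # RTS
            pc = stack.pop()
    return m3, carry
```

Adding &01FF to &0001 this way, `run_twice_demo([0xFF, 0x01], [0x01, 0x00])` runs the body twice and passes the carry from the first pass into the second.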


If you are doing bit-packing, you probably need to do several LSR or ASL instructions in a row. The fastest way is to do the bit-shifting right in the accumulator (LSR A and ASL A only take two cycles; as opposed to 5 cycles for zero page, 6 for zero page, X and absolute or 7 cycles for absolute,X -- no other addressing modes are available). If you have something like
Code:
.lsr4
LSR A
.lsr3
LSR A
.lsr2
LSR A
LSR A
RTS
then you can JSR into any of the instructions to shift the accumulator contents to the right 4, 3 or 2 bits.
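A rough Python model of the multiple-entry-point idea, assuming the cycle counts quoted above (2 for LSR A, 6 each for JSR and RTS). The `ENTRY` table and the `call` helper are hypothetical names used only for this sketch.

```python
# One slot per LSR A in the chain; ENTRY maps each label to the index
# of the first LSR A that executes when you JSR to that label.
CHAIN_LENGTH = 4
ENTRY = {"lsr4": 0, "lsr3": 1, "lsr2": 2}

def call(label, a):
    """Return (A after the shifts, cycles including JSR and RTS)."""
    shifts = CHAIN_LENGTH - ENTRY[label]
    for _ in range(shifts):
        a = (a & 0xFF) >> 1         # LSR A: bit 0 falls off into carry
    cycles = 6 + 2 * shifts + 6     # JSR + the shifts + RTS
    return a, cycles
```

On these figures the subroutine is a space saving rather than a speed saving: JSR lsr4 is three bytes at the call site against four bytes of inline LSR A, but it costs twelve extra cycles per call.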
__________________
If I have seen further than others, it is because I was standing on a pile of failed experiments.
Old 27th May 2019, 9:22 pm   #2
G0HZU_JMR
Octode
 
Join Date: Sep 2010
Location: Cheltenham, Gloucestershire, UK.
Posts: 1,694
Default Re: Fun with 6502 Assembler

Wow that takes me back... I too found out that there are lots of ways to save code space or speed up execution times with these old MCUs.

For example, about 15 years ago I learned a few tricks with the old 680x series MCUs. A custom version of this MCU series was used in some old N.Denso (Japanese) car ECUs from the 1980s. They only had 4k of ROM so they had to be able to cram as much code (and maps) as possible into this tiny space. I learned more from studying their code than from any textbook!

They did a few neat tricks at machine code level where the code would jump to the middle of a (multibyte) instruction so on certain passes through the routine the code did something else because it jumped into the middle of an instruction. This would appear illegal on a disassembler but it saved a few precious bytes. They also did tricks with lookup tables that allowed efficient addressing and their (unconventional) code could derail a disassembler as they did sneaky tricks with the stack to achieve this. Probably the most useful code I learned was how to efficiently read a large lookup table (eg a 21 x 12 table) and include full interpolation between map points. Over several versions of code (for the same car) they got better and better at doing this with fewer and fewer instructions. I use similar routines for lookup table access in modern AVR MCUs and the code is really fast and efficient.
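To give a flavour of what an interpolated map lookup involves, here is a small Python sketch of bilinear interpolation using integer arithmetic only. It is not the ECU code, just one common way of doing it; the 8.8 fixed-point coordinate format and the name `interp_map` are assumptions made for the example.

```python
def interp_map(table, x, y):
    """Bilinear lookup in a 2-D table of integers.

    x and y are 8.8 fixed point: the high byte selects the cell and the
    low byte is the fraction (0..255) of the way to the next cell.
    """
    xi, xf = x >> 8, x & 0xFF
    yi, yf = y >> 8, y & 0xFF
    xi2 = min(xi + 1, len(table[0]) - 1)    # clamp at the right-hand edge
    yi2 = min(yi + 1, len(table) - 1)       # clamp at the bottom edge
    # Interpolate along x on the two bracketing rows (scaled by 256)...
    top = table[yi][xi] * (256 - xf) + table[yi][xi2] * xf
    bot = table[yi2][xi] * (256 - xf) + table[yi2][xi2] * xf
    # ...then along y between those rows (scaled by 256 again).
    return (top * (256 - yf) + bot * yf) >> 16
```

With an 8-bit fraction the weights are single bytes, which is what makes this style of lookup practical on an 8-bit MCU.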
__________________
Regards, Jeremy G0HZU
Old 28th May 2019, 9:45 am   #3
cmjones01
Octode
 
Join Date: Oct 2008
Location: Warsaw, Poland and Cambridge, UK
Posts: 1,891
Default Re: Fun with 6502 Assembler

Quote:
Originally Posted by julie_m View Post
If you are doing bit-packing, you probably need to do several LSR or ASL instructions in a row. The fastest way is to do the bit-shifting right in the accumulator (LSR A and ASL A only take two cycles; as opposed to 5 cycles for zero page, 6 for zero page, X and absolute or 7 cycles for absolute,X -- no other addressing modes are available). If you have something like
Code:
.lsr4
LSR A
.lsr3
LSR A
.lsr2
LSR A
LSR A
RTS
then you can JSR into any of the instructions to shift the accumulator contents to the right 4, 3 or 2 bits.
There's a lovely example of loop unrolling (which is the name of this technique) in the BBC Micro's operating system ROM. I always wondered how it managed to clear the screen so quickly, which involves writing a value to anything between 4kbytes and 20kbytes of RAM depending on the graphics mode in use. My attempts at doing it in assembly language always came out slower than the operating system could do it.

It turns out that in the OS ROM there's a section which looks like this:
STA &3000,X
STA &3100,X
STA &3200,X
STA &3300,X
...
(continue with one instruction for each 256 bytes)
...
STA &7C00,X
STA &7D00,X
STA &7E00,X
STA &7F00,X

The largest screen memory (modes 0-2) runs from &3000 to &7FFF, and the smallest (mode 7) from &7C00 to &7FFF. Other modes are somewhere in between. By simply counting X from 0 to 255 and jumping into this table of STA instructions, the OS can write a value to any set of locations 256 bytes in size starting at any address from &3000 to &7F00. Because the STAs are unrolled, it runs as fast as the CPU can manage.
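A minimal Python model of what that table of STAs achieves, assuming a flat 64K `bytearray` for memory. The function name `clear_screen` and its parameters are invented for the sketch, and how the real ROM picks its entry point into the table is not modelled.

```python
def clear_screen(memory, start_page, value=0):
    """Write `value` to every byte from start_page * 256 up to &7FFF.

    Mirrors the unrolled table: for each X from 0 to 255, one
    STA &pp00,X is executed for every page pp from start_page to &7F.
    """
    pages = range(start_page, 0x80)     # one unrolled STA per 256-byte page
    for x in range(256):                # X counts 0..255
        for page in pages:              # the straight-line run of STAs
            memory[(page << 8) + x] = value
```

Entering at page &7C touches only the 1K used by mode 7, while entering at page &30 clears the full 20K.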

Chris
__________________
What's going on in the workshop? http://martin-jones.com/
Old 28th May 2019, 1:09 pm   #4
TonyDuell
Dekatron
 
Join Date: Jun 2015
Location: Biggin Hill, London, UK.
Posts: 3,217
Default Re: Fun with 6502 Assembler

A trick I remember on the 6809 (beautiful processor!) if you wanted to run the same routine with one of two values in the A register.

The 6809 had 'long branch' instructions which were 3 bytes long and allowed you to branch to anywhere in memory (not just within +/-127 bytes of where you are as with normal branches). It also had a complete set of conditional branch (and long branch) instructions, including branch always (BRA and LBRA) and branch never (BRN and LBRN). The former would do the branch (effectively a relative jump) no matter what the flags were, the latter would never branch.

So you started the routine with an LDA instruction to load the A register with one of the 2 values. Then an LBRN with a 16 bit offset that happened to be an LDA for the other of the 2 values. Then continue the routine.

So if you jumped to the first LDA, the accumulator was loaded with the first value. The LBRN did nothing, and then the routine continued. But if you jumped to one byte after the LBRN opcode, the offset was taken as an LDA instruction with the second value, and of course it was followed by the rest of the routine.
Old 28th May 2019, 3:33 pm   #5
Dave Moll
Dekatron
 
 
Join Date: Feb 2005
Location: West Cumbria (CA13), UK
Posts: 4,137
Default Re: Fun with 6502 Assembler

In the 6502 instruction set, conditional branches (various Bxx) are always by a displacement of up to 127 bytes, whereas the unconditional jump (JMP) is to a two-byte address (i.e. to anywhere within the 64KB memory space).

The 6809's BRN and LBRN are presumably used in the same manner as the no-operation (NOP) of the 6502.
__________________
Mending is better than Ending (cf Brave New World by Aldous Huxley)
Old 28th May 2019, 3:45 pm   #6
TonyDuell
Dekatron
 
Join Date: Jun 2015
Location: Biggin Hill, London, UK.
Posts: 3,217
Default Re: Fun with 6502 Assembler

On the 6809 there were unconditional jumps and jump-to-subroutine instructions which took an absolute address.

There were also conditional branches (8 bit displacement so +/- 127 bytes) and long branches (16 bit displacement so to anywhere in memory). There were, IIRC, unconditional branch-to-subroutine instructions (so you could write position-independent code: if the main program and its subroutines were all moved to somewhere else in memory, the displacements needed to get to a subroutine were unchanged). And the conditional branches include 'always' and 'never'.

Yes, the BRN and LBRN were NOPs, but they also skipped one or 2 further bytes (the displacement for the branch or long branch that never occurred).
Old 28th May 2019, 3:54 pm   #7
Dave Moll
Dekatron
 
 
Join Date: Feb 2005
Location: West Cumbria (CA13), UK
Posts: 4,137
Default Re: Fun with 6502 Assembler

So:

BRN ≡ NOP NOP
LBRN ≡ NOP NOP NOP
Old 28th May 2019, 4:55 pm   #8
TonyDuell
Dekatron
 
Join Date: Jun 2015
Location: Biggin Hill, London, UK.
Posts: 3,217
Default Re: Fun with 6502 Assembler

Not quite.

The point being that 'NOP' as an instruction has a specific binary value. So NOP NOP specifies both bytes uniquely. And NOP NOP NOP specifies all 3 bytes. But for BRN, the first byte has a specific value (to make it a BRN) but the second byte can be anything. And for a LBRN the second and third bytes can be anything.

That's what makes the trick I mentioned work. You have an LBRN with the following bytes (which would be the displacement if the long branch ever happened) the right pair of bytes for an LDA (immediate) instruction and its operand. If you execute them starting with the 'LBRN' byte then they are ignored (the processor takes them as a displacement for a branch which never occurs). But if you jump to the byte after the LBRN (that is to the first byte of the 'displacement') then of course they are used as an LDA instruction.
Old 29th May 2019, 6:10 am   #9
julie_m
Dekatron
 
Join Date: May 2008
Location: Derby, UK.
Posts: 7,306
Default Re: Fun with 6502 Assembler

On the 6502, there isn't a "Branch Never" instruction (on the early ARM processors, by contrast, every instruction is conditional!), but you can do something like this:
Code:
 
         .entry1
A9 00    LDA #0
CC       EQUB &CC
         .entry2
A9 FF    LDA #255
         \ rest of stuff
60       RTS
Now if we jump to entry1, once we have placed 0 in the accumulator, the next byte EQUB &CC followed by LDA #255 actually looks like CPY &FFA9, which will not affect the accumulator; so after 4 cycles, we carry on with A=0. But if we jumped to entry2, we see just the LDA #255 instruction.

This is only one byte shorter than a branch around the "unwanted" instruction, so probably only needed in extreme circumstances.
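The two readings of those six bytes can be checked with a toy decoder. This is a Python sketch that understands only the three opcodes involved (A9, CC, 60); the names `run`, `CODE`, `ENTRY1` and `ENTRY2` are invented here, and the flags and the memory read done by CPY are deliberately not modelled.

```python
def run(code, entry):
    """Execute a tiny 6502 subset from `entry` until RTS; return (A, cycles)."""
    a, pc, cycles = None, entry, 0
    while True:
        op = code[pc]
        if op == 0xA9:              # LDA #imm: 2 bytes, 2 cycles
            a = code[pc + 1]
            pc += 2
            cycles += 2
        elif op == 0xCC:            # CPY abs: 3 bytes, 4 cycles, A untouched
            pc += 3                 # swallows the following LDA #255
            cycles += 4
        elif op == 0x60:            # RTS ends the run (its cycles not counted)
            return a, cycles
        else:
            raise ValueError(f"opcode {op:#04x} not modelled")

CODE = bytes([0xA9, 0x00, 0xCC, 0xA9, 0xFF, 0x60])
ENTRY1, ENTRY2 = 0, 3               # offsets of .entry1 and .entry2
```

From `ENTRY1` the decoder sees LDA #0, CPY &FFA9, RTS; from `ENTRY2` it sees LDA #255, RTS.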
Old 31st May 2019, 9:39 pm   #10
NealCrook
Triode
 
Join Date: May 2019
Location: Reading, Berkshire, UK.
Posts: 18
Default Re: Fun with 6502 Assembler

Hi Tony,

>> The 6809 had 'long branch' instructions which were 3 bytes long

The long branch instructions are an 0x10 prefix on the branch instructions: so branch is 1 byte op + 1 byte operand (2 bytes total) and the long branch is 2 bytes op + 2 bytes operand (4 bytes total).

There is a common idiom in the 6809 NitrOS-9 code of using $8C to achieve exactly the effect that you describe, though:

Code:
              ldb   #E$MNF       get error code (module not found)
              fcb   $8C          skip 2 bytes

L070B    ldb   #E$BNam      get error code
8c is "CMPX" so it uses the next 2 bytes as an address, and sets the flags appropriately. It saves 1 byte compared with a bra, at a cost of messing with the flags and doing a "random" memory read.

I had always regarded that as quite a nice trick but describing it now, I think about the 6809's memory-mapped I/O and how you could write code that might end up reading at an address that has read side-effects (eg, a UART) and I shudder...

I am happy to admire 6502 coding wonder, but I cannot contribute any of my own,

Neal.
Old 31st May 2019, 10:18 pm   #11
julie_m
Dekatron
 
Join Date: May 2008
Location: Derby, UK.
Posts: 7,306
Default Re: Fun with 6502 Assembler

You have to pick your "wasted" instruction carefully, so as not to trample on anything important. CPY is a ComPare Y instruction: it subtracts the supplied operand from the value in the Y register (as if the carry had been set first) and discards the difference, but does set the C (carry), N (negative, i.e. bit 7) and Z (zero) flags according to the subtraction; the V (overflow) flag is left alone. The LDA #&FF instruction A9 FF looks like an address &FFA9 to the processor, which therefore will attempt to read it; there may be side-effects if some I/O device is mapped there.
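To see exactly what gets trampled, here is a small Python sketch of the flag results of a compare. The function name `cpy_flags` is made up for the example. As far as I know the 6502 compare instructions change only C, Z and N, so those are the three modelled.

```python
def cpy_flags(y, operand):
    """Flags after CPY: computes Y - operand and keeps only the flags."""
    diff = (y - operand) & 0xFF
    return {
        "C": y >= operand,          # carry set means no borrow was needed
        "Z": y == operand,          # zero: the two values were equal
        "N": bool(diff & 0x80),     # negative: bit 7 of the difference
    }
```

So any code after the skipped instruction that relies on C, Z or N surviving from before will be upset, while A, X and Y themselves are untouched.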

Unlike the 680x family, no 6502 instruction occupies more than three bytes; so you can only mask out a one- or two-byte instruction with this technique on that processor.
Old 1st Jun 2019, 7:55 pm   #12
Duke_Nukem
Octode
 
 
Join Date: Jan 2003
Location: Birmingham, West Midlands, UK.
Posts: 1,168
Default Re: Fun with 6502 Assembler

This thread has made me feel very nostalgic, back to the days of my then shiny new Acorn Atom.

Disassembling the ROM taught me a lot and of course in them days finding subroutines you could make use of would save precious RAM in your own programs.

No printer for me back then, so it was all done by hand ! I've attached an extract, it illustrates how to do division - in those pre-internet pocket-money-wouldn't-cover-a-book days, how else would you learn ? It also illustrates another bit of ROM space saving, note the branch instruction near bottom of page jumps into the middle of an instruction, which happened to be #00 => BRK -> divide by zero error.

I'm sure I have a listing of some games I wrote, will see if I can find it. Back then there seemed to be two main camps, the 6502 brigade vs the Z80 brigade, if I could find my sprite plotting routines it would illustrate why the 6502, whose 3 little 8 bit registers** initially seem puny compared to the Z80's mighty selection of 8/16 bit registers, was in fact better due to a better instruction set. There were also the unofficial op codes that did two things at once (some of which were actually useful).

Happy days. Set me up such that when I started work in 1984 as a hardware engineer I could also do the programming too (8051 - or 8039 on real bad days).

TTFN,
Jon

** I guess you could argue zero page RAM were another set of registers ...
Attachments: sheets.jpg (40.0 KB image), Disassemble.pdf (606.8 KB)
Old 8th Jun 2019, 11:44 pm   #13
julie_m
Dekatron
 
Join Date: May 2008
Location: Derby, UK.
Posts: 7,306
Default Re: Fun with 6502 Assembler

As part of a program which involves converting decimal numbers to binary, I wrote a dedicated subroutine to multiply by ten, in 16-bit unsigned arithmetic. Ten is known as a sparse number, because its binary representation contains only a few ones (just 2, in fact: 1010). This means we can do our multiplication as follows:
  • Make a copy
  • Double the original number
  • Double it again (now we have n * 4)
  • Add the copy (giving n * 5)
  • Double it one last time.
This will be quicker than a "general-purpose" multiplying routine, which would have to check every bit in the multiplier.
Code:
.times10
LDX #0
JSR cpydn
JSR dbldn
JSR dbldn
LDX #0
CLC
JSR add_dn
.dbldn
ASL decnum
ROL decnum+1
RTS
.cpydn
JSR cpydn_1
.cpydn_1
LDA decnum,X
STA dncpy,X
INX
RTS
.add_dn
JSR add_dn1
.add_dn1
LDA decnum,X
ADC dncpy,X
STA decnum,X
INX
RTS
\TEMPORARY WORKSPACES
.decnum EQUW 0
.dncpy EQUW 0
decnum and decnum + 1 are used to store the decimal number which gets multiplied by 10 in situ, and dncpy and dncpy + 1 are used to store a copy of the original value during the multiplication. A and X get stomped on, and on exit the Z flag reflects the final ROL decnum+1 (so Z=1 only if the high byte of the result is zero). The code itself isn't relocatable, as it contains absolute jumps.
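The shift-and-add recipe is easy to sanity-check at word level. This is a Python sketch rather than 6502 code, with the 16-bit wrap-around made explicit by masking; the function name mirrors the `.times10` label but is otherwise just an illustration.

```python
def times10(n):
    """Multiply a 16-bit unsigned value by ten using doubling and one add."""
    copy = n                        # make a copy
    n = (n << 1) & 0xFFFF           # double the original number
    n = (n << 1) & 0xFFFF           # double it again: n * 4
    n = (n + copy) & 0xFFFF         # add the copy: n * 5
    n = (n << 1) & 0xFFFF           # double one last time: n * 10
    return n
```

Anything up to 6553 multiplies exactly; beyond that the result wraps modulo 65536, just as the two-byte version would.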
Old 9th Jun 2019, 4:44 pm   #14
Slothie
Pentode
 
Join Date: Apr 2018
Location: Newbury, Berkshire, UK.
Posts: 143
Default Re: Fun with 6502 Assembler

Nice. But I would have done this which doesn't require a temp location for a copy, doesn't change the X or Y registers, and also is relocatable:
Code:
MUL10:	LDA DECNUM+1	; PUT DECNUM ON STACK
	PHA
	LDA DECNUM
	PHA
	ASL A		; MULTIPLY BY 2
	STA DECNUM
	ROL DECNUM+1
	ASL DECNUM	; THEN BY 2 AGAIN
	ROL DECNUM+1
	PLA		; ADD IN DECNUM SAVED ON STACK
	CLC
	ADC DECNUM
	STA DECNUM
	PLA
	ADC DECNUM+1
	STA DECNUM+1
	ASL DECNUM	; MULTIPLY BY 2 AGAIN
	ROL DECNUM+1
	RTS
Old 9th Jun 2019, 5:09 pm   #15
julie_m
Dekatron
 
Join Date: May 2008
Location: Derby, UK.
Posts: 7,306
Default Re: Fun with 6502 Assembler

Oh, yes, that one is nicer than mine! Shorter, too: PHA / PLA is one byte. I could actually get away with omitting the CLC, since I happen to know that my decimal number is never going to exceed 3 digits so C will always be 0 from the preceding ROL.
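The claim about the carry can be checked by brute force. Below is a Python sketch that models ASL and ROL on single bytes and reports the carry left by the second doubling, which is the carry the ADC would see if the CLC were dropped (PLA does not touch C). The helper names are invented for the example.

```python
def asl(byte):
    """ASL: shift left; the old bit 7 becomes the carry."""
    return (byte << 1) & 0xFF, byte >> 7

def rol(byte, carry):
    """ROL: shift left through carry; the old bit 7 becomes the new carry."""
    return ((byte << 1) | carry) & 0xFF, byte >> 7

def carry_after_times4(n):
    """Carry flag after doubling the 16-bit value n twice, a byte at a time."""
    lo, hi = n & 0xFF, n >> 8
    carry = 0
    for _ in range(2):
        lo, carry = asl(lo)
        hi, carry = rol(hi, carry)
    return carry
```

The carry stays clear for every value below 16384, so a three-digit decimal number is comfortably safe.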

I'm going to have to go through a whole heap of code now, looking for all the places where I could have used the stack instead of a temporary location .....
Old 15th Jun 2019, 4:29 pm   #16
ViperSan
Tetrode
 
Join Date: Jun 2019
Location: Manchester, UK
Posts: 78
Default Re: Fun with 6502 Assembler

A bit off topic ..but relevant I guess.
In the days of writing games for the VIC20...memory was indeed at a premium.
..and to squeeze a quart into a pint pot ..often necessary to improvise.
3K ain't a lot.
I can't remember specifics...my 61 year old grey matter is fast losing the plot.
..but I do remember making compact routines with dual or even triple functionality by setting variables which were often called after setting tables to modify code on the fly ..considered naughty but in retrospect necessary.
So for example a routine to move a pseudo sprite would be the same routine that displayed the score ....or scroll part of the background.
Another trick I occasionally used was to load code directly from tape into the screen ram area ..then hide it by changing colours...and providing this area of screen ram was never accessed in gameplay ..was safe.
tricks of the trade I guess ..
Enjoy your coding
VS
Old 16th Jun 2019, 7:54 am   #17
julie_m
Dekatron
 
Join Date: May 2008
Location: Derby, UK.
Posts: 7,306
Default Re: Fun with 6502 Assembler

I've avoided self-changing code up to now; not so much because I think it's an inherently bad technique (conceptually, it's little different from the eval() function provided by any modern interpreter), but because I wanted the ability to run the same code from ROM or RAM with only address changes.

To make splitting the BASIC Source Code easier, I have a section full of just EQUB / EQUW / EQUD / EQUS statements with labels which I can include in each section, for my variable storage.

Last edited by julie_m; 16th Jun 2019 at 7:59 am.
Old 21st Jun 2019, 3:21 pm   #18
julie_m
Dekatron
 
Join Date: May 2008
Location: Derby, UK.
Posts: 7,306
Default Re: Fun with 6502 Assembler

OK, this is something I've found myself having to do.

Part of my code involves selecting one of four rotation angles. Now, I have four separate rotation routines; each one is going to get called multiple times. The actual angle of rotation is bit-packed in a database record, and not fun to retrieve each time the routine is called. So what I am doing is, storing the address of the desired rotation routine in a pair of successive memory locations; then using JMP (indirect) to call the pre-selected one.

That's all well and good; but I also need to know, a little later on, whether the selected rotation is "even" or "odd" in order to draw in another feature which happens to have (at least) order-2 spin symmetry (i.e. it looks the same at 180 as at 0, and the same at 270 as at 90).

Now, I could just store an extra byte when I do the selection. But by positioning my code with as much cunning as a fox wot used to be Professor of Cunning at Oxford University but has since moved on, and is now working for the UN at the High Commission of International Cunning Planning, I have managed to arrange for the "even" rotations to begin on an even address, and the "odd" rotations to begin on an odd address. Now I need only examine the LSB of my jump vector;
Code:
.check_rotation
LDA rotv
AND #1
BNE rot_odd
.rot_even
\ instructions
\ ...
RTS
.rot_odd
\ instructions
\ ...
RTS
In BBC BASIC, you can do something like the following to insert an extra NOP if the next instruction would otherwise start on an odd byte:
Code:
]
REM .. force next instruction to start on an EVEN byte..
IF P% AND 1 [ OPT J% : NOP : ]
[ OPT J%
(I'm using J% for my assembler OPTion setting.)

I suppose I could take it right to the next level with even more careful positioning of code, such that the "odd" rotations were at addresses where the highest-order bit of the low byte was set and the "even" rotations were at addresses where this bit was clear. Then I don't even need the AND #1; since we can test bit 7 directly with the BMI or BPL instructions.
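The placement rule can be sketched in a few lines of Python. This only models the bookkeeping: the routine names, sizes and origin in the usage note are hypothetical, `place` stands in for what the assembler does with P% and a padding NOP, and `is_odd_rotation` stands in for the LDA rotv / AND #1 test.

```python
def place(routines, origin):
    """Assign start addresses, padding with one NOP byte where needed.

    routines is a list of (name, size_in_bytes, want_odd) tuples.
    Returns a dict mapping each name to its start address.
    """
    addresses, p = {}, origin
    for name, size, want_odd in routines:
        if (p & 1) != int(want_odd):
            p += 1                  # a one-byte NOP pad flips the parity
        addresses[name] = p
        p += size
    return addresses

def is_odd_rotation(vector):
    """True when the jump vector points at an 'odd' rotation routine."""
    return bool(vector & 1)         # LDA rotv : AND #1 : BNE rot_odd
```

For instance, `place([("rot0", 37, False), ("rot90", 40, True), ("rot180", 41, False), ("rot270", 38, True)], 0x2000)` pads only before rot180, and the low bit of each address then tells the two groups apart.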

I wonder where else this same technique might have been used? There is certainly plenty of 6502 code out there that is based around jump tables, and I can think of several other situations where you might want to know which of two broad groups was selected .....
Old 21st Jun 2019, 9:27 pm   #19
JohnBHanson
Hexode
 
Join Date: Aug 2009
Location: Worthing, Sussex, UK.
Posts: 397
Default Re: Fun with 6502 Assembler

You can always have modifiable code even if the main code is in rom - just push the opcodes onto the stack and then execute the stack - finally return and get the opcodes off. I have done that on an HS08 which is similar to a 6800 ! and on a 68000!

Anyone want to try on a 6502?
Old 22nd Jun 2019, 8:38 am   #20
julie_m
Dekatron
 
Join Date: May 2008
Location: Derby, UK.
Posts: 7,306
Default Re: Fun with 6502 Assembler

Unfortunately, on the 6502, the only access to the stack pointer is via the X register; so you can't just push opcodes onto the stack and execute them from there.

The stack starts from 01FF and extends down to 0100. At least some of this space, at the upper end, must be writeable. You could directly write a few instructions to the bottom of the stack and execute them from there, but it would not gain you anything over using some of your workspace in RAM for the same purpose. In any case, the most important instructions LDA, STA, ORA, AND, EOR, ADC, SBC and CMP are available in indirect (zp,X) and (zp),Y addressing modes which obviate the need for self-modifying code.

If you need an indirect mode version of an instruction not available in those modes (the bit shifts, for example, and LDX/LDY/STX/STY), then the only way to do it is to write the instruction directly into RAM and execute it from there. But it does not save any clock cycles over the indirect modes where available.
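For anyone unfamiliar with the two indirect modes, here is a Python sketch of how each one arrives at its effective address, assuming a flat 64K `bytearray` for memory. The function names are invented for the example.

```python
def ea_indirect_y(memory, zp, y):
    """(zp),Y: fetch the 16-bit pointer stored at zp and zp+1, then add Y."""
    base = memory[zp] | (memory[(zp + 1) & 0xFF] << 8)
    return (base + y) & 0xFFFF

def ea_indirect_x(memory, zp, x):
    """(zp,X): add X to zp (wrapping within page zero), then fetch the pointer."""
    p = (zp + x) & 0xFF
    return memory[p] | (memory[(p + 1) & 0xFF] << 8)
```

Changing the two pointer bytes in zero page redirects the instruction at run time, which is the job self-modifying code would otherwise be doing by rewriting an operand.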


All information and advice on this forum is subject to the WARNING AND DISCLAIMER located at https://www.vintage-radio.net/rules.html.
Failure to heed this warning may result in death or serious injury to yourself and/or others.


Copyright ©2002 - 2019, Paul Stenning.