25th Mar 2023, 3:07 pm | #1 |
Tetrode
Join Date: May 2021
Location: Titz, Germany.
Posts: 72
|
Stack and subroutine calls
It is frequently said that the SC/MP does not have a stack, and Elbug has a software stack with a very high overhead for that reason. But looking at the architecture, the SC/MP and its XPPC instruction always reminded me of the branch and link (BAL) instruction of the IBM/360. Digging through the SC/MP programming and assembler manual, section 6.2 "Stack programming" confirms that: there is an example of how to call subroutines with statically allocated call frames. Their advantage is that you have no stack that could overflow, but they consume more RAM than stack allocated frames and you need to save/restore the frame pointer.
If you do not extend call frames dynamically, and the SC/MP only allows signed byte indexed addressing anyway, there should be an easier way using call frames on the stack:

Subroutine entry
  p3 contains the return address
  p2 points to the frame: subroutine arguments are addressed with positive offsets
  if this subroutine contains calls, save p3 to 0/-1(p2)
  locals are stored further down
  arguments for outgoing calls are stored below the locals

Returning from subroutine
  ld -1(p2)             ; Restore p3 if this subroutine made calls
  xpah p3
  ld 0(p2)
  xpal p3
  xppc p3               ; Return

Call a subroutine
  store callee arguments below locals
  ld @-argsoffset(p2)   ; decrease p2 to point below arguments
  ldi l(subroutine)
  xpal p3
  ldi h(subroutine)
  xpah p3
  xppc p3
  ld @argsoffset(p2)    ; restore caller p2

I did not try this, but it looks like it should work. As long as the call frame stack does not exceed a page, and a single call frame has no more than 127 bytes of passed arguments and 128 bytes of locals and call arguments, it is suitable for independently assembled and linked modules or even a compiler. Which makes me wonder: Were there ever any compilers for SC/MP?

Michael |
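The frame layout described in the first post can be tried out in a toy model before committing it to SC/MP code. The following is only a sketch in Python, not SC/MP assembly: the `Machine` class, the routines `double` and `quadruple`, the RAM size and the return addresses are all made up for illustration, and the accumulator is modelled as a plain return value.

```python
# Toy model of the proposed frame scheme. p2 is the frame pointer,
# p3 the link register. Arguments sit at positive offsets from p2,
# the saved return address at 0/-1(p2), locals and outgoing
# arguments below that. All names are hypothetical.

class Machine:
    def __init__(self, size=256, p2=0xF0):
        self.mem = [0] * size
        self.p2 = p2   # frame pointer
        self.p3 = 0    # return address (link)

    def store(self, disp, value):
        """Model of `st disp(p2)` with a signed byte displacement."""
        assert -128 <= disp <= 127
        self.mem[self.p2 + disp] = value & 0xFF

    def load(self, disp):
        """Model of `ld disp(p2)`."""
        assert -128 <= disp <= 127
        return self.mem[self.p2 + disp]

    def save_link(self):
        """Callee entry: save p3 to 0/-1(p2) if the routine makes calls."""
        self.store(0, self.p3 & 0xFF)
        self.store(-1, self.p3 >> 8)

    def restore_link(self):
        """Callee exit: reload p3 from 0/-1(p2) before returning."""
        self.p3 = (self.load(-1) << 8) | self.load(0)

    def call(self, argsoffset, return_address, callee):
        """Caller side: move p2 below the outgoing arguments, call, restore p2."""
        self.p2 -= argsoffset        # ld @-argsoffset(p2)
        self.p3 = return_address     # link register as seen by the callee
        result = callee(self)
        self.p2 += argsoffset        # ld @argsoffset(p2)
        return result


def double(m):
    """Leaf routine: one argument at 1(p2), p3 is never spilled."""
    return (m.load(1) * 2) & 0xFF


def quadruple(m):
    """Non-leaf routine: one argument at 1(p2), one local at -2(p2)."""
    m.save_link()
    m.store(-2, m.load(1))           # local copy of the argument
    m.store(-3, m.load(-2))          # outgoing argument below the local
    a = m.call(4, 0x1234, double)    # callee sees the argument at 1(p2)
    m.store(-3, a)
    a = m.call(4, 0x1240, double)
    m.restore_link()
    return a
```

With `m = Machine()`, an argument stored via `m.store(1, 5)` and `m.p3 = 0x0F00`, calling `quadruple(m)` leaves both `p2` and `p3` as they were on entry, which is the property the scheme depends on.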
25th Mar 2023, 5:23 pm | #2 |
Octode
Join Date: Mar 2020
Location: Kitchener, Ontario, Canada
Posts: 1,265
|
Re: Stack and subroutine calls
Why not just use auto index addressing mode to push and pop values from the stack?
  Xpah p3        ; push return address (little endian)
  St @-1(p2)
  Xpal p3
  St @-1(p2)

  Ld @1(p2)      ; pop return address (little endian)
  Xpal p3
  Ld @1(p2)
  Xpah p3

If this is going to be used often enough then it may be worth having a subroutine/interrupt service routine. Keep P3 pointed at the service routine, then:

  Xppc p3
  Db calleelow
  Db calleehi

Service:
  St @-1(p2)
  ; jump to interrupt service if Sense A is set, possibly after saving p3 on stack
  Xpah p3
  St @-1(p2)
  Xpah p3
  Xpal p3
  St @-1(p2)
  Xpal p3
  Ld @1(p3)
  Xae
  Ld @1(p3)
  Xpah p3
  Xae
  Xpal p3
  Xppc p3
  Jmp service

Maybe use the carry flag to indicate call or return. The service routine then handles calls and returns and also interrupt service. This is just a first attempt; it could possibly also allocate local storage on the stack and preserve Acc and E in the call to and return from the subroutine. |
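The push and pop sequences above rely on the auto-indexed mode: a negative displacement decrements the pointer before the access, a positive one increments it afterwards, and only the low 12 bits of the pointer take part. A minimal Python model of just that behaviour follows. The `Cpu` class and its method names are hypothetical, and the model skips the detail that XPAH/XPAL exchange the accumulator with P3 (so the real push sequence clobbers P3 as it goes).

```python
def add12(pointer, disp):
    """Add a signed displacement to the low 12 bits; the 4 page bits stay put."""
    return (pointer & 0xF000) | ((pointer + disp) & 0x0FFF)


class Cpu:
    def __init__(self, p2):
        self.mem = {}    # sparse RAM, address -> byte
        self.p2 = p2     # used as stack pointer
        self.p3 = 0      # holds the return address

    def _auto(self, disp):
        """Effective address for @disp(p2), updating p2 as a side effect."""
        if disp < 0:                       # pre-decrement
            self.p2 = add12(self.p2, disp)
            return self.p2
        ea = self.p2                       # post-increment
        self.p2 = add12(self.p2, disp)
        return ea

    def st_auto(self, disp, value):
        """Model of `st @disp(p2)`."""
        self.mem[self._auto(disp)] = value & 0xFF

    def ld_auto(self, disp):
        """Model of `ld @disp(p2)`."""
        return self.mem.get(self._auto(disp), 0)

    def push_p3(self):
        self.st_auto(-1, self.p3 >> 8)     # xpah p3 ; st @-1(p2)
        self.st_auto(-1, self.p3 & 0xFF)   # xpal p3 ; st @-1(p2)

    def pop_p3(self):
        low = self.ld_auto(1)              # ld @1(p2) ; xpal p3
        high = self.ld_auto(1)             # ld @1(p2) ; xpah p3
        self.p3 = (high << 8) | low
```

Starting from `Cpu(p2=0x0F80)` with `p3 = 0x1234`, a push leaves the low byte at the lower address (little endian) and `p2` two bytes down; the pop restores both registers.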
25th Mar 2023, 8:11 pm | #3 |
Octode
Join Date: Mar 2011
Location: North Yorkshire, UK.
Posts: 1,084
|
Re: Stack and subroutine calls
I always thought that the "it doesn't have a stack" comments came from non-users.
I use Nat Semi's method using auto-index via P2. It's a byte stack rather than a word stack like on other processors, but it's easy to push and pop byte values onto/off the stack. Kitbug, for example, makes extensive use of the stack. The 'lost book' from a few threads back has an interesting subroutine management snippet called "The long arm of P3". |
25th Mar 2023, 9:37 pm | #4 |
Dekatron
Join Date: Aug 2011
Location: Newcastle, Tyne and Wear, UK.
Posts: 11,484
|
Re: Stack and subroutine calls
I guess I'm one of the guilty ones who peddle the view that the SC/MP doesn't have a stack, but I think I'm right in the sense that it doesn't have a dedicated stack pointer and it doesn't have the 'traditional' CALL and RET type instructions to go with that.
|
26th Mar 2023, 1:08 am | #5 |
Octode
Join Date: Mar 2011
Location: North Yorkshire, UK.
Posts: 1,084
|
Re: Stack and subroutine calls
Let's compromise & go with "it can perform some stack operations"
|
26th Mar 2023, 1:31 am | #6 | |
Octode
Join Date: May 2018
Location: Northampton, Northamptonshire, UK.
Posts: 1,394
|
Re: Stack and subroutine calls
Quote:
The original PIC16xx uCs also had a very limited h/w stack (in depth), maybe due to their early General Instrument heritage (originally a mask-programmed-only part), and that might also have made compilers more difficult.

It was often said that a lack of registers (particularly full-word ones usable uniformly by all instructions) made it harder to write compilers that produced efficient code. So later processor architectures, like the 68000, had many (Rn) registers that could be used with most instructions, for better compiler support.

With limited memory resources back then, compilers tended to be quite rare and most things were written directly in hand-optimised assembler. The early ones for home computers were mainly there to speed up BASIC by removing the overhead of interpreting at run time. C compilers did eventually appear for most 8-bit uPs, and C might be regarded as a bit lower-level than BASIC (especially compared to C++), closer to assembler.

The SC/MP's 4K page size, with code unable to run automatically across 4K boundaries, might also have been a bit problematic, needing workarounds by hand. |
|
31st Mar 2023, 7:59 am | #7 |
Tetrode
Join Date: May 2021
Location: Titz, Germany.
Posts: 72
|
Re: Stack and subroutine calls
Mark1960: You can certainly use a pointer register as stack pointer, just like on the IBM/360, and nobody would say that architecture had no stack just because it did not have CALL/RET but instead swapped a register with the PC. Static call frames, not stack allocated ones, were common back then. The disadvantage of using a register as stack pointer is that the subroutine has to unfold its data flow to match a stack, as if you were using Forth, and even Forth has SWAP, DUP, OVER... which is all working around the problem that a stack offers very limited addressing. If you use the same pointer register as frame pointer instead, you can address data with a signed byte offset in a very efficient way and do not need to unfold your data flow.
Forth code that is well written is very efficient, because the addressing overhead of a stack is minimal and a data flow that can make use of that is extremely efficient. But it turns even a few lines of code into an optimization puzzle. Forth has two stacks, because you want to extend that optimization over the whole program and not create and destroy stack frames, which is again overhead. I tried it and it changed my view of C: C is a horribly inefficient language and that is fine with me.

The pages are indeed in the way of a compiler. You could certainly have a linker that inserts jumps to cross pages, and that encodes jumps as short or long. Variable instruction length encoding is essential for transputers. It makes the linker slow, but it works and results in good code. The two main troubles for a compiler would be that pointer arithmetic does not cross pages and that pointers are so slow to load and store. The first probably means malloc() would have to be limited to at most a page per object, which is a serious restriction. Further, the stack is limited to a single page, which is probably ok. For the speed I don't have a solution. |
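The page restriction on pointer arithmetic mentioned above can be made concrete with a short Python sketch. It assumes the usual description of the SC/MP, where address arithmetic carries only within the low 12 bits; `effective_address` and `fits_in_page` are hypothetical helper names, the latter being the kind of check a page-limited allocator would need.

```python
PAGE_MASK = 0x0FFF   # low 12 bits: offset within a 4K page


def effective_address(pointer, disp):
    """Indexed address as the CPU forms it: no carry into the page bits."""
    page = pointer & 0xF000
    return page | ((pointer + disp) & PAGE_MASK)


def fits_in_page(base, size):
    """True if an object of `size` bytes at `base` never crosses a 4K boundary."""
    return size > 0 and (base & PAGE_MASK) + size <= 0x1000
```

For example, `effective_address(0x1FFF, 1)` gives `0x1000`, the start of the same page, rather than `0x2000`. An allocator in this sketch would accept a 256-byte object at `0x1F00` but refuse it at `0x1F01`.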
31st Mar 2023, 10:55 am | #8 |
Octode
Join Date: Jan 2003
Location: Ware, Herts. UK.
Posts: 1,082
|
Re: Stack and subroutine calls
Modern ARM processors don't quite have CALL and RET equivalent instructions. Instead the BL instruction loads the return address into the LR register (R14), and a branch to LR (written BX LR in ARM assembler rather than a literal B LR) loads the program counter from LR. LR can be saved to and restored from the stack when nested subroutines are used.
John |
31st Mar 2023, 4:06 pm | #9 |
Tetrode
Join Date: May 2021
Location: Titz, Germany.
Posts: 72
|
Re: Stack and subroutine calls
It's funny they named it B LR, because the register addressed version on IBM/360 is BALR and by convention the return address is also stored in R14:
https://faculty.cs.niu.edu/~hutchins...40/more-br.htm
It actually makes sense, because if you do not have to spill the register to memory, you can save two memory accesses, unlike CALL/RET. I don't think that's why they did that for the SC/MP, though, and I guess they just followed a contemporary pattern for static call frame linkage. |
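The "two memory accesses" saving can be counted in a toy model. This Python sketch assumes a straight chain of nested calls in which only the innermost routine is a leaf, and it counts one access to save and one to restore a return address, ignoring word size; `return_address_accesses` is a hypothetical name.

```python
def return_address_accesses(depth, use_link_register):
    """Memory accesses spent on return addresses for `depth` nested calls."""
    accesses = 0
    for level in range(depth):
        is_leaf = level == depth - 1
        if use_link_register:
            if not is_leaf:
                accesses += 2   # spill the link register, reload it later
        else:
            accesses += 2       # CALL pushes, RET pops, at every level
    return accesses
```

In this model the link register scheme always comes out two accesses ahead, because the leaf at the end of the chain never touches memory for its return address.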
31st Mar 2023, 6:31 pm | #10 |
Octode
Join Date: Mar 2020
Location: Kitchener, Ontario, Canada
Posts: 1,265
|
Re: Stack and subroutine calls
There are a couple of things that might be a problem if the local data of the subroutine is below the pointer.
Interrupts would not be able to use that pointer, as the interrupt service routine would not know the size of the locals area. Each call from the subroutine would need to move the frame pointer, as the called routine would not know the size of the calling routine's locals area.

It's not just ARM that uses the branch and link method for subroutine calls. This is standard for RISC type ISAs, as it works better with pipelined processors.

The disadvantage for the 8060 is that it only has four pointer registers. One is the PC, and P3 is used for branch and link but also points to the interrupt service routine. If P2 is used as stack or frame pointer then only one remains for general purpose use. Copying or sorting data often needs two pointers, sometimes more, so there is quite a lot of swapping pointers out to RAM, and that has to go through the accumulator. P3 can be used if interrupts are disabled, so long as the impact on interrupt latency is not a problem.

For larger programs it might be better to implement an emulator for a better ISA. CP/M on an 8060 would be interesting. |
1st Apr 2023, 1:12 pm | #11 |
Tetrode
Join Date: May 2021
Location: Titz, Germany.
Posts: 72
|
Re: Stack and subroutine calls
Good point that having the frame pointer on top avoids increasing it for each call, but then the subroutine would have to adjust it on entry and return, so what's saved for the caller creates a cost in the callee. Which of course means you could do it either way.

Interrupts are weird on the SC/MP. Basically you must not use P3, because it holds the interrupt vector, which is an incredible cost for this register starved design. To me that sounds like a decision between rather high level code with call frames, or very low level optimized code that keeps P3 reserved and allows interrupts. It might be possible to introduce interrupt scheduling points where you load P3, enable interrupts and disable them right away. That increases the latency, but you could do it at a time when A, E and P1 may be destroyed, which makes the handler faster.

A bytecode interpreter trades speed for memory. NIBL shows that: small, but slow. It is best to implement a virtual machine, either a stack machine, or a register machine, or something in between like AcheronVM for the 6502, instead of an actual existing CPU with all of its details. From my experience, addressing quickly becomes the bottleneck in VMs. AcheronVM is brilliant there. Typically you seek to make use of whatever the architecture can do very fast, like the zero page access for the 6502. Is there anything the SC/MP can do fast?

I never thought about branch and link playing better with a pipelined architecture. It is interesting that Transputers did not use that, but I guess the thread scheduling points were seriously in the way of a register holding the return address. I never programmed any other RISC architecture in assembler, only CISC. |
1st Apr 2023, 5:00 pm | #12 | |
Octode
Join Date: May 2018
Location: Northampton, Northamptonshire, UK.
Posts: 1,394
|
Re: Stack and subroutine calls
Quote:
And the 6502, with its relatively simple reduced instruction set and low transistor count (< 3,500), designed by a small team, was very much the inspiration for Acorn's original ARM designers, after they were disappointed by the performance of National Semi's 32016 processor (which was so complex it had 100 people working on it, and still had MMU issues) and of the 68000 (which Sophie Wilson illustrated in a talk was slower than the 6502 at the same clock speed for a simple 8-bit addition), due to all the micro-coding required to implement these. (Although the 68000's compiler-friendlier abundance of registers, and universal Move instruction between any of them, did also feature in the ARM, to overcome the 6502's lack of these.)

The micro-coding on the original 8051 meant even a NOP took 12 clock cycles! And even the Philips etc. enhanced version took 6 cycles, until SiLabs boosted the 8051's popularity a lot with their large range of small, low-power, high-speed single-cycle 8051-core micro-controllers.

Acorn's Dr Steve Furber described how they had run typical programs by hand, using cards, through their ARM architecture to design out any bottlenecks in it. He'd also done an 800-line BBC BASIC simulation of the original ARM, which he reckoned ARM still wouldn't let him release to the public domain.

When the ARM was first launched, it could out-perform most other microprocessors, being up there with the much more expensive highest-end PC processors, and probably overtaking many of these when DEC used their Alpha-processor technology on the StrongARM to boost clock speeds to hundreds of MHz, at a time when the original lack of a floating-point co-pro had started to hold it back in some PC applications.

Regarding the SC/MP's architecture, I found some interesting discussions / links here:
https://groups.google.com/g/comp.arch/c/uE8CDTtNhwM
probably moving on from:
https://en.wikipedia.org/wiki/Nation...onductor_SC/MP
https://www.cl.cam.ac.uk/teaching/20...tory.html#SCMP
where it seems NS's slightly more successful COP series were successors to the SC/MP, but with added stack support.

I wonder how many transistors the SC/MP had, compared to the < 3,500 in the 6502, given that some parts of it are serial to save complexity.

(BTW, I only recently discovered some early attempts at '1-bit' serial computer designs: https://en.wikipedia.org/wiki/Motorola_MC14500B although the 4-bit, 16-instruction opcodes were fed into this in parallel.) |
|
1st Apr 2023, 7:15 pm | #13 |
Octode
Join Date: Mar 2011
Location: North Yorkshire, UK.
Posts: 1,084
|
Re: Stack and subroutine calls
My own sc/mp wishlist would have long jumps before a stack pointer!
Last edited by Phil__G; 1st Apr 2023 at 7:21 pm. |
1st Apr 2023, 7:25 pm | #14 |
Octode
Join Date: Mar 2020
Location: Kitchener, Ontario, Canada
Posts: 1,265
|
Re: Stack and subroutine calls
One advantage of the sc/mp is the auto increment and decrement of the pointers. This could be used in subroutines to both initialise a local variable and allocate space on the stack.
The 4k page limit for a stack is not as limiting as the 256 byte stack on the 6502, though zero page pointers can be used on the 6502 to implement a separate data stack, making the 6502 good for Forth. |
1st Apr 2023, 8:48 pm | #15 |
Octode
Join Date: Mar 2011
Location: North Yorkshire, UK.
Posts: 1,084
|
Re: Stack and subroutine calls
That comp.arch thread has so many errors!
|
3rd Apr 2023, 9:34 pm | #16 |
Triode
Join Date: Apr 2023
Location: Sydney New South Wales, Australia.
Posts: 31
|
Re: Stack and subroutine calls
Hi,
You can use the "default" (i.e. as Nat Semi proposed) P2 as your stack pointer, but instead of continually using it with the +/- (auto-index) option, just add your hard-coded index for most uses. This is good for saving variables, registers etc. without having to change P2. Within certain small routines, such as serial input and output, you can use the +/- capability, but just make sure you return P2 to its original value when you exit the routine.

While this is a bit of a kludge, it does allow you to save all registers except P2 in a system where the default end-of-first-page memory is not RAM. However, as you know what your original P2 is, you can save all registers and bung in your P2, and thus can do some debugging if required.

As mentioned, not the neatest solution, but it can work if your system has a different memory map to the default Kitbug sort of expectation.

river |
5th Apr 2023, 3:28 am | #17 |
Banned
Join Date: Nov 2014
Location: Derry, Northern Ireland, UK.
Posts: 167
|
Re: Stack and subroutine calls
Use C
it's non object. |
5th Apr 2023, 4:08 am | #18 |
Banned
Join Date: Nov 2014
Location: Derry, Northern Ireland, UK.
Posts: 167
|
Re: Stack and subroutine calls
It's the very middle white note on a piano, beside the two black ones, called 'Middle C'.
From there, use it as your compass for direction on the quantum. |
6th Apr 2023, 11:59 pm | #19 | |
Triode
Join Date: Apr 2023
Location: Sydney New South Wales, Australia.
Posts: 31
|
Re: Stack and subroutine calls
Hi,
Quote (ortek_service):
The best thing I found from working on CPUs like the SC/MP and, to a slightly lesser extent, the Signetics 2650, is that segmentation on the x86 is nothing. It's a walk in the park. On paper the x86 looks way better than a 6502, but it does cop a lot of derision due to segmentation. Seriously... a "4-bit shift-left and add" is difficult? Well, not after dealing with a SC/MP or 2650 it's not.

river |
|
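For anyone who has not met it, the "4-bit shift-left and add" of real-mode x86 segmentation looks like this as a small Python sketch. The function name is made up, and the 20-bit mask models the address wrap of the original 8086.

```python
def physical_address(segment, offset):
    """Real-mode 8086 address: segment shifted left 4 bits, plus offset."""
    return ((segment << 4) + offset) & 0xFFFFF   # 20 address lines
```

So segment `0x1234` with offset `0x0010` gives physical address `0x12350`, and many different segment:offset pairs name the same byte.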
7th Apr 2023, 12:12 am | #20 |
Octode
Join Date: Mar 2020
Location: Kitchener, Ontario, Canada
Posts: 1,265
|
Re: Stack and subroutine calls
For the 4k pages on the sc/mp I have been wondering if it was worth adding a memory mapper, maybe 74ls/hct612 as this also has 4k pages. Taking that one stage further with a couple of 74x283 adders the stack page could scroll over a larger range of memory.
|
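The memory mapper idea can be sketched in Python as a table lookup. This models only the basic '612 behaviour of sixteen 12-bit map registers selected by the top four address bits, which here are the SC/MP page bits. The `Mapper` class is hypothetical, the identity mapping at start-up is an assumption made for the model (real hardware needs its registers loaded first), and the adder-based scrolling of the stack page is left out.

```python
class Mapper:
    """16 map registers of 12 bits each, one per 4K page of CPU address space."""

    def __init__(self):
        # Assumption for this model only: start with an identity mapping.
        self.regs = list(range(16))

    def translate(self, addr16):
        """Turn a 16-bit CPU address into a 24-bit physical address."""
        page = (addr16 >> 12) & 0xF      # selects the map register
        offset = addr16 & 0x0FFF         # passes through unchanged
        return ((self.regs[page] & 0xFFF) << 12) | offset
```

With the identity map, CPU address `0x2ABC` stays at `0x002ABC`. After setting `regs[2] = 0x123` the same CPU address lands at `0x123ABC`, so each 4K page can be pointed anywhere in a 16 MB space.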