Final Design Tweaks?

December 26th, 2007 | Category: BMOW 1 | Author: Steve

I’m trying to finish up the final hardware design now, so I can get started with actually building this thing. Although it will probably never be truly “done”, I don’t want to end up ripping out half the components and wiring to accommodate some new design feature I should have anticipated in the first place.

Here’s what I’ve been considering:

Improved condition code register: I could use a GAL to replace the 4-bit shift register with a custom dual parallel-in, parallel-out register. That would make it possible to load and store the entire CC register in a single clock cycle, instead of shifting data in/out over multiple cycles. The savings would help speed up the BRK and RTI instructions used during interrupt processing, shaving a total of 9 clock cycles off the total time needed to invoke an interrupt service routine and then return to the original program.

Conclusion: Skip it. I expect that a typical interrupt service routine will be tens of instructions long, taking probably 50 to 100 clock cycles, so a savings of 9 clock cycles isn’t that compelling.

Zero-page addressing: The 6502 CPU, from which I’ve borrowed the assembler syntax, has a mode known as zero-page addressing. Instructions using this addressing mode have an implied high-byte of zero for the address, so only the low-byte is specified. This means the instruction requires one less byte, resulting in more compact program code. On the 6502, zero-page addressing instructions also execute in fewer clock cycles than their absolute addressing counterparts. It’s sort of like having an extra 256 registers (the size of the zero page) that can be manipulated with a speed somewhere between true CPU registers and generic memory locations.

To gain a speed benefit from zero-page addressing, the BMOW hardware would require a change to permit zeroing of the high-byte of the address register simultaneously with loading of the low-byte. It would probably also require some tweaking of the memory mapping and reset circuitry, since page zero is currently part of ROM, and the machine begins program execution at memory address 0 after a reset.

If a program could be written such that one in every four instructions employed zero-page addressing, then I estimate it would be about 8% smaller and 6% faster than a program that never used zero-page addressing. In the limiting case where every instruction employed zero-page addressing (not realistic), the program would be 33% smaller and 25% faster.

Conclusion: Skip it, mostly. A typical improvement of under 10% doesn’t seem worth the hassle of changing the hardware design yet again. I may still choose to implement the zero-page addressing mode instructions later as a software-only change (new microcode), which would provide the program size savings, but no speed benefit. It would just substitute a clock cycle where the high-byte of the address register is loaded with some constant value for a cycle where the high-byte would otherwise have been loaded with a byte from the program code.

Add a Y register: I’ve been talking about this for a while, and I think I’ve figured out how to shoehorn a Y register into the left ALU input, where it must be in order to work as intended. The left input already has 4 possible sources, and with no spare control ROM outputs, and I was originally stumped as to how to support a fifth source for the left input.

My solution is to make X and X7 (a pseudo-register containing X’s sign bit) share a single enable signal from the control ROM. This signal would be AND-ed with the load enable signals for PCHI and ARHI, the high-bytes of the program counter and address register, in order to create the individual enable signals for X and X7. If the load destination were PCHI or ARHI, then X7 would be enabled, otherwise X would be enabled. While this is arbitrary and potentially limiting, in practice it mirrors exactly how X7 is already used for address calculations by the microcode. With X and X7 now sharing a control signal, there would be a free one for the Y register.

Conclusion: Do it. While the solution is a bit ugly, it’s relatively isolated. Adding Y will give the machine three general-purpose data registers rather than two, which is a significant improvement that should enable writing substantially faster/simpler programs. It will also make it much easier to port 6502 assembly programs to BMOW.

More than 64K Memory: 64K is the standard memory space for an 8-bit machine, but something larger would open up many interesting possibilities related to multi-tasking, for which 64K is probably too small to be practical. It would also allow the creation of single programs operating on larger data sets. Realistic values for the total amount of memory are in the 128K to 4MB range, assuming the use of standard SRAM.

A key consideration is how the extra memory should be addressed. One option is to have a separate segment register to hold the highest address bits. This register might be explicitly loadable by programs, or might be controlled by the OS, with each process given a separate segment. With this scheme, the bulk of the instructions would still use 16-bit addresses, and the segment register would presumably be altered infrequently. The alternative is to change all the instructions to use 24-bit addresses, providing for a totally generic 4MB address space. That would negatively impact program size and speed, however, due to the extra byte of address data in most instructions. Fortunately these approaches all require the same underlying hardware, with the differences lying entirely in the instruction set design and microcode.

Conclusion: Do it. The extra hardware needed is trivial, and the decision regarding how to use the additional memory can be made later.

But wait, there’s more! On top of these four issues, there are several other half-conceived ideas flying around my head as well:

Direct connection of a keyboard and monitor (or LCD panel?), instead of using a PC as a terminal.
Compact flash or IDE-based file storage.
Integration of a real-time clock with timer interrupts.
Two-phase clock. Investigate the necessity of buffering for clock signals due to TTL fanout limits.
Physical construction. I need a case, a power supply, on/off switch, reset button, maybe a fuse? The case must also provide easy access to all the hardware, as well as space/power/mounting points for future add-ons I haven’t yet thought of.

I think I’m getting a little carried away. It’s time to build the basic machine and get it working, then I can return to these other ideas.

Be the first to comment!

Final Design Tweaks?

No comments yet. Be the first.

Leave a reply. For customer support issues, please use the Customer Support link instead of writing comments.