Tiny CPU
I’ve got a working CPU! You can grab the Verilog source and a testbench here. The instruction set and addressing modes are as I described them in my previous post, except that I shrank the stack pointer to 6 bits (64 byte stack), and was able to add the missing branch if carry/zero not set instructions. The CPU has a 10-bit (1K) address space, and fits in 119 macrocells of an Altera EPM7128S, when set to optimize for area and with Parallel Expander Chain Length set to 0. Sometime soon, I’ll make some nice datapath diagrams and post them.
In addition to the small address space and limited instruction set, there are a few ugly elements of the design that were necessary to make it fit the device. The absence of a Compare X instruction is glaring, but it is impossible to include without significant changes. There’s also a wasted state after many of the math/logic ops, in which the Zero flag is redundantly set. This makes those instructions take one clock cycle longer than strictly needed, but it was necessary to avoid more complicated state transition logic. The Zero flag handling in general is definitely awkward.
So what’s next? I hope to shrink the design slightly further, by simplifying logic, using more Altera primitives, or by using a smarter instruction set encoding that uses instruction bits directly as control signals. If I can save a few more macrocells, I hope to increase the address space to 11 or 12 bits (2K or 4K), because 1K feels very limited.
Beyond that, I’m considering a few larger changes:
Fixed Instruction Size
The current design has instructions that are one, two, or three bytes in size. I’m considering moving to a fixed instruction size of 2 bytes: 6 bits for the opcode, and 10 bits for an address or constant value. This would simplify the state machine logic, eliminating extra states needed to perform operand fetches, and reducing the logic resources needed to implement the state machine. It would probably also result in slightly more compact code, making more efficient use of the limited address space.
The downside of a fixed instruction size is that it would also fix the address size at 10 bits (or maybe 11 if I’m really clever), with no hope of increasing it. It would also require an opcode register, to hold the first 8 bits of the instruction while the second 8 bits are fetched. And it would force me to throw out the bastardized 6502 assembler I’ve been using, and create some new software tools.
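To make the 6+10 split concrete, here is a minimal Python sketch of how such an instruction word might be packed and unpacked. The bit layout (opcode in the top 6 bits, operand in the low 10) and the high-byte-first fetch order are assumptions for illustration, not the actual encoding, and the names `encode` and `decode` are hypothetical.

```python
# Assumed layout: [ 6-bit opcode | 10-bit operand ], fetched high byte first.
OPCODE_BITS = 6
OPERAND_BITS = 10
OPERAND_MASK = (1 << OPERAND_BITS) - 1


def encode(opcode, operand):
    """Pack an opcode and operand into the two bytes of one instruction."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 6 bits")
    if not 0 <= operand <= OPERAND_MASK:
        raise ValueError("operand must fit in 10 bits")
    word = (opcode << OPERAND_BITS) | operand
    return bytes([word >> 8, word & 0xFF])


def decode(first_byte, second_byte):
    """Rebuild (opcode, operand) from the two fetched bytes.

    first_byte plays the role of the opcode register: it has to be held
    while second_byte is fetched, because it carries the 6 opcode bits
    plus the top 2 bits of the operand.
    """
    word = (first_byte << 8) | second_byte
    return word >> OPERAND_BITS, word & OPERAND_MASK
```

Under this layout, the first byte alone is enough to identify the opcode, but the operand is not complete until the second byte arrives.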
Larger Bus Size
If I switch to a fixed 16-bit instruction size, it may also be worthwhile to switch to a 16-bit data bus. This would permit loading an entire instruction in one clock cycle, eliminating the need for the opcode register, and further simplifying the state machine. The downside is that I’d then need extra logic to make the memory byte-addressable for load/store of data, or else increase the data word size to 16 bits and forget about byte addressing entirely. A larger data bus output mux would also be needed. And of course, two parallel 8-bit RAMs would be needed on the CPU board.
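The extra logic for byte addressing amounts to using the low address bit to select one half of a 16-bit word. A rough Python sketch, assuming even addresses map to the high byte (that ordering is an arbitrary choice here, and `read_byte` / `write_byte` are hypothetical names):

```python
def read_byte(words, byte_addr):
    """Read one byte from a memory made of 16-bit words."""
    word = words[byte_addr >> 1]      # upper address bits pick the word
    if byte_addr & 1:
        return word & 0xFF            # odd address: low byte
    return word >> 8                  # even address: high byte (assumed)


def write_byte(words, byte_addr, value):
    """Write one byte, leaving the other half of the word untouched."""
    index = byte_addr >> 1
    value &= 0xFF
    if byte_addr & 1:
        words[index] = (words[index] & 0xFF00) | value
    else:
        words[index] = (words[index] & 0x00FF) | (value << 8)
```

With two parallel 8-bit RAMs, a byte write like this would correspond to enabling only one of the two chips.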
Harvard Architecture
Not a Colonial Period building at Harvard University, but a computer with separate address spaces for programs and data. This would permit a 16-bit interface to program memory, and 8-bit interface to data memory, which is seemingly the best of both worlds. The program memory address bus wouldn’t need a mux, because it would always be driven by the program counter. Separate program and data memories would also allow for faster CPU operation, by enabling instruction fetches and data access to happen in parallel. The total amount of addressable memory would also increase, because the program and data memories could each be 1K in size, for 2K total.
Separate program and data memories mean the CPU board would need two 8-bit ROMs as well as an 8-bit RAM, further increasing the component count.
The major drawback of the Harvard Architecture is that working with large data constants like strings and tables is cumbersome, because they must be loaded or copied a byte at a time using Load Immediate instructions. The indexed address instructions typically used to access such structures operate on the data memory. The standard solution to this problem is to use a Modified Harvard Architecture, adding new instructions like Load Constant Indexed to fetch values from program memory. Unfortunately that negates some of the original advantages, requiring an additional address register for program memory, an address bus mux, and additional complexity in the state machine.
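The difference between the two kinds of indexed load can be sketched with a toy Python model. Everything here is illustrative: the class and method names are hypothetical, and storing one constant byte in the low half of each 16-bit program word is just one possible convention.

```python
ADDR_MASK = 0x3FF  # 10-bit address within each memory


class HarvardMachine:
    """Toy model: 1K x 16-bit program memory, 1K x 8-bit data memory."""

    def __init__(self, program_words):
        words = list(program_words)
        self.program = words + [0] * (1024 - len(words))
        self.data = [0] * 1024

    def load_indexed(self, base, x):
        """Ordinary indexed load: always reads data memory."""
        return self.data[(base + x) & ADDR_MASK]

    def load_constant_indexed(self, base, x):
        """Modified-Harvard load: reads program memory instead.

        Assumes each table entry sits in the low byte of a program word.
        """
        return self.program[(base + x) & ADDR_MASK] & 0xFF

    def copy_table(self, src, dst, count):
        """Copy a constant table from program memory into data memory."""
        for x in range(count):
            value = self.load_constant_indexed(src, x)
            self.data[(dst + x) & ADDR_MASK] = value
```

In hardware, `load_constant_indexed` is the step that needs a second source for the program memory address, which is where the extra register and mux come from.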
4 Comments so far
I understand that you’ve had some initial difficulties with memory interfacing on the FPGA, but is there any other reason you moved entirely to a CPLD platform? I’ve never used a CPLD, but the toolchain should be pretty much identical to target an FPGA and you’ll get substantially more logic for the buck. As folks have pointed out in the past, there are some awesome breakout boards[1] available which, while not ideal, wouldn’t be too hacky to integrate into a larger design. Just a thought. Awesome work!
[1] http://www.sparkfun.com/commerce/product_info.php?products_id=8458
I initially aimed for a CPLD design after feeling burned by my FPGA experience. Since the internal structure of a CPLD is much simpler, it’s easier to understand what’s happening inside. There are no block RAMs, distributed RAMs, built-in muxes, carry chains, etc. It’s more like a very large PAL or GAL, and that simplicity appealed to me. But as you said, the tools are basically identical, and I’m sure I could make something work with an FPGA too.
A second reason now motivates me more, and that’s setting some constraints to create an interesting challenge. Putting a soft-CPU in an FPGA is not really challenging (aside from many interesting choices of CPU design). There are lots of soft cores you can choose from like PicoBlaze, MicroBlaze, anything from OpenCores, and many others. But fitting a useful CPU into a CPLD, especially a 128 macrocell CPLD, involves making lots of tradeoffs between functionality and size. It’s just big enough to fit something useful, but small enough that it takes a lot of careful thinking to do it. The end result is meant to be a cool technical hack, like fitting a flashy graphical demo into a 4K executable program size limit.
Harvard architecture, fixed instruction size, instruction width larger than data width… It sounds a lot more like a PIC than a 6502 (not necessarily bad – and not too surprising since you’re squeezing the most out of so few resources).
Could be! I’m not sure quite how you decide when something’s a microcontroller or a CPU. I’ve seen the 6502 and Z80 described as microcontrollers, so perhaps it’s all relative. I normally think of microcontrollers (like the PIC) as being something with on-board EEPROM memory, A/D circuitry, and lacking any external pins that are specifically dedicated as address/data busses.