Archive for May, 2008
Clocks and Interrupts
Today I moved a few steps closer to completion of the initial hardware design. First, I verified that BMOW’s interrupt mechanism actually works. My previous experiments with the keyboard and USB employed busy loops to continuously poll for new values. Although I had built in support for an interrupt line from the beginning, I never actually tested it until today. Luckily, it works fine. When interrupts are enabled, and a device signals an interrupt request, the next time an instruction is fetched from memory it will read zero instead of the instruction opcode. Opcode zero is the “break” instruction, which saves the current register contents and jumps to an interrupt service routine whose address is stored in a dedicated trap vector. The ISR then queries each device to determine which one generated the interrupt, and handles whatever work needs to be done. Finally, an RTI (return from interrupt) instruction restores the register contents and re-fetches the instruction that was interrupted.
As a simple test, I rewrote my earlier keyboard editor demo to use interrupts for the keyboard rather than polling. I had the main program print a continuous series of dots to the LCD, while the interrupt service routine decoded keyboard input and echoed the key to the LCD. It worked as you’d expect: a continuous series of dots that got intermixed with other letters if you typed at the keyboard.
The second task of the afternoon was to add BMOW’s real-time clock. I’d previously planned for the RTC and incorporated it into the design schematics, but hadn’t actually connected the hardware until now. I’m using a Texas Instruments BQ4847 integrated RTC module with a built-in battery. The RTC enables BMOW to know what date and time it is, which is nice, but its more important use is as a programmable interrupt timer. I plan to use a repeating RTC timer interrupt with a period of 1ms or so as the basis for multi-tasking in the OS. Each time the interrupt hits, the ISR will suspend whatever program was running, restore the state of another previously suspended program, and return. As long as programs are time-sliced rapidly enough, and all the relevant machine state is preserved when a program is swapped in and out, it should appear as if all the programs are running simultaneously.
To test both functions of the real-time clock, I configured an interrupt to trigger every second, and an interrupt service routine to print the current date and time.
One other interesting feature of the RTC module is its ability to turn a separate static RAM into a battery-backed SRAM. The RTC’s Vout pin can be used to provide power to an SRAM from the clock battery when the system power is off, preserving the SRAM contents. The RTC will also write-protect the SRAM by forcing its chip-enable input to false when the system power is in transition or off. It would be pretty easy to convert BMOW’s existing 512K SRAM to a battery-backed RAM this way, or to add a second 512K battery-backed SRAM. A second battery-backed RAM would be a simple way to add a small “disk” to the machine, instead of specialized hardware to connect to a compact flash or SD card, or a real hard disk.
In other news, I updated the “Technical Info” page with more details, so check it out.
Edit: I just noticed that I set the RTC for “Saturday May 30, 2008”, which is a non-date given that today is actually Saturday May 31. It’s interesting that the RTC didn’t barf on it. It must take the day of the week as an input along with date and time, rather than determining the day of the week from the date.
Small C
I’ve begun looking more deeply into the “Small C” compiler that Bill mentioned. It looks very cool, yet also wildly inefficient, which is about what I expected. With some more effort, I think I could get a working C compiler for BMOW that generated not-too-awful code, which would be great for porting programs from other systems, or something like getting an open-source chess program running. But for cases where the size and speed of the code are important, Small C isn’t going to replace hand-written assembly.
There are several versions of Small C floating around on the internet. Small C Plus appears to be the most fully-featured, with support for structs and floating point math. Unfortunately it looks difficult to port, being targeted at some ancient Amstrad CP/M computer. Maybe I’ll take a shot at it someday.
Small C v2.1 by James E. Hendrix seems to be the “official” version, and is the one discussed in the Small C handbook that I bought. I had a lot of difficulty getting it to compile and run with a modern compiler (Visual C++ 2005), though. It seems to assume that uninitialized memory will contain zero, and uses some non-standard library functions that only exist if you compile Small C with itself. I did eventually manage to get it to build and run, and compile a simple program, but it looks awkward to port.
That leaves Small C for UNIX, which looks to be the most feature-poor, yet the easiest to port. I was able to get it to compile and run under Visual C++ 2005 in a few minutes. All the machine-specific code generation routines have already been pulled out into a separate file, so in theory it should be easy to retarget to the BMOW instruction set.
I was surprised at how small the compiler’s source code is. I expected all kinds of code related to context-free grammars and whatnot. Instead it’s only about 3000 lines of code, which is pretty manageable. As far as I can tell, it makes only one forward pass through the code, with a lot of lines like:
if (*ptr == '[') do_array();
There’s a lot that Small C doesn’t have, of course. There are no structs, floats, or bools, and no function pointers. All functions are assumed to return int. You can’t define a variable when you declare it (like int x=4;). And function headers must be written in ancient K&R style C, where the types of the formal parameters are given separately from their names, in the space before the function’s first open curly brace. But in exchange for these limitations, you get a compiler that’s simple enough to understand and retarget without going crazy.
The compiler assumes that the CPU has two general registers, and a stack. It also assumes that each register is large enough to hold an int, which is also the size of a pointer. The minimum entry requirements are therefore two 16-bit registers and a stack pointer, which is already a problem for BMOW. I can treat two 8-bit registers as a single 16-bit register by having the compiler emit assembly to manipulate them individually, but BMOW only has three 8-bit registers, which makes one and a half 16-bit registers. I would need to add a fourth 8-bit register somehow if I wanted to meet Small C’s assumptions about the hardware. A work-around is to use a fixed memory location as the missing register, but that’s ugly and slow.
Small C also assumes that the registers all support the same operations for most purposes. This isn’t the case for BMOW, where many operations like addition are only supported for the accumulator register. To add a number to another register, it’s necessary to save the accumulator somewhere, copy the other register to the accumulator, perform the addition, copy the result back, and restore the original accumulator value. That’s not the kind of code you want your compiler to be generating very often.
The biggest mismatch between Small C and BMOW is address pointers. BMOW has no such concept: you can’t store a pointer in a register (or a pair of registers) and then dereference it. Pointers in memory can be dereferenced, and there are enough flexible addressing modes (stack-relative, X-indexed, and so on) that in most cases a program shouldn’t need to explicitly dereference a pointer. In contrast, Small C assumes that either register can be treated as a pointer and dereferenced with ease. Any time a value must be loaded from memory, the compiler emits code to first load the address into the primary register, and then load the data using that address. This results in a block of horribly convoluted BMOW assembly to accomplish the same thing.
An example would probably help. Consider the following C program:
func()
{
    int g;
    g = 0xDEAD;
    return g;
}
This could easily be collapsed to a single line that returns 0xDEAD without the need for a local variable, but ignore that for the moment. So this program must:
- Allocate stack space for a 2-byte local variable
- Store 0xDEAD in the stack location of the local variable
- Retrieve the current value of the local variable from the appropriate stack location (yes it will always be 0xDEAD), and store it in the primary register
- Return
I’ve defined the “primary register” as the Y and A registers, with Y holding the high byte and A the low byte. So some hand-written assembly for func() might be:
func:
    ; reserve space for local variable "g"
    pha       ; push anything on the stack; the value doesn't matter, only the change in the stack pointer
    pha
    ; store 0xDEAD in g
    lda #$DE
    sta 2,s   ; store DE at stack + 2
    lda #$AD
    sta 1,s   ; store AD at stack + 1
    ; retrieve g into primary register
    lda 2,s   ; get high byte from stack + 2
    tay       ; transfer it to the Y register
    lda 1,s   ; get low byte from stack + 1
    plp       ; free the space on the stack used by the local variable
    plp
    rts       ; return
That’s not too bad. Compare that with the code generated by my BMOW back-end for the Small C compiler. In addition to YA as the primary register, it also uses X and a fixed memory location (cc_regw) as a secondary register, and another fixed memory location (cc_regtemp) as scratch space. It also makes use of two subroutines to put (cc_pint) and get (cc_gint) an int to/from the primary register, similar to the original 8080 back-end for Small C. cc_gint assumes the address is already in the primary register, and overwrites it with the value at that address. cc_pint assumes the address is in the secondary register, and the value to store is in the primary register.
cc_regw = $4FFF
cc_regtemp = $4FFD

cc_gint:
    sta cc_regtemp
    sty cc_regtemp+1
    phx
    ldx #1
    lda (cc_regtemp),x
    tay
    dex
    lda (cc_regtemp),x
    plx
    rts

cc_pint:
    stx cc_regtemp
    phx
    ldx cc_regw
    stx cc_regtemp+1
    pha
    ldx #1
    tya
    sta (cc_regtemp),x
    pla
    dex
    sta (cc_regtemp),x
    plx
    rts

;func()
func:
;{
; int g;
    pha
    pha
; g=0xDEAD;
    stx cc_regtemp
    tsr
    ldx cc_regtemp
    clc
    adc #<1
    bcc +
    iny
+   pha
    tya
    clc
    adc #>1
    tay
    pla
    phy
    pha
    lda #<57005
    ldy #>57005
    plx
    stx cc_regtemp
    plx
    stx cc_regw
    ldx cc_regtemp
    jsr cc_pint
; return g;
    stx cc_regtemp
    tsr
    ldx cc_regtemp
    clc
    adc #<1
    bcc +
    iny
+   pha
    tya
    clc
    adc #>1
    tay
    pla
    jsr cc_gint
    jmp cc1
;}
cc1:
    plp
    plp
    rts
The tsr instruction transfers the stack pointer to the X, Y, and A registers. X gets the bank byte of the stack pointer, which we don’t need, and since we also don’t want to destroy the value in X, the code pushes and pops it around the tsr.
It all works, but… yuck.
Read 4 comments and join the conversationKeyboard Kaos
After another round of bug fixing and the creation of a BMOW keyboard driver, the keyboard works! And it truly works this time: pressing the A key shows “a” on the LCD, and pressing SHIFT and the A key shows “A”, just like you’d expect. The craziness with key scan codes has been conquered. I even added support for newline, backspace, and scrolling to the LCD driver, to create a decently functional 20×4 text editor.
I’m using an old Compaq PS/2 keyboard that I had lying around. With its own keyboard input and LCD output, BMOW is now an interactive, self-contained computer. No connection to a PC or terminal is needed. It’s just psychological, but somehow this makes it feel more “real”.
Writing the keyboard driver wasn’t too bad. I wrote it first in C, then hand-translated the C to assembly. It’s a simple state machine that maintains the current up/down state of all the modifier keys (shift, control, alt), as well as some state related to extended key codes, and make/break flags. When a new scan code arrives, it decides if that represents an internal state change (shift was pressed, etc), or a real key event. If it’s a real key event, it maps the scan code to an ASCII value using the modifier values and a lookup table, stores it in OS keyboard buffer, and sets a flag indicating that a new key is ready for handling by the program.
When I first powered on the CPU to test the keyboard driver, it didn’t work at all. The hardware went back to the annoying state of “nothing happens at all”, which is a pain to debug. After some sleuthing, I traced the cause to a bug in the GAL equations for the stack pointer: the most significant bit of the stack was getting cleared whenever the SP was incremented. Doh! How did that ever work before? After fixing the stack pointer bug, the keyboard still didn’t work. With more sleuthing, I discovered that two of the wire connections on the keyboard connector had broken off. Double doh!
Oddly, having a working text “editor” with the keyboard and LCD seems less interesting than the scan code debug display I had before. Typing text on a keyboard and seeing it on a screen is something we do all the time, and it seems completely unremarkable. We just expect the right letters to appear, backspace to work, and return to begin a new line, without even thinking about it. I guess BMOW has now reached the same level of dull predictability as any other PC!
Read 2 comments and join the conversationMicrocode Cleanup
I’ve gotten a bit derailed from the task of writing the keyboard scan code handler, into the bigger issue of how device drivers like this should work in general in BMOW. That’s led me to an entirely different question of the BMOW operating system (if you can call it that), and what kind of instruction support might be needed for the OS and user programs to communicate. It’s not something I need to resolve now, really, but I’d rather give it some thought and hopefully get it right the first time.
The simplest approach would be a single-user machine, similar to how vintage 8-bit computers like the Apple II typically worked. Then the “OS” wouldn’t be much more than just a library of helpful routines for programs to call to interface with the hardware.
It may be a pipe dream, but one of my larger goals is to support some kind of multi-tasking OS, at least in a simple form. The key to this is some kind of task scheduler, and a protected memory design to prevent one user program from interfering with another. My plan is to make all the user programs run as if BMOW had 16-bit addresses, with a 64KB memory space. Since BMOW actually has 24-bit addresses, the upper 8 bits will be treated as a process ID. None of the normal CPU instructions will be able to view or modify the upper 8 bits of an address register. When an interrupt hits or an OS routine is called, however, the upper 8 bits will be set to zero, which will essentially be the OS’s process ID, with its own memory space. I can then add some additional CPU instructions for manipulating full 24-bit addresses, whose microprograms verify that the upper 8 bits of the PC are zero before doing anything else. So in theory at least, each user program will exist in its own 64KB space and be invisible to other user programs, but the OS can still view and modify them all. There are still many details to work out, however, and my original hardware design didn’t exactly foresee all this, so it’s a challenge.
For task switching, I would use the yet-to-be-installed real-time clock, and set it to trigger an interrupt every few milliseconds. The interrupt mechanism would then push all the CPU registers onto the current user program’s stack, then jump to an OS interrupt handler. This handler would need to copy the stack pointer into a process table somewhere in OS memory, install the stack pointer from a different process, pop the saved registers off the stack, and continue execution of the next process. Boom, it’s a multi-tasking OS. Of course I’m glossing over many details, and in practice it will probably be much more complicated than this. There are many questions to answer, such as whether the OS itself can be pre-empted by an interrupt.
Assuming this all works, BMOW should be able to support as many processes as there are 64KB banks available, minus one for the OS. With the 512KB currently installed, that’s 7 user processes, which is probably plenty. Increasing the installed RAM to the 1MB I originally planned for would raise the user process limit to 15. I can’t imagine that I could run 15 simultaneous programs on this hardware at any kind of useful speed, but it might be interesting to try!
Keyboardin’ Part 3
The keyboard hardware interface works! I’m now a happy key-tapping fool, because with its own keyboard and LCD display, BMOW can operate completely independently from any host PC. The 4×20 LCD is certainly a very cramped workspace, so a HyperTerminal connection over USB from a PC is still the most practical solution for any serious BMOW interaction. But for the sheer cool factor, a stand-alone BMOW with its own I/O beats a USB tether to a Wintel box hands down.
Of course I can’t actually type anything useful with the keyboard interface yet, that would be too easy. You might think that the keyboard sends ASCII bytes to the computer as you type them, similar to a terminal connection, but in fact it’s a totally different interface. Each keypress sends a scan code to the computer, which is normally a single byte but can sometimes be two or three bytes. Then when the key is released, the keyboard sends a “break” scan code, followed by another scan code for the released key, which is usually but not always the same as the keypress scan code. Even the shift key works this way. So to tell if the user typed a capital “A”, you’ve got to keep track of the current state of the shift key by listening for its keypress and key release scan codes, then translate the scan code byte 0x1C to the “a” key, then finally convert that to the character “A” when the shift key state is factored in. I haven’t written any of the key state and scan code conversion software yet, so while it’s true that my hardware interface works, all it does currently is display scan codes.
My original hardware design for the keyboard interface almost worked on the first try. The only problem I ran into was with the clock signal generated by the keyboard, which is used to synchronize the data it sends to the computer. The rise time on the clock signal was mind-numbingly slow: about 1.2 microseconds to go from 0 to 5 volts. Compare that to BMOW’s main system clock, which rises in about 15 nanoseconds. The slow keyboard clock was causing trouble, because the different BMOW chips connected to it sensed a low-to-high clock transition at slightly different voltages, which equated to substantially different transition times due to the slow rise time. My solution was to feed the keyboard clock signal through a Schmitt trigger inverter, which turns a slowly-changing input like this one into a nice clean, quick low-to-high step.
New Instruction Set?
I’m sort of at a crossroads now, with the basic hardware working, and I need to decide what to do next. There’s still some remaining hardware work, of course: the keyboard and real-time clock, but neither of those is essential. Then there are possible hardware extras, like video. And there’s the entire operating system and software library to consider. My eventual goal is to support multiple programs running at once, managed by a multi-tasking OS.
I find myself gravitating towards a rewrite of the BMOW instruction set. Since the machine is microcoded, there’s nothing specifically tying the hardware to the choice and syntax of instructions. The present instruction set is nearly identical to the 6502 instruction set, which seems like a reasonable fit, but I’m getting more and more annoyed with it. I’ve been looking at the 68000 instruction set, and while its hardware assumptions (e.g. 32-bit registers) don’t match BMOW very well, it feels like a good foundation for a cleaner, simpler instruction set.
My main gripes with the current 6502-style instruction set are:
- Every combination of operation and operand is a unique instruction, which feels clunky and difficult to follow. Loading the X register is LDX, storing Y is STY, transferring A to Y is TAY. Compare this with a 68000-style MOVE SRC, DST. The previous examples would look something like MOVE (A0), X, MOVE Y, (A0), and MOVE A, Y.
- Some obvious-seeming instructions are just missing. You can push A on the stack with PHA, but there’s no PHX or PHY. There’s no instruction to add two registers: only to add a register to a value in memory.
- There are no stack-relative addressing modes. Support for the stack pointer in general is pretty minimal. This makes sense for the 6502’s single-byte stack pointer, but BMOW has a full 24-bit pointer for the stack.
- There’s no concept of an address stored in a register. The ability to auto-increment and decrement address registers is similarly hidden from the assembly programmer.
- All the opcodes are three letters. This is a small thing, but it’s annoying and makes the assembly less readable. It’s not like there’s some budget for opcode name lengths.
I’m thinking of rewriting BMOW’s microcode to implement a new instruction set much closer to the 68000’s. To the assembly programmer, BMOW would appear as three 8-bit data registers (the current A, X, and Y) and two 24-bit address registers (the AR and SP, which presently are visible only at the microcode level). The 8-bit T register would remain hidden. I think I still need it as a temporary location for implementing certain instructions in microcode, but if not, I could expose it as a fourth data register.
Exposing the address registers is the most powerful change. Consider a hypothetical program to copy 100 bytes between two memory locations. First the 6502-style:
LDX #99
loop:
LDA $4000, X
STA $5000, X
DEX
BPL loop

Then the 68000-style program:
MOVE #99, D0
MOVE $4000, A0
MOVE $5000, A1
loop:
MOVE (A0)+, (A1)+
SUB #1, D0
BGE loop
It may not be obvious, but I could implement the 68000-style instructions to run this program far faster. For instance, that MOVE (A0)+, (A1)+ could be done in 3 clock cycles, compared with 10-12 for the LDA/STA pair, because the address registers are already set up and no address arithmetic is needed. Furthermore, the 68000-style program could be a subroutine in which the actual source and destination addresses are computed at runtime, while the 6502-style program forces the addresses to be known at assembly time, or to use even slower indirect, indexed addressing.
There are some downsides to replacing the instruction set with a new 68000-style one:
- Performance may be worse in some cases with a 68000-style instruction set. In general, such an instruction set would have slightly simpler instructions, and so would require more instructions to accomplish the same work as a 6502 instruction set. For example, the 6502 instruction LDA $4000 would be translated as the two 68000 instructions MOVE $4000, A0 and MOVE (A0), D0.
- The hardware can’t easily support indexed address modes like MOVE 4(A0), D0. That instruction is supposed to load a value from a location in memory 4 bytes beyond where A0 points. That means adding 4 to A0, but there’s no place to store the result of that addition other than back into A0 or A1. So the microcode would need to save and restore the old value of A0 (slow), or make indexed address modes actually change the address register (an unwanted change in semantics), or drop support for that address mode entirely.
- All software and tools would need to be written from scratch. Currently I’m using a modified 6502 assembler, and I can sort of run existing 6502 programs with a little work. But a BMOW-68000 would be different enough from a real 68000 that I’d need to write my own assembler, and couldn’t hope to run existing 68000 programs.
One other option would be to create an instruction set modeled on the 65816, the 16-bit successor to the 6502. It does add stack-relative addressing, as well as some of the obvious-but-missing instructions from the 6502. It still strikes me as rather clunky and non-symmetric, but I should probably take a deeper look at it before jumping to conclusions.