Archive for 2008
Hardware Glitches
Just after I wrote last week’s entry, the hardware went from proudly proclaiming “BMOW is alive!” to merely stating “BMOW is aliv”. I was able to track this down to a problem with the program counter chip. Often, but not always, the low byte of the PC would roll over from FF to 01 instead of 00, and continue counting from there. The lowest bit of the address, A0, seemed to get stuck at 1 during a rollover. That wasn’t too painful to diagnose, but understanding *why* it was skipping 00 and how to fix it proved to be a much bigger problem.
The PC is implemented as a GAL, so the first thing I did was try replacing it with a new GAL, on the theory that the chip was bad. No help. Then I double-checked my GAL equations, and the raw fuse map produced by the GAL assembler, looking for errors. I found none. It seemed that the problem wasn’t simply bad hardware, nor a flaw in the logic, but some kind of electrical/noise/timing problem. Exactly the sort of problem I fear the most.
I checked every pin on the chip with my oscilloscope, looking for obvious spikes, noise, or power sags, but everything looked pretty good. There was some noise on some of the data load inputs, but nothing egregious, and those inputs aren’t used when incrementing the PC anyway. The scope showed that during a rollover, when A0 should have transitioned from a high voltage to a low one (1 to 0), it would start to dip low for about 5ns, then suddenly pop back up to a high voltage.
I tried slowing the system clock all the way from 1MHz down to 250kHz without success. I modified the GAL programming to make the PC increment every clock cycle, ignoring the count enable input. Of course this made the machine totally non-functional, but the rollover bug still occurred, as demonstrated with the scope. I tried removing all the other chips connected to A0, but the problem still persisted, and now the machine was even more non-functional.
Finally I tried rewriting the GAL equations to move A0 to another output pin, and the problem followed A0 to its new pin. This seemed a key bit of evidence, suggesting that the problem was not with the pin itself, nor the wires connected to the pin, but with the logical quantity A0. That led me to ask what was different about A0 versus A1-A7, which didn’t exhibit any problems. The answer is that when counting is enabled, A0 always changes state: a 0 becomes 1, or a 1 becomes 0. The other bits only change state depending on the values of the lower bits. In short, the PC was acting exactly as if it were being clocked twice in rapid succession.
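The double-clock theory can be illustrated with a tiny model. This is only a sketch in Python, not anything that runs on BMOW; `clock_counter` is a hypothetical helper that treats the low byte of the PC as a plain 8-bit counter and ignores load and reset.

```python
def clock_counter(value: int, edges: int = 1) -> int:
    """Advance an 8-bit counter by the given number of rising clock edges."""
    for _ in range(edges):
        value = (value + 1) & 0xFF  # wrap at 8 bits, like the low byte of the PC
    return value


# One clean edge at the rollover: FF -> 00.
print(f"{clock_counter(0xFF, edges=1):02X}")
# Two edges in rapid succession look like a single step from FF to 01.
print(f"{clock_counter(0xFF, edges=2):02X}")
```

Seen from outside, the second case is indistinguishable from A0 dipping low for an instant and then popping back to 1, which is what the scope showed.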
I jumped back to the oscilloscope, looking carefully at the clock input to the PC at the moment of rollover. The clock looked fine; in fact, it looked very clean. My scope only has 5ns timing resolution, though, so if there were a glitch of less than 5ns on the clock line, it might not show up. I wondered if there was a way to avoid a double-count in the case of a double-clock. I came up with the dangerous-sounding idea of including the clock itself in the product term used to compute the new A0 value at a clock edge. This certainly feels strange: by definition the A0 value will change exactly when the clock transitions from low to high, at which point the clock-as-data value will be undefined. The GAL program change also involved switching the A0 equation from negative logic to positive. Here it is (note the /clk0 term in the new equation):
; old equation
/q0 := /_reset + _reset*_cnt_in*_ld*/q0 + _reset*/_ld*/d0 + _reset*_ld*/_cnt_in*q0
; new equation
q0 := _reset*_cnt_in*_ld*q0 + _reset*/_ld*d0 + _reset*_ld*/_cnt_in*/q0*/clk0
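As a rough model of what the /clk0 term changes, the two equations can be transcribed into Python as next-state functions, reading `/` as NOT, `*` as AND, and `+` as OR. The names `reset_n`, `cnt_in_n`, and `ld_n` are stand-ins for `_reset`, `_cnt_in`, and `_ld`. The sketch rests on one big assumption: the genuine clock edge samples `clk0` while it is still low, and a spurious second edge a few nanoseconds later samples it high. Real hardware gives no such guarantee.

```python
def old_q0_next(reset_n, cnt_in_n, ld_n, d0, q0):
    # /q0 := /_reset + _reset*_cnt_in*_ld*/q0 + _reset*/_ld*/d0 + _reset*_ld*/_cnt_in*q0
    not_q0_next = (
        (not reset_n)
        or (reset_n and cnt_in_n and ld_n and not q0)   # hold
        or (reset_n and not ld_n and not d0)            # load
        or (reset_n and ld_n and not cnt_in_n and q0)   # count (toggle)
    )
    return not not_q0_next


def new_q0_next(reset_n, cnt_in_n, ld_n, d0, q0, clk0):
    # q0 := _reset*_cnt_in*_ld*q0 + _reset*/_ld*d0 + _reset*_ld*/_cnt_in*/q0*/clk0
    return bool(
        (reset_n and cnt_in_n and ld_n and q0)                          # hold
        or (reset_n and not ld_n and d0)                                # load
        or (reset_n and ld_n and not cnt_in_n and not q0 and not clk0)  # count
    )


# Counting mode: not in reset, count enabled (active low), no load.
COUNTING = dict(reset_n=True, cnt_in_n=False, ld_n=True, d0=False)


def double_clock_old(q0):
    """q0 after a real edge and then a spurious edge, old equation."""
    first = old_q0_next(q0=q0, **COUNTING)
    second = old_q0_next(q0=first, **COUNTING)
    return first, second


def double_clock_new(q0):
    """Same, new equation; clk0 is assumed low at the real edge, high at the glitch."""
    first = new_q0_next(q0=q0, clk0=False, **COUNTING)
    second = new_q0_next(q0=first, clk0=True, **COUNTING)
    return first, second


print(double_clock_old(True))  # (False, True): A0 pops back to 1
print(double_clock_new(True))  # (False, False): A0 stays at 0
```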
This worked. So the problem has been successfully papered over, but not really solved. There are a couple of other experiments I’d like to try, in order to better understand what’s happening:
- Try a quarter-power GAL instead of my low-power ones. Perhaps the surge in power when all the address lines simultaneously switch from 1 to 0 is causing the clock glitch I can’t see. If so, a more power-efficient GAL might help. I’ve got one on order.
- Try a 15ns GAL instead of my 25ns ones. I don’t think the propagation delay has anything to do with the problem directly, but perhaps the different internal structure of the 15ns GAL would exhibit different symptoms. I’ve ordered one of these too.
- Experiment with various methods of terminating the clock line.
Termination seems to be something of an inexact process, from what I’ve read. I tried connecting a pin somewhere midway along the clock line to ground, through a 220 Ohm resistor, and it made the clocking problems worse. I’ve seen other designs that use an in-series resistor of around 40-80 Ohms, rather than a resistor tied to ground. I’ve been unable to find much good discussion of the need and method of termination for TTL circuits running in the 1-4 MHz range, and most of what I have read talks about terminating signals on the bus or backplane, which I don’t have. If anyone reading this knows more about this and could offer some advice, I’d love to hear it.
Update: Some more details on the PC clock may be useful for termination analysis. The low byte of the PC is computed by the GAL called PCLO in the schematics. It’s using the clock line Q0B, which is output from a 74LS244. Q0B is transmitted along a chain of wires about 21 inches in total length. The ‘244 that outputs the clock is at the beginning of the chain, and PCLO is about 9.5 inches down the chain from the ‘244, and about 11.5 inches from the end of the chain.
The clock signal propagates past PCLO, 11.5 inches to the end of the chain, reflects off the end, and propagates 11.5 inches back. So the reflected signal must travel 23 inches, or about 0.6 meters. Assuming 5 ns per meter signal propagation in copper wire, the reflected clock signal will arrive back at PCLO in 5 * 0.6 = 3 ns after the original signal. Maybe that causes the double-clocking?
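The same estimate as a short Python sketch, without rounding 23 inches up to 0.6 meters. The 5 ns per meter figure is the assumption stated above, and `reflection_delay_ns` is just an illustrative helper name.

```python
INCH_IN_METERS = 0.0254


def reflection_delay_ns(distance_to_end_in: float, ns_per_meter: float = 5.0) -> float:
    """Delay between a clock edge passing a chip and its reflection returning."""
    round_trip_m = 2 * distance_to_end_in * INCH_IN_METERS  # out to the end and back
    return round_trip_m * ns_per_meter


# PCLO sits about 11.5 inches from the end of the clock chain.
print(f"{reflection_delay_ns(11.5):.2f} ns")  # 2.92 ns
```

Either way the answer is roughly 3 ns, which is below the 5ns resolution of my scope.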
BMOW Is Alive!
Everything seems to be falling into place now. I think a picture says it best:
The LCD is up and running, displaying messages from the ROM program. I wasn’t lucky enough to have it work on the first try this time, though. I had to debug some accidentally swapped address lines and some uninitialized registers, but after a few hours of fiddling, I was rewarded with the greeting in the photo.
I’m starting to realize a few bad points about my physical setup. The biggest headache is getting chips in and out of the wire wrap board to reprogram them. They’re always hard to remove, and even with a chip puller, I’m afraid I’m going to damage or break a pin when the chip suddenly dislodges. I have my boot ROM in a ZIF socket, but all the microroms are in standard (non-ZIF) sockets, and the GALs aren’t socketed at all. What’s worse, the microrom sockets don’t seem to be making a consistent contact with the board. In one case, I was able to toggle the machine between working and not working just by pushing down on one of the microrom sockets a little. This is the sort of random, elusive electrical problem that worries me far more than any design problems.
Things are getting to the point where I really need a case or cabinet of some sort. I don’t necessarily want to put it in the PC case yet, since then I’ll have to be constantly removing it to work on the hardware. But the naked wirewrap board with power, reset, and LCD cables hanging off in random directions is getting pretty unwieldy. I’m thinking about constructing a temporary “development” case that’s more like a frame. It would provide something to easily grip everything by, and a place to anchor the cables and connectors, but still be totally open at the top and bottom. I’m hopeless with machine tools, but maybe I can cobble something half-respectable together with some scrap wood and screws.
First Bootup!
It works!!! Eureka! And on the very first attempt, no less. I have achieved computation from a big mess o’ wires, and a couple of dozen basic logic chips. I can now say confidently that fibonacci(12) = 144. Check out the last line of the logic analyzer data listing:
Each line shows the state for a single clock cycle, with RESET (active low, so 1 means normal operation), OPCODE (in hex), and the X register (in decimal). Opcode FF means the machine has halted. You can see the last few terms of the fibonacci sequence on the preceding lines, although they’re not in order, and there’s a random value of 110 there too.
Did I mention that it works? Holy cow. The best part was that mere moments after the successful bootup, a friend called to ask me about something else, so I talked his ear off about the machine.
Since the hardware is still far from complete, running the fibonacci program required quite a bit of chicanery. At the moment there are only two 8-bit registers, and no RAM. The T register was intended to be used for temporary storage by the microcode, and wasn’t meant to be user-visible at all, so I had to add some additional instructions to expose it temporarily. I also added an instruction to add the X and T registers. Then I had to write microcode for a conditional absolute jump instruction, since the hardware needed for a relative branch isn’t finished yet. I had to modify the absolute jump instruction to work only within the first 256 bytes of memory, to avoid disturbing the T register. And finally, since I didn’t have any place to store a running count of how many fibonacci sequence terms had been generated, I resorted to cheating: the program terminates when the sign bit of the X register (bit 7) is 1. So it’s not really computing fibonacci(12), but rather the first fibonacci number >= 128, which happens to be fibonacci(12).
Here is BMOW’s first program:
* = $0
nop ; let’s hope we can execute a no-op, at least
; load X and T with the first two terms of the fibonacci sequence
ldx #1
sxt ; swap X and T, uses XOR swap since there’s no other temporary register!
ldx #0
loop:
clc ; clear the carry flag
axt ; add x + t
jmi done ; if the result is “minus” (sign bit is 1), exit the loop
sxt ; swap X and T
jpl loop ; jump back to the start of the loop
done: halt
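For anyone who wants to follow along without the hardware, here is a minimal Python sketch of the program above. It is not the BMOW simulator. It assumes that `axt` leaves its sum in X, that `sxt` is a three-step XOR swap whose intermediate result passes through X, and that `jpl loop` is always taken (X is still below 128 whenever it executes). `run_first_program` and `initial_t` are made-up names for illustration.

```python
def run_first_program(initial_t=0):
    """Return (final X, every value X held along the way) for the program above."""
    x, t = 0, initial_t & 0xFF  # T holds garbage at power-up; the first sxt discards it
    history = []

    def set_x(value):
        nonlocal x
        x = value & 0xFF  # X is an 8-bit register
        history.append(x)

    def sxt():
        nonlocal t
        set_x(x ^ t)  # XOR swap, step 1: X briefly holds X xor T
        t = t ^ x     # step 2: T now holds the old X
        set_x(x ^ t)  # step 3: X now holds the old T

    set_x(1)          # ldx #1
    sxt()             # sxt
    set_x(0)          # ldx #0
    while True:
        set_x(x + t)  # clc / axt
        if x & 0x80:  # jmi done: the sign bit (bit 7) is set
            break
        sxt()         # sxt, then jpl loop
    return x, history


final_x, history = run_first_program()
print(final_x)         # 144
print(110 in history)  # True
```

Under these assumptions X passes through 89 xor 55 = 110 during the last swap, which may be where the stray 110 in the logic analyzer listing comes from.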
Honestly, I’m fairly amazed that it worked on the first try. Yes, I’d been testing the subsystems as much as I could as I built them, but this was the first real integration test. What’s more, it was the first test of any kind that tried to modify the program counter, or use the ALU, condition codes, data registers, databus, or memory bus to data bus interface. I fully expected to spend a long time working out all sorts of problems before getting to the first successful program run. Heck, I must have run into 10 different logic and microcode bugs while testing the fibonacci program on the simulator, and the potential for errors in the impenetrable mass of wires that composes the BMOW hardware is far greater.
Here’s a look at the testing setup for my moment of glory:
So now I’ve got a very rudimentary computer, with two 8-bit registers and no RAM, running at a blazing 470 kHz. What’s next? I’ll probably write a few more test programs to exercise the hardware in its current state, to make sure everything’s really working as it ought to. Next, I think I’ll try to tackle integrating the LCD module. Checking the progress of the computer with all those logic analyzer probes is not much fun, so it would be great to display “fib(12) = 144” instead. Once I’m able to check the machine’s health without connecting up the logic analyzer every time, I’ll probably move on to the remaining data registers, RAM, stack pointer, and other hardware devices. There’s still a tremendous amount left to do, but as of today, I can finally say I’ve built a working homebrew CPU.
Almost There
I’m getting very close to the first real BMOW bootup. I’ve got two data registers wired up, along with enough of the data bus to use them. Now I just need to double-check the current wiring, add the rest of the registers, and see what kind of test program I can devise.
My instruction set is copied from the 6502, with some minor additions and changes. In the course of thinking about a good example program for first bootup, a couple of oddities occurred to me for the first time. The biggest one is that there’s no way to add values from two registers. Instead, the ADC instruction always adds the value in the accumulator register to a value in RAM. That’s a problem for me, since I haven’t yet wired up the RAM, but it also seems like a deeper problem. Why wouldn’t you want something like an ADX instruction, to add A + X? Wouldn’t it be faster than a memory access, if the values you want to add are already in registers?
The second oddity is the very non-symmetric nature of the 6502 instruction set. It’s something I was certainly aware of before, but never really thought much about. With these instructions, each register has different capabilities. The results of an add are always stored in A, never X or Y. Indexed memory accesses use X or Y, not A. Only the accumulator can be bit shifted. It all seems arbitrary and awkward, although I’m sure there were good reasons for those limitations in the original 6502 hardware.
All this has got me thinking that maybe I ought to pattern the BMOW instruction set from the 68000, or MIPS, or something else that’s a little more rational. There’s nothing specifically tying BMOW to the 6502, and with a reprogramming of the microcode, I can implement any instruction set I want, as long as it can be realized on the BMOW hardware. For the time being I plan to stick with the current instruction set, though, since I’ve already written most of the microcode, and got a working assembler too. Writing a new assembler is something I particularly don’t relish.
Musings on instruction sets aside, the BMOW construction and wiring is proceeding smoothly, and I probably have more than 50% of it finished now. A lot of things like devices and the stack pointer aren’t strictly necessary in order to run simple programs, so I hope to have some good news about my first bootup with a real program very soon.
Straight Line Code
Believe it or not, I think this pig may actually fly! Things are starting to get interesting. After several more days of wiring, I’ve reached the point where I can execute straight line code (no branches or jumps), with no RAM, and no registers. If you think about it for a moment, you’ll realize that given those restrictions, you can’t really do anything with the computer at all. There’s only one piece of state (the current program address), and there’s no way to change that state other than by sequentially executing instructions. It may even be a stretch to say that it’s “executing” instructions, when they don’t change any state. All the instructions might as well be NOPs.
Despite the outwardly boring appearance, I’m actually very happy with this result. The ability to execute straight line code means:
- The program counter works, because it steps through the program instructions correctly.
- The address bus is wired correctly.
- The address decoder works, because it enables the boot ROM for addresses that are mapped to it.
- The boot ROM is set up correctly, because program instructions are being read from it.
- The external data bus (memory bus) is wired correctly, because instructions are transferred on it to the opcode register.
- The microcode and control system work (although I knew this already from earlier tests).
I didn’t exactly stress test it, but everything seemed very solid and reliable during my experiments, with no weird glitchy behavior at all. The couple of signals I examined with the oscilloscope looked pretty clean.
From here, it’s only a couple more steps before I have something interesting working. I think it’s time to start thinking about my minimal definition of a computer, so I can pinpoint the date of the first successful boot. If I had conditional branching, and two registers, that would be enough to write a simple program to compute a factorial or some such. The result would need to be read out with the logic analyzer, since there’s still no human-readable output, but it would be good enough for me to declare BMOW officially up and running. If all goes well, I should be less than a week away from that goal!
Mmmm, 24 bits of delicious address bus.
I also managed to cram in a ZIF socket for the boot/program ROM, and a shiny red reset button.
Funny Fanout Flaws
Ugh, I’ve run into my first design flaw. Fortunately, I don’t think it will be too difficult to fix, but it’s an eye opener nonetheless.
The issue I’m facing is fanout limitations. Fanout is the number of inputs that are connected to an output. Say I’ve got a chip that outputs some data signal, and that signal is passed as an input to ten other chips, then I’ve got a fanout of 10.
So what’s this about fanout limitations? Can’t you connect as many inputs as you’d like to that data output? To understand fanout limitations, it’s necessary to leave the cozy digital world of zeros and ones, and consider the analog nature of the circuit. In the case of a logical “1” at the data output, the output chip attempts to bring the output pin to 5 volts, or something close to it. A small amount of current then flows from the output pin to the input pins. In the case of TTL logic (the 7400 family), as long as the voltage at each input pin is over about 2.4 volts, the input chip will treat it as a logical 1. This allows for a certain amount of loss or noise in the circuit between the output and inputs.
Unfortunately, there is always a small amount of internal resistance in the output chip, between the power supply and the output pin, of something on the order of 10K ohms. Let’s say that an input pin draws 20 uA of current, so from Ohm’s law, V = IR, we can calculate an expected voltage drop of 10KOhm * 20uA = 0.2 volts. So with one input connected, the voltage at the output pin won’t actually be 5 volts, it will be 5 – 0.2 = 4.8 volts. As more inputs are connected, the current will increase, and so will the magnitude of the drop below 5 volts. At some point, if enough inputs are connected, the output voltage will sag below the 2.4v threshold needed for a valid logical “1”, and the circuit will stop functioning correctly.
In reality the situation is somewhat more complicated than I’ve described, as the input resistance needs to be considered too, and in fact TTL outputs never drive all the way to 5 volts anyway due to their design. However, the basic conclusion remains the same: connect too many inputs to a single output, and things will go bad. For TTL logic, the generally accepted limit is 10, although it varies from chip to chip.
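The simplified voltage-drop model from the paragraphs above can be sketched in a few lines of Python. The 10K ohm output resistance, 20 uA per input, and 2.4 volt threshold are the illustrative numbers from the text, not datasheet values, and the function names are invented for the example. Integer millivolts are used so the comparison at the threshold is exact.

```python
def output_high_mv(inputs, supply_mv=5000, r_out_kohm=10, input_ua=20):
    """Output voltage in millivolts with the given number of inputs attached."""
    # kOhm * uA = mV, so each attached input costs 10 * 20 = 200 mV here.
    return supply_mv - inputs * r_out_kohm * input_ua


def max_fanout(threshold_mv=2400, **model):
    """Largest number of inputs that keeps the output at or above the threshold."""
    n = 0
    while output_high_mv(n + 1, **model) >= threshold_mv:
        n += 1
    return n


print(output_high_mv(1))  # 4800, i.e. 4.8 volts with one input connected
print(max_fanout())       # 13 with these toy numbers
```

The toy model lands at 13 rather than the usual rule of thumb, which is a reminder that real limits come from the current ratings in each chip’s datasheet, not from this arithmetic.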
Examining my schematics, I realized that they call for 10 different chips connected to my memory data bus, and I plan to add one or two more before the project’s completion. That translates to a fanout of 9 to 11 for each of the bus’ data lines. Maybe that would work fine, maybe not, but I don’t really want to find out. With my luck it would kind of work, some of the time, and I’d never be able to figure out what was going on.
My plan for reducing fanout on the memory data bus is to connect all the “write-only” chips (opcode, LCD, and eventually sound and video) to a 74LS244 bus driver, and then connect that driver to the main memory data bus. That will bring the fanout down to 8, even in the worst case if I do add those extra chips. Isolating the read-only and read-write chips should be possible too if it proves necessary, but it would involve using some glue logic to combine various output enable signals, and bidirectional bus-drivers, and other headaches I’d just as soon avoid if I can.