Hardware Glitches
Just after I wrote last week’s entry, the hardware went from proudly proclaiming “BMOW is alive!” to merely stating “BMOW is aliv”. I was able to track this down to a problem with the program counter chip. Often, but not always, the low byte of the PC would roll over from FF to 01 instead of 00, and continue counting from there. The lowest bit of the address, A0, seemed to get stuck at 1 during a rollover. That wasn’t too painful to diagnose, but understanding *why* it was skipping 00 and how to fix it proved to be a much bigger problem.
The PC is implemented as a GAL, so the first thing I did was try replacing it with a new GAL, on the theory that the chip was bad. No help. Then I double-checked my GAL equations, and the raw fuse map produced by the GAL assembler, looking for errors. I found none. It seemed that the problem wasn’t simply bad hardware, nor a flaw in the logic, but some kind of electrical/noise/timing problem. Exactly the sort of problem I fear the most.
I checked every pin on the chip with my oscilloscope, looking for obvious spikes, noise, or power sags, but everything looked pretty good. There was some noise on some of the data load inputs, but nothing egregious, and those inputs aren’t used when incrementing the PC anyway. The scope showed that during a rollover, when A0 should have transitioned from a high voltage to a low one (1 to 0), it would start to dip low for about 5ns, then suddenly pop back up to a high voltage.
I tried slowing the system clock all the way from 1MHz down to 250kHz without success. I modified the GAL programming to make the PC increment every clock cycle, ignoring the count enable input. Of course this made the machine totally non-functional, but the rollover bug still occurred, as demonstrated with the scope. I tried removing all the other chips connected to A0, but the problem still persisted, and now the machine was even more non-functional.
Finally I tried rewriting the GAL equations to move A0 to another output pin, and the problem followed A0 to its new pin. This seemed a key bit of evidence, suggesting that the problem was not with the pin itself, nor the wires connected to the pin, but with the logical quantity A0. That led me to ask what was different about A0 versus A1-A7, which didn’t exhibit any problems. The answer is that when counting is enabled, A0 always changes state: a 0 becomes 1, or a 1 becomes 0. The other bits only change state depending on the values of the lower bits. In short, the PC was acting exactly as if it were being clocked twice in rapid succession.
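The double-clock theory is easy to sanity-check with a toy model. The Python sketch below is not the GAL logic, just an 8-bit counter; `step` and `clock_edges` are hypothetical names, with `clock_edges` standing for how many rising edges the chip actually sees per intended clock tick.

```python
def step(pc_low, clock_edges=1):
    """Advance an 8-bit counter by the number of clock edges it sees."""
    return (pc_low + clock_edges) & 0xFF

normal = step(0xFF)                    # one clean edge
glitched = step(0xFF, clock_edges=2)   # intended edge plus one spurious edge

print(f"{normal:02X} {glitched:02X}")  # prints "00 01"
```

In this model the first edge takes FF to 00, and the spurious second edge changes only bit 0, which matches the symptom of A0 dipping low and then popping back up while A1-A7 look fine.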
I jumped back to the oscilloscope, looking carefully at the clock input to the PC at the moment of rollover. The clock looked fine; in fact, it looked very clean. My scope only has 5ns timing resolution, though, so if there were a glitch of less than 5ns on the clock line, it might not show up. I wondered if there was a way to avoid a double-count in the case of a double-clock. I came up with the dangerous-sounding idea of including the clock itself in the product term used to compute the new A0 value at a clock edge. This certainly feels strange: by definition the A0 value will change exactly when the clock transitions from low to high, at which point the clock-as-data value will be undefined. The GAL program change also involved switching the A0 equation from negative logic to positive. Here it is (note the /clk0 term in the new equation):
; old equation
/q0 := /_reset + _reset*_cnt_in*_ld*/q0 + _reset*/_ld*/d0 + _reset*_ld*/_cnt_in*q0
; new equation
q0 := _reset*_cnt_in*_ld*q0 + _reset*/_ld*d0 + _reset*_ld*/_cnt_in*/q0*/clk0
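To see what the extra term does, the two equations can be transcribed into Python as next-state functions. This is only a sketch under some assumptions: signals whose names start with an underscore are treated as active-low (so `_cnt_in = 0` means counting is enabled), and `clk0` is taken to be whatever level the clock pin had just before the edge, which is low for a clean edge and high for a spurious edge arriving a few nanoseconds later. The function names are made up for illustration.

```python
def old_q0(q0, d0, _reset, _cnt_in, _ld):
    # Old equation is written in negative logic: it defines /q0.
    not_q0 = (not _reset
              or (_reset and _cnt_in and _ld and not q0)
              or (_reset and not _ld and not d0)
              or (_reset and _ld and not _cnt_in and q0))
    return int(not not_q0)

def new_q0(q0, d0, _reset, _cnt_in, _ld, clk0):
    # New equation is in positive logic, with /clk0 in the toggle term.
    return int(bool((_reset and _cnt_in and _ld and q0)
                    or (_reset and not _ld and d0)
                    or (_reset and _ld and not _cnt_in and not q0 and not clk0)))

# Counting enabled, not in reset, not loading; A0 starts at 1 (PC low byte = FF).
ctl = dict(d0=0, _reset=1, _cnt_in=0, _ld=1)

old = old_q0(1, **ctl)          # real edge: 1 -> 0
old = old_q0(old, **ctl)        # spurious edge: toggles back to 1

new = new_q0(1, clk0=0, **ctl)    # real edge, clock was low beforehand: 1 -> 0
new = new_q0(new, clk0=1, **ctl)  # spurious edge, clock already high: stays 0

print(old, new)  # prints "1 0"
```

Under these assumptions the guard only blocks a spurious 0-to-1 toggle, which is the direction seen in the FF-to-01 failure, and that is the behavior the new equation was meant to produce on the real hardware.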
This worked. So the problem has been successfully papered over, but not really solved. There are a couple of other experiments I’d like to try, in order to better understand what’s happening:
- Try a quarter-power GAL instead of my low-power ones. Perhaps the surge in power when all the address lines simultaneously switch from 1 to 0 is causing the clock glitch I can’t see. If so, a more power-efficient GAL might help. I’ve got one on order.
- Try a 15ns GAL instead of my 25ns ones. I don’t think the propagation delay has anything to do with the problem directly, but perhaps the different internal structure of the 15ns GAL would exhibit different symptoms. I’ve ordered one of these too.
- Experiment with various methods of terminating the clock line.
Termination seems to be something of an inexact process, from what I’ve read. I tried connecting a pin somewhere midway along the clock line to ground, through a 220 Ohm resistor, and it made the clocking problems worse. I’ve seen other designs that use an in-series resistor of around 40-80 Ohms, rather than a resistor tied to ground. I’ve been unable to find much good discussion of the need and method of termination for TTL circuits running in the 1-4 MHz range, and most of what I have read talks about terminating signals on the bus or backplane, which I don’t have. If anyone reading this knows more about this and could offer some advice, I’d love to hear it.
Update: Some more details on the PC clock may be useful for termination analysis. The low byte of the PC is computed by the GAL called PCLO in the schematics. It’s using the clock line Q0B, which is output from a 74LS244. Q0B is transmitted along a chain of wires about 21 inches in total length. The ‘244 that outputs the clock is at the beginning of the chain, and PCLO is about 9.5 inches down the chain from the ‘244, and about 11.5 inches from the end of the chain.
The clock signal propagates past PCLO, 11.5 inches to the end of the chain, reflects off the end, and propagates 11.5 inches back. So the reflected signal must travel 23 inches, or about 0.6 meters. Assuming 5 ns per meter signal propagation in copper wire, the reflected clock signal will arrive back at PCLO in 5 * 0.6 = 3 ns after the original signal. Maybe that causes the double-clocking?
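The estimate above can be reproduced without rounding. This short Python sketch uses the same assumed 5 ns per meter propagation delay and the 11.5 inch distance from PCLO to the end of the chain; the function name is hypothetical.

```python
INCH_TO_M = 0.0254
DELAY_NS_PER_M = 5.0  # assumed propagation delay, as in the text

def reflection_delay_ns(distance_to_end_in):
    """Extra delay for a signal to reach the end of the line and come back."""
    round_trip_m = 2 * distance_to_end_in * INCH_TO_M
    return round_trip_m * DELAY_NS_PER_M

print(round(reflection_delay_ns(11.5), 2))  # prints 2.92
```

Either way the reflection arrives roughly 3 ns after the original edge, which is below the scope's 5ns timing resolution.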
4 Comments so far
That’s really interesting. I’ve never heard of a GAL, but it is good that the problem was solved. Odd that it came up out of the blue.
You seem to be building a quite complex system. Once I finish up these power electronics projects, I’ll probably buy one of those WW boards.
Oh, I also had a little idea about how to do VGA-type video.
I don’t know if you could wire wrap it, but:
Some bytes of memory, maybe kilobytes, parallel a few bits together, maybe 4-8. Decide whether to do monochrome or color.
If it’s monochrome, just write your data into the memory, and clock out however many pixels you want every hSync. Having the bits in parallel means you can get all of them out at the same time and feed an R-2R DAC.
As for sync signals, I have had a lot of trouble getting those into code on a 20 MHz Atmel uC… I don’t know what’s best for that…
Keep on truckin!
Thanks for the ideas. I’m not sure I understand what you’re suggesting with the video, though. The biggest problem I see is arbitrating between CPU writes and video circuitry reads of the video memory, since you can only do one at a time. Once you’ve read out the data, generating the VGA signal using R2R or a DAC seems comparatively simple, although I’m sure there will be some challenges there. If I ever attempt this, I will probably build some custom video circuitry, rather than a microcontroller.
There is a bit of information about “series termination resistor at the source” at “standard signal tips” (http://massmind.org/techref/electips.htm#series_resistor) and “avoiding noise in wire-wrap circuits” (http://massmind.org/techref/noises.htm).