Debugging
Holy cow, this thing is riddled with bugs! I started trying to address some of the bugs described in the Boot Logo posting, with mostly positive results. Some problems turned out to be hardware issues, some were in the software, and some still have me scratching my head.
The first bug I’d described in the Boot Logo posting was one where the keyboard appeared not to work during the bootloader sequence, depending on what address the bootloader program was assembled to. A few NOPs added or removed made the difference between the keyboard working or doing nothing. After the usual amount of swearing and probing with the oscilloscope, I discovered that this was caused by a hardware glitch, or more precisely, a bug in the hardware design.
The BMOW keyboard interface hardware reads in a byte’s worth of bits and then signals an interrupt. When the CPU reads the key scancode, that clears the interrupt and resets the bit shifter. The trouble was that as the address bus lines bounce around during the early part of a clock cycle, before settling on their final values, occasionally they would briefly appear as the keyboard interface address. So the interface hardware thought the CPU was reading a scancode, and reset the bit shifter. When the bootloader was assembled to certain addresses, it just happened to cause a pattern on the address bus that exposed this problem, constantly resetting the keyboard bit shifter before a full byte could be read, and rendering the keyboard inoperative as a result.
My fix was to modify the hardware slightly, so that the keyboard interface address must appear on the bus during the second half of the clock cycle in order to reset the bit shifter. It required changing a couple of wires, and altering the keyboard GAL code. This isn’t a perfect fix, but 99.9% of the time the address bus lines should have settled on their final values by then. In practice, it seemed to fix the problem.
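To illustrate why qualifying the address decode with the clock phase helps, here is a minimal Python sketch of the idea. It is not the actual GAL code: the keyboard address, the sample format, and the function names are all made up for illustration.

```python
# Hypothetical model of the shifter-reset decode; KEYBOARD_ADDR is an
# invented value, not BMOW's real memory map.
KEYBOARD_ADDR = 0x7000

def reset_asserted(addr, clock_phase, qualified):
    """Return True if the reset decode fires for one bus sample.

    clock_phase runs from 0.0 to 1.0 within a clock cycle. The unqualified
    decode fires on any address match; the qualified decode also requires
    the sample to fall in the second half of the cycle.
    """
    match = (addr == KEYBOARD_ADDR)
    if qualified:
        return match and clock_phase >= 0.5
    return match

def count_reset_samples(bus_samples, qualified):
    """Count the (clock_phase, addr) samples on which reset is asserted."""
    return sum(1 for phase, addr in bus_samples
               if reset_asserted(addr, phase, qualified))

# A cycle that settles on an unrelated address, but briefly passes through
# the keyboard address while the bus lines are still bouncing.
glitchy_cycle = [(0.05, 0x3000), (0.10, KEYBOARD_ADDR),
                 (0.20, 0x7100), (0.60, 0x7100), (0.90, 0x7100)]

# A genuine scancode read: the keyboard address is stable late in the cycle.
real_read = [(0.10, 0x3000), (0.60, KEYBOARD_ADDR)]
```

With the unqualified decode, the transient in `glitchy_cycle` triggers a reset. With the qualified decode it does not, while the genuine read in `real_read` still does.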
IST?
The second bug from the Boot Logo posting also involved missed keys from the keyboard. When running the Apple II monitor program, after disassembling a section of memory, the next key typed wasn’t being recognized, so “LIST” became “IST”.
You might think this was related to the first keyboard problem, but it proved to be entirely different, and it took me a long, long time to resolve. It turned out to be an interrupt problem. The interrupt routine was inheriting whatever memory bank was active at the time of the interrupt, rather than explicitly setting the active bank to the one it needed. This meant that when a keyboard interrupt hit while video was being drawn, with the VRAM bank active, the keyboard driver tried to update its state in VRAM rather than in its own bank! This caused the key to be ignored, and also sometimes drew a garbage pixel somewhere on the screen. A one-line change to the interrupt routine to set the active bank fixed the problem quite nicely.
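The bank-inheritance bug can be sketched in a few lines of Python. This is only a toy model: the bank numbers, the key-state address, and the class and function names are assumptions, and restoring the previous bank on exit is something the sketch adds for tidiness rather than something the real routine is known to do.

```python
MAIN_BANK = 0            # hypothetical bank numbers
VRAM_BANK = 1
KEY_STATE_ADDR = 0x0200  # hypothetical location of the keyboard driver state

class Machine:
    """Toy banked memory: writes go to whichever bank is currently active."""
    def __init__(self):
        self.active_bank = MAIN_BANK
        self.banks = {MAIN_BANK: {}, VRAM_BANK: {}}

    def write(self, addr, value):
        self.banks[self.active_bank][addr] = value

def keyboard_isr_buggy(m, scancode):
    # Inherits whatever bank the interrupted code had selected.
    m.write(KEY_STATE_ADDR, scancode)

def keyboard_isr_fixed(m, scancode):
    saved = m.active_bank
    m.active_bank = MAIN_BANK        # the one-line fix: select the bank explicitly
    m.write(KEY_STATE_ADDR, scancode)
    m.active_bank = saved            # assumption: restore the interrupted bank

# Interrupt arrives while the console code has the VRAM bank selected.
buggy = Machine()
buggy.active_bank = VRAM_BANK
keyboard_isr_buggy(buggy, 0x4B)      # key state lands in VRAM: lost key, stray pixel

fixed = Machine()
fixed.active_bank = VRAM_BANK
keyboard_isr_fixed(fixed, 0x4B)      # key state lands in the main bank
```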
I then attempted to look into the sporadic loss of VSYNC, but kept running into other problems. I lost some time to an intermittent crash on reset, which proved to be another instance where the interrupt routine wasn’t setting the active memory bank. Then I lost still more time to an intermittent crash on power-up. I *think* this was caused by attempting to service an interrupt before the interrupt vector was set up. I altered the bootloader to disable interrupts until the interrupt vector is set, and the problem now seems to have gone away.
Finally I returned to the loss of VSYNC issue once more. I confirmed what I’d discovered when I first encountered the problem: the video row counter is occasionally miscounting when the CPU accesses VRAM. I can’t explain why this happens. It’s a 10-bit counter split across two GALs, and it should endlessly count rows from 0 to 524, generating VSYNC from the count. Nothing should ever alter its counting behavior. Yet I observed it sometimes skipping backwards or forwards in the count, or missing 524 and counting all the way to 1023 before wrapping around.
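The “counting all the way to 1023” symptom is what you would expect if the wrap is triggered by an exact match on row 524. That detail is an assumption here, since the GAL equations aren’t shown. A small Python sketch of such a counter, with invented names:

```python
LAST_ROW = 524        # rows are numbered 0..524, so 525 rows per frame
COUNTER_MASK = 0x3FF  # 10-bit counter

def next_row(row):
    """Advance the row counter, wrapping only on an exact match with LAST_ROW."""
    if row == LAST_ROW:
        return 0
    return (row + 1) & COUNTER_MASK

def rows_until_wrap(start):
    """Number of clock steps until the counter next returns to row 0."""
    row, steps = start, 0
    while True:
        row = next_row(row)
        steps += 1
        if row == 0:
            return steps
```

Starting from row 0, `rows_until_wrap(0)` is 525, a normal frame. If a glitch ever lands the counter on 525 or beyond, the equality test never fires: `rows_until_wrap(525)` is 499, because the counter has to run up through 1023 and overflow before it gets back to 0.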
I can only guess this is caused by some kind of timing or noise problem. Might one of the counter inputs be changing just at the instant of the clock pulse? Or might an input have taken on an illegal voltage? Half-hearted probing with the oscilloscope didn’t find any evidence of such problems.
Feeling a little discouraged, I decided to change course and alter the video console code to synchronize VRAM access with the VBLANK period. The goal was to eliminate the “snow” that appears when the CPU and display circuitry contend for VRAM at the same time. I added code to wait until the start of the VBLANK period before accessing VRAM, which should have made for slower but snow-free access.
The new code mostly worked, but not 100%. For one thing, the code has no way of knowing exactly where in the VBLANK period the display circuitry is. It might check and confirm that it’s the VBLANK period, and start doing a VRAM access, just as the VBLANK period ends and the display circuitry tries to access VRAM to display the next frame. I don’t have any easy way of resolving that.
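The check-then-access race can be shown with a tiny Python model. The time units and numbers are arbitrary and the function name is made up; the point is only that a check which passes near the end of VBLANK says nothing about whether the access will finish in time.

```python
def access_collides(check_time, access_duration, vblank_start, vblank_end):
    """Return True if an access started after a successful VBLANK check
    would still be in progress when VBLANK ends.

    All times are in the same arbitrary units. If the check fails (we are
    not in VBLANK), no access starts, so there is no collision.
    """
    in_vblank = vblank_start <= check_time < vblank_end
    if not in_vblank:
        return False
    return check_time + access_duration > vblank_end

# VBLANK lasts from t=0 to t=100 and an access takes 3 units (invented numbers).
early = access_collides(10, 3, 0, 100)  # plenty of VBLANK left
late = access_collides(99, 3, 0, 100)   # check passes, access overruns
```

Here `early` is `False` and `late` is `True`. Avoiding the late case would need some way to know how much of VBLANK remains, which is exactly what the console code lacks.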
More surprisingly, I discovered that some instructions that did not access VRAM were exercising the hardware as if they did, causing snow. That led to more swearing and probing: the usual routine. What I finally found was a pair of problems related to unexpected addresses on the address bus, similar to keyboard bug #1. Some of BMOW’s instructions drive an address onto the address bus in order to later push that address onto the stack. When this was a VRAM address, it caused the VRAM to think it was being accessed. I hacked the address decoder with an extra wire and code changes to suppress the VRAM memory select signal in such cases, but it feels like an imperfect solution.
Even with that fix in place, I found there were frequently transient values on the address bus that caused VRAM to think it was being accessed during access to some other memory location. I altered a memory control signal to suppress all memory select signals during the first quarter of the clock period, which resolves the problem, but with a heavy cost. By delaying the memory access signal, it reduces the maximum theoretical clock speed of BMOW by 25%. But right now, I’d rather have it run right than run fast.
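The 25% figure follows from simple arithmetic, under the simplifying assumption that the memory previously had the whole clock period to respond and now has only the last three quarters of it. A quick Python check, using an invented 100 ns access time:

```python
def max_clock_hz(access_time_s, usable_fraction):
    """Highest clock rate at which the usable part of the period
    is still at least as long as the memory access time."""
    return usable_fraction / access_time_s

ACCESS_TIME_S = 100e-9  # hypothetical memory access time, for illustration only

before = max_clock_hz(ACCESS_TIME_S, 1.0)   # select active for the whole period
after = max_clock_hz(ACCESS_TIME_S, 0.75)   # select suppressed for the first quarter
reduction = 1 - after / before              # fractional loss in maximum clock speed
```

The access time cancels out of the ratio, so `reduction` comes to 0.25 whatever memory speed is assumed.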
I gave up at that point. With the new code to wait for VBLANK, there’s no snow anymore, but the loss of VSYNC problem seems to be happening much more often. To add insult to injury, I also discovered a new problem where VRAM access during the VBLANK period interferes with retrieval of the video mode bytes, which are stored in a non-visible portion of VRAM. Every so often, my screen of text will suddenly appear as a screen of garbage video instead, as the display hardware interprets the contents of VRAM as an image rather than text. I’m not immediately sure how to resolve that one either.
I should probably be grateful for all these problems. If there were no more problems, there would be nothing to do, and I’d have to start a new project!
2 Comments so far
Ah, bugs, bugs. This is quite an impressive project. What I have finally started on is an SRAM bank. At my work’s old location I found quite a few old motherboards that happened to have 15 and 25 ns SRAM on them in PDIP-28 packages, so I ripped about 10 or 12 of the 32K chips off.
Going to have a bank of 256K, hopefully enough for some sort of RAM to be used as a plug-in board for something :-/
Don’t know what kind of CPU to use with it though. Have a suggestion? I was thinking of somehow throwing it next to an FPGA for some VRAM.
Egad!!! Bugs!
I remember writing programs on that original family. The first few were very buggy, then they got better.
Keep at it, you’re doing fine.