Fixing 30 Year Old Apple ROM Bugs
After nearly a year of inactivity, I’ve started work on Floppy Emu again! One of my first priorities was compatibility with Macs that have a 400K floppy drive – the original Mac 128K, and the Mac 512K (not the 512Ke). Floppy Emu emulates a 400K/800K external floppy drive, and it works fine with 400K disk images, so I originally assumed it would have no problems on those old 400K-based machines. Wrong! Reports trickled in of mysterious Sad Mac errors and other problems when using Floppy Emu with those oldest Mac models. After ignoring the problem for months, I finally got ahold of a Mac 512K so I could investigate things firsthand.
Some brief experimentation showed that Floppy Emu was at least partly working with the 512K. When I “inserted” a disk image of a non-bootable disk, the Mac rejected it and showed the X’d disk icon. But when I inserted a bootable 400K system disk image, the Mac chewed away for a moment, then died with a Sad Mac error code 0F0004. So it was clear the Mac 512K could recognize the difference between a bootable and a non-bootable disk, but was failing to actually boot when using Floppy Emu. The same disk image and Emu hardware booted fine on a Mac Plus, so the problem looked like an unknown incompatibility between the Mac 512K and Floppy Emu.
The Sad Mac – such a cute way for a computer to die. Much friendlier than a blue screen of death, but just as fatal.
From past experience, I knew the Mac 128K and 512K used a different version of the Apple ROM than found in the 512Ke and Mac Plus. The 512Ke/Plus ROM added support for 800K floppy drives. But as long as only 400K disk images are used, I couldn’t see any reason Floppy Emu shouldn’t work on 128K/512K Macs with the old ROMs. After all, how would the Mac even know that Floppy Emu wasn’t a 400K drive? The real 400K and 800K drives are virtually identical, with the same connector, same internal registers, etc. The only difference is that one is a single-sided drive and one is double-sided. Also the Mac directly controls the speed of a 400K drive with a PWM signal, but an 800K drive ignores the PWM signal and self-regulates its speed.
I hunted the internet for details on 30-year-old boot errors, and found two explanations for error 0F0004. One said “Voltage too Low, adjust voltage to +5.0v.” and another said “Division by Zero”. How could there be two such radically different meanings for the same error? But things started to fall into place after I found this Apple Tech Note, which said that 0F0004 was a result of using an 800K external disk drive on the Mac 128K/512K with the old ROMs. So somehow the Mac was still identifying Floppy Emu as an 800K disk drive, which caused it to die. But how did it know?
ROM Diving
When all else fails, it’s time to look at the source code. In this case that meant disassembling the ROM from the 128K/512K to find out what the floppy driver is doing. I’ve done this a few times before now, but it’s still a major pain. Even with a 68K disassembly tool, and substituting symbolic names for all the Mac memory-mapped hardware, it’s still an opaque mess of assembly language code that doesn’t yield its secrets easily. It’s hard enough just to locate the relevant floppy routines, let alone understand the fine details of how they work. But after a day of poking and prodding, I found some code that looked very suspicious:
P_Sony_MakeSpdTbl:
1E82 285F             Move.L (A7)+, A4
1E84 343C 0080        Move $80, D2           ; set PWM value to $80
1E88 615C             Bsr P50                ; measure TACH speed, get speed1 result in D4
1E8A 6B56             BMI L309
1E8C 2604             Move.L D4, D3          ; copy result to D3
1E8E 343C 0100        Move $100, D2          ; set PWM value to $100
1E92 6152             Bsr P50                ; measure TACH speed, get speed2 result in D4
1E94 6B4C             BMI L309
1E96 2A04             Move.L D4, D5          ; copy result to D5
1E98 9A83             Sub.L D3, D5           ; D5 = difference between speed1 and speed2
1E9A E38B             LsL.L $1, D3
1E9C 7C04             MoveQ.L $4, D6
1E9E 4BFA FFC8        Lea.L DT19, A5
1EA2 6100 FCA2        Bsr Sony_SetupSonyVars
1EA6 47F1 101A        Lea.L $1A(A1,D1.W), A3
1EAA 7400       L304: MoveQ.L $0, D2
1EAC 341D             Move (A5)+, D2
1EAE 2E02             Move.L D2, D7
1EB0 D45D             Add (A5)+, D2
1EB2 E24A             LsR $1, D2
1EB4 D484       L305: Add.L D4, D2
1EB6 9483             Sub.L D3, D2
1EB8 6A02             BPL L306
1EBA 7400             MoveQ.L $0, D2
1EBC EF8A       L306: LsL.L $7, D2
1EBE 6702             BEQ L307
1EC0 84C5             DivU D5, D2            ; divide D2 by (speed2 - speed1)
Comments were written by me, after analyzing the code. This particular routine does some kind of calibration of the floppy drive – it varies the PWM signal, then measures the resulting drive speed as indicated by a value called TACH. I think it’s trying to establish a linear relationship between PWM and TACH, since that relationship may vary slightly between real 400K drives. There’s a lot going on in this routine, and I’ve truncated it to only show the first 27 instructions. But notice it contains a DivU instruction? There aren’t many places that division is used in the original Mac ROM, so that’s significant.
Looking deeper, the routine makes two drive speed measurements, then does some math to compute a value in D2, then finally divides D2 by the difference between the two speed measurements. But what happens if the two speed measurements were equal? Division by zero! Hello, 30 year old ROM bug.
On a 400K drive that’s controlled by the Mac’s PWM signal, the speed measurements will always have different results, because the PWM is different during each measurement. But on an 800K drive which self-regulates its speed, and on Floppy Emu which has a totally fake speed, the PWM changes will have no effect. That means both speed measurements will get the same result, and the Mac will crash with a division by zero error when it calls this ROM routine. Getting two different speed measurements was probably a safe assumption in 1983/1984 when the code was written, but it still would have been nice to do some defensive programming and add a zero check there, to handle the case of a broken drive or broken assumptions.
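To make the failure concrete, here is a rough Python model of the arithmetic leading up to the DivU. It is only a sketch of one reading of the listing above, not a faithful 68000 emulation: it ignores word-size wraparound, assumes D4 still holds the second speed measurement when the loop runs, and uses Python's ZeroDivisionError to stand in for the CPU's divide-by-zero exception. The function name, parameter names, and sample numbers are all made up for illustration.

```python
def calibration_divide(speed1, speed2, table_lo, table_hi):
    """Model one pass through the loop, up to and including the DivU."""
    d5 = speed2 - speed1               # Sub.L D3, D5
    d3 = speed1 * 2                    # LsL.L $1, D3
    d2 = (table_lo + table_hi) >> 1    # average of two table words (Move, Add, LsR)
    d2 = d2 + speed2 - d3              # Add.L D4, D2 then Sub.L D3, D2
    if d2 < 0:                         # BPL not taken: clamp to zero
        d2 = 0
    d2 <<= 7                           # LsL.L $7, D2
    if d2 == 0:                        # BEQ L307 skips the divide
        return 0
    return d2 // d5                    # DivU D5, D2: fails when d5 == 0


# A PWM-controlled drive reports two different speeds, so the divide succeeds.
print(calibration_divide(900, 1000, 1000, 1100))

# A self-regulating drive reports the same speed twice, so the divisor is zero.
try:
    calibration_divide(1000, 1000, 1000, 1100)
except ZeroDivisionError:
    print("division by zero")
```

Note that the routine does check for a zero dividend (the BEQ before the DivU), just not for a zero divisor.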
Fixing It
Once I understood the cause of the 0F0004 error, the question was how to modify Floppy Emu to avoid it. The TACH speed signal that Floppy Emu generates is obviously fake, since there are no moving parts. It calculates how fast the drive motor should be spinning, given which track is being accessed, and creates a series of pulses on TACH at the appropriate rate. To avoid the division by zero crash, the TACH rate needs to vary, so that two successive measurements see different TACH speeds.
One solution would be to use the PWM signal from the Mac, since that’s its purpose. By analyzing the PWM duty cycle, the Floppy Emu hardware could infer how fast the Mac wanted the drive to spin, and generate an appropriate TACH to match. Unfortunately, the hardware doesn’t even have the PWM pin connected. And if it did, it’s not certain that it could do the necessary duty cycle and TACH calculations fast enough, or efficiently enough to fit in the remaining logic space.
My solution was to constantly flutter the drive speed TACH signal. The flutter rate must be fast enough that two successive measurements will see different rates, but not so fast that two successive measurements will span the entire flutter cycle and so see the same rate. The flutter amplitude must be large enough for the speed measurements to be different, but not so large that the measured speed falls outside the valid range for the current track being accessed. With a little experimenting, I settled on a flutter cycle period of 640 ms and a flutter amplitude of about 0.25%.
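The flutter idea can be sketched in a few lines of Python. This is an illustration only, not the Floppy Emu firmware: the 640 ms period and 0.25% amplitude come from the text above, but the triangle waveform, the reading of 0.25% as a peak deviation, the function name, and the nominal rate of 100.0 are assumptions made for the example.

```python
def fluttered_tach_rate(nominal_rate, t_ms, period_ms=640.0, amplitude=0.0025):
    """Return the TACH rate at time t_ms, wobbling around nominal_rate.

    Assumes a triangle wave: +amplitude at the start of each cycle,
    -amplitude at the midpoint, and back again.
    """
    phase = (t_ms % period_ms) / period_ms      # position in the cycle, 0.0 to 1.0
    triangle = 4.0 * abs(phase - 0.5) - 1.0     # +1 at phase 0, -1 at phase 0.5
    return nominal_rate * (1.0 + amplitude * triangle)


# Two samples taken 100 ms apart see different rates...
print(fluttered_tach_rate(100.0, 0.0), fluttered_tach_rate(100.0, 100.0))

# ...but two samples exactly one flutter period apart see the same rate,
# which is why the flutter must not be too fast relative to the measurements.
print(fluttered_tach_rate(100.0, 0.0) == fluttered_tach_rate(100.0, 640.0))
```

In this model the rate never strays more than 0.25% from nominal, which is the constraint that keeps the measured speed inside the valid range for the current track.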
And it works! The image above shows the Mac 512K running System 0.97, Finder 1.0, booted from Floppy Emu. Those fonts sure are weird.
A Bit of History
When Macintosh external 800K floppy drives first became available, in 1985/1986, owners of the Mac 128K and 512K faced the same problem I did here, only they couldn’t modify the drive’s TACH behavior to work around the ROM bug. Instead, Apple released a system patch called HD20 which fixed the bug and added 800K drive support. But using it was a pain: you had to boot from a 400K floppy in the internal drive first, which contained the HD20 patch, and then you could mount an 800K floppy in the external drive. Booting from an 800K drive wasn’t possible. It wasn’t a very nice solution.
If that ROM routine’s author had added a zero check, this wouldn’t have been necessary. Mac 128K/512K owners could have booted directly from an 800K floppy in the external drive, loading the HD20 init in the process. Everything would have been great. Instead, that divide by zero bug doomed them all to a miserable 800K experience.
When Apple and Sony were developing the 800K external drive, they must have known this was a problem, and they could have used the solution I did to flutter the TACH speed. In 1985 they couldn’t just drop a 25-cent microcontroller into the drive to synthesize TACH, but they could have added a simple RC circuit to inject some AC “noise” into the TACH signal at the appropriate amplitude and period, achieving the same result. Everything would have been great. But they didn’t, and all those 128K/512K owners were forced to endure the 400K floppy boot-swap dance forever.
5 Comments so far
Macaeolgy:
The systematic study of past computer behaviour by the recovery and examination of remaining material evidence
🙂 🙂
Just dug this out from Usenet:
https://groups.google.com/d/msg/net.micro.mac/I2vr4wG1sEE/4SBnQXtf_dwJ
Apple likely somehow fixed this problem with the Apple 3.5″ drive daisy chain board. I used to run one of the drives off of my IIgs on a Mac 512k since the 400k drives were seized. I have no problem booting off of it without the HD20 init.
Yes, I’ve heard that’s true, but I can’t explain it. I mostly reverse-engineered the Apple 3.5 Drive’s daisy chain board a few months ago, and I didn’t see anything that could explain why it doesn’t cause the same division by zero error when using an 800K drive.
There is a Macintosh 512k I am looking at, and in the pictures it looks like the seller has an 800k drive plugged into the Mac. Do you think that’s what’s causing the 0F0004 error on it?