Floppy vs SD Card Write Speeds
Which would you guess supports faster write speeds: a modern high-speed class 10 SDHC Flash memory card, or a vintage 1984 floppy disk employing a mechanical stepper motor to move its drive head, spinning magnetic media at a lazy 400 RPM? For large writes of contiguous blocks, the SD card’s speed blows the floppy away. But for random single block I/O, or sequences involving many alternating reads and writes, the SD card struggles to match the performance of the lowly floppy. That’s not a great thing to discover when you’re halfway through designing a floppy disk emulator.
Let’s put some numbers on performance. The floppy can read or write a 512 byte sector in 12 ms, with no variability: the speed is always one sector per 12 ms. An 8GB PNY class 10 microSDHC card in SPI mode can read a 512 byte block in about 2-3 ms, with a 4 MHz SPI clock. The same card exhibits single block write times of typically 5-9 ms, but with occasional spikes up to 70+ms for a single block. Write times appear to be inherent in the card, and mostly unrelated to the SPI clock rate. So while the average write speed of the SD card is somewhat faster than a floppy, the speed is variable, and the worst case is slower than a floppy.
Class 10 SDHC Emulator Results
The good news is that the class 10 SDHC card is fast enough to support emulation of normal floppy writes, in which some number of sectors on an existing floppy are updated. I’ve been able to copy large files around on the emulated floppy disk reliably, using the new class 10 card. This type of write actually follows a read-write-read-write pattern, as the Mac alternately reads to find the address section of the desired sector, then writes to replace the sector’s data section. Following each write, the emulator takes 5-9 ms to write the data to the SD card, while supplying sync bytes to the Mac. The Mac sees this as an extra-large intersector gap while attempting to read the next address section. It will tolerate gaps of up to roughly 23 ms, although this will make writing files noticeably slower than a real floppy.
The bad news is that the class 10 SDHC card is not fast enough to support emulation of continuous floppy writes, such as those during initialization of a floppy, or when doing a full-disk write with a disk copy program. This type of write is just a constant stream of incoming bytes, at a rate of one sector per 12 ms. The emulator cannot stall after the first sector to perform an SD write, because the second sector is already inbound. To address this I implemented a double-buffered system, which uses an interrupt routine to read the next sector’s data into a new buffer, even while the data from the old buffer is being written to the SD card. Unfortunately, the overhead of the interrupt routine increases the SD write time to 12+ ms, so the emulator simply can’t keep up with the incoming data. Using more than two buffers might help, if there were enough RAM for them, but the average SD write time would still need to be under 12 ms. Buffering helps recover from occasional “burps” where a write takes longer than 12 ms, but it can’t improve the overall write speed.
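The double-buffering scheme looks roughly like this (a simplified model in plain C; on the real hardware rx_byte would run inside the AVR interrupt routine and the claimed buffer would be handed to the SPI write code, and all names here are mine, not the actual firmware's):

```c
#include <stdint.h>

#define SECTOR_SIZE 512

/* Two sector buffers: the interrupt routine fills one while the main
   loop drains the other to the SD card. */
static uint8_t bufs[2][SECTOR_SIZE];
static volatile uint8_t fill_buf;      /* buffer currently being filled */
static volatile uint16_t fill_pos;     /* bytes received so far */
static volatile uint8_t sector_ready;  /* a full sector awaits writing */

/* Called once per decoded byte (from the ISR on real hardware). */
void rx_byte(uint8_t b) {
    bufs[fill_buf][fill_pos++] = b;
    if (fill_pos == SECTOR_SIZE) {
        fill_pos = 0;
        fill_buf ^= 1;    /* swap: the ISR moves on to the other buffer */
        sector_ready = 1;
    }
}

/* Main loop: if a sector has completed, claim it for the SD write.
   Returns NULL when nothing is pending. */
const uint8_t *completed_sector(void) {
    if (!sector_ready)
        return 0;
    sector_ready = 0;
    return bufs[fill_buf ^ 1];  /* the buffer that just filled up */
}
```

The crux is exactly what the measurements show: this only works if the SD write of the claimed buffer finishes, on average, before the other buffer fills 12 ms later.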
Incidentally, while studying continuous write behavior, I discovered that sectors in a Macintosh floppy track are interleaved like 0 6 1 7 2 8 3 9 4 10 5 11, rather than appearing in consecutive order by sector number.
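That 2:1 interleave has a simple closed form. As a sketch (assuming the 12-sectors-per-track case; the GCR format's inner speed zones have fewer sectors per track):

```c
#include <stdint.h>

/* Logical sector number stored at physical slot 0..11 of a 12-sector
   track, producing the interleave order 0 6 1 7 2 8 3 9 4 10 5 11:
   even slots hold sectors 0-5, odd slots hold sectors 6-11. */
uint8_t interleaved_sector(uint8_t slot) {
    return (slot % 2 == 0) ? slot / 2 : slot / 2 + 6;
}
```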
Arrgh!!
This whole business of emulator write support is turning into quite a pain, causing the fun value of the project to drop steeply. To make matters worse, I somehow managed to brick the class 10 card while I was experimenting with raw SD block operations, and I don’t have any device that will reformat it. As much as write support is an essential part of floppy emulation, I’m questioning how much more time it makes sense for me to sink into it. I’m therefore tempted to eliminate write emulation entirely, release a design for a read-only floppy emulator, and leave it at that.
My brain is struggling with the details of card performance in single-block and multi-block write modes, various buffering schemes, and timeout values for Macintosh I/O operations. My gut tells me there must be some clever way to use buffering and/or multi-block writes to get reliable write performance, even with a slow SD card, but so far I haven’t found a solution. And regardless of any amount of buffering or other clever schemes, I believe that if any single block write takes more than about 12 ms (the max track step time) + 23 ms (the max intersector delay), then emulation could fail. The computer might do a write immediately followed by a track step and then a read, which couldn’t be serviced until the write finished.
I looked again at the HxC Floppy Emulator, which uses a PIC and a 32K SRAM to emulate floppies for other classic computers. The author has been kind to answer many of my questions about its inner workings, but I don’t know the buffering strategy it uses, or whether it’s subject to failure in the same worst cases as my design.
Write Options
Some other possibilities, short of eliminating write support completely:
Experimental Write Support – I could leave the write emulation as it is now, and call write support “experimental”. It would be a crap shoot whether it worked or not, depending on the card that was used. I think normal writes from Finder file operations would work on most class 10 cards, but continuous writes (disk initialization and disk copying) wouldn’t. Maybe that’s acceptable.
Strict Card Requirements – The author of SdFatLib ran one of my performance tests, and got substantially better write performance using two different models of SanDisk Extreme cards than I saw on my class 10 PNY card. Unfortunately my local store didn’t have any high speed SanDisk microSDHC cards. If those cards work reliably, I could make them “required” for write support, but I’m uncomfortable with that idea. Even if it worked, I wouldn’t be any closer to understanding why some cards work and some don’t. I’d also be faced with the task of continuously testing new cards as the old SanDisk ones were obsoleted and replaced with different models.
Multi-Sector Writes – Instead of single block writes, I could use the SD multi-block write method. Using this method, you tell the card “I’m writing N blocks beginning at location L”, and it pre-erases all the blocks, then writes them quickly as they arrive. This makes individual block writes much faster, but requires the pre-erase step, and also requires knowing how many blocks you’re going to write before you write the first one. That’s not possible when writing blocks as they arrive from the Macintosh, since it’s never known when the floppy write will end. If many sectors were buffered in RAM first, then they could be written in a multi-block write, but the length of the multi-block operation would present its own challenges. What would happen if during the long multi-block write, the Mac decided to step to a different track and begin reading new sectors?
A related method I’ve yet to try is to erase the SD block as soon as its data begins to arrive from the Mac, instead of waiting until the entire sector has been received before doing an SD erase and write. I’m not even sure that’s possible, but it seems like it would help.
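For reference, the SPI-mode sequence for a pre-erased multi-block write is CMD55 + ACMD23 (set the erase count) followed by CMD25 (WRITE_MULTIPLE_BLOCK). A minimal sketch of the 6-byte command framing involved; the actual transfer also requires the 0xFC data token before each block, the 0xFD stop token, and busy polling after each block:

```c
#include <stdint.h>

#define CMD55  55  /* APP_CMD: prefix required before any ACMD */
#define ACMD23 23  /* SET_WR_BLK_ERASE_COUNT: pre-erase N blocks */
#define CMD25  25  /* WRITE_MULTIPLE_BLOCK: open-ended multi-block write */

/* Build the 6-byte SPI command frame: start bits plus command index,
   32-bit argument big-endian, then a dummy CRC byte with the stop bit
   set. (In SPI mode the CRC is ignored for everything after CMD0/CMD8.) */
void sd_cmd_frame(uint8_t cmd, uint32_t arg, uint8_t frame[6]) {
    frame[0] = 0x40 | cmd;
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = 0x01;  /* dummy CRC7 = 0, stop bit = 1 */
}
```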
Code Optimization – I believe that continuous writes with the class 10 PNY card are falling just short of the necessary average speed. If I could optimize the interrupt routine to reduce its overhead, it might work. Without substantial buffering, however, continuous writes would still fail whenever there was a single anomalous slow write, even if the average write speed were fast enough.
More RAM Buffering – By using an AVR with more internal RAM, I could buffer more sectors during writes. That feels like it should help somehow, but I’m not certain it actually would. With my current code, normal writes don’t use buffering. The class 10 card doesn’t need buffering, since the SD write is performed during the intersector gap before the next read. The class 4 card I tested earlier had such strange latency patterns (12 consecutive writes of 50-80 ms) that no amount of buffering would help it. In fact, buffering would provide no benefit, because a second write from the Mac cannot begin until the first write to the SD card finishes, the next sector is read from the card, and the Mac reads that sector’s address section.
Additional buffering would help somewhat with continuous writes, if they could be optimized enough that the average write time were fast enough. A large buffer could also be used to read a full track into RAM at once, then play it back from RAM instead of continuously reading from the card. That would allow SD writes to happen without blocking SD reads of sectors in the same track. However, a similar blocking problem would still occur if the Mac stepped to a different track and began reading sectors there while a long SD write was monopolizing the card.
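The claim that buffering rides out occasional slow writes but can't fix a slow average can be illustrated with a small queueing model. This is a host-side sketch with buffer counts I chose for illustration; the write times fed to it below are the measured figures from earlier:

```c
#include <stdint.h>

#define ARRIVAL_MS  12   /* the Mac delivers one sector every 12 ms */
#define MAX_SECTORS 64

/* Model a continuous write with a pool of nbufs sector buffers.
   write_ms[i] is the SD card's write time for sector i (n <= MAX_SECTORS).
   Returns 1 if the emulator keeps up, 0 if it runs out of buffers. */
int keeps_up(const int *write_ms, int n, int nbufs) {
    int finish[MAX_SECTORS];  /* when each sector's SD write completes */
    int idle = 0;             /* when the SD card next becomes free */
    for (int i = 0; i < n; i++) {
        int arrive = i * ARRIVAL_MS;  /* sector i fully received */
        /* Sector i reuses the buffer of sector i-nbufs, which must
           already have been flushed to the card by now. */
        if (i >= nbufs && finish[i - nbufs] > arrive)
            return 0;  /* fell behind: no free buffer for incoming data */
        int start = (idle > arrive) ? idle : arrive;
        finish[i] = idle = start + write_ms[i];
    }
    return 1;
}
```

Feeding it a stream of 9 ms writes with a single 70 ms spike shows two buffers failing where eight succeed, while a steady 13 ms stream eventually overflows any fixed pool, since the queue grows by 1 ms per sector forever.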
Buffer the Whole Disk – The extreme of buffering is to use 800K of external SRAM to buffer the entire disk. Maybe that’s a sensible idea, and it would certainly work, but I’m very reluctant to do it. Aside from the additional pins needed and the cost of the parts, it just feels wrong. HxC is proof that floppy emulation should be possible without a full disk buffer.
Whew!
Documenting the possible options here has been an exercise in organizing my own thoughts, more than an attempt to explain them to others, so I hope it’s comprehensible. It’s starting to feel a bit like I’m launching into a graduate thesis project! That’s not a good sign, and I’m concerned I’ve already spent more time experimenting with write support than makes sense. Once I (hopefully) unbrick my class 10 SD card, I’ll try a few more experiments to see if I can improve write performance further. But after that, I think I’m going to return to the hardware design, and plan to use an ATMEGA1284P with 16K of internal RAM. Any further improvements to write emulation will then have to be done entirely in firmware, within the limitations of that hardware.
13 Comments so far
This: https://www.sdcard.org/downloads/formatter_3/ could help to unbrick the SD card.
How does the Mac manage write errors? It seems impossible that there’s no way to simulate such a situation, and if there is, you could use it to gain more time for writing.
I’ve been using my printer to mount the SD card as a network drive, but you can’t format a network drive or do block level I/O with it. I need to get a USB card reader so I can reformat the card, then I think it will be OK.
Interesting thought about write errors. I’m not sure, but I think it depends on the type of error. For the continuous writes where I’m having the most problems, I think it writes the whole track, then reads it back to verify it, and aborts if the verify fails. But I haven’t looked at it extensively.
+1 for buffering the whole disk. As Ken Thompson said, “When in doubt, use brute force.” You could use an LED to indicate when the SD card data is in sync with the buffer.
You could try adding more RAM via SPI. With the Microchip 23K256 you get 256 Kbit of SRAM that can be accessed via SPI at a 20 MHz clock speed max. They cost about USD 1.50 in onesies, and you might even get a sample for free.
Another solution is to use an AVR with DMA, which you can use to add as much external RAM as you like.
Codes: http://support.apple.com/kb/HT1618
suggests that errors -74 or -79 could be the right way.
You could easily test if returning such error code after too long write persuades the computer to retry.
I think that you should try using two SD cards. When no read or write commands are coming from the Mac, the two cards compare data and write any differing data to each other, so they both have the same data. When the Mac asks for data, one card could send the data to the Mac, while the other does nothing. When the Mac writes data, the data can also be written to the first card, but when the buffer is about to overflow, the data in the buffer can be written to the second card while the first is busy writing a block. Likewise, read and write commands in quick succession can be handled between the cards, and the cards can sync when no commands are being sent from the Mac. This design may solve the errors on large writes, when the buffer overflows because of those seemingly random long block writes. This design would also allow for the implementation of “RAID 0” style performance boosts and for the implementation of wear-leveling. I remember reading articles about using IDE to CF card adapters and Windows 98. The writer strongly suggested disabling the paging file, as it would shortly destroy the card from frequent writing. I would imagine that solid state storage for operating systems written prior to the SSD revolution would also benefit from wear-leveling.
I am in awe of this project and I love your website! Your projects are amazing to me- I can’t do much more than make LEDs light up. I wish you the best of luck in this and all of your projects.
Unfortunately I can’t choose to return a specific error or anything like that– the floppy driver determines if there’s an error, based on the behavior of the (emulated) floppy drive. I think error -74 (write underrun) is an API level error and not a true disk error, but error -79 (disk speed) might be exploitable. Can I gain some extra time by temporarily changing my emulated disk speed? Maybe… but if so, probably only when stepping tracks.
Lawrence, thanks for the kind words. A friend and I were just joking today about using two SD cards in a “RAID array”. I really think it should be doable with a single card, though, if I can only figure out how.
I really would take a look at using the Propeller chip. You get 32K to use as a buffer, and eight 32-bit processors.
You could substantially reduce chips, cost, and complexity, and with the buffer and independent processors you should be able to cope with the write problem.
Another solution is to eliminate random writes altogether.
http://en.wikipedia.org/wiki/Log-structured_file_system
Basically, instead of updating a sector “in place”, you append the updated sector data and some metadata to the log file. When a request for sector X comes, you scan the metadata to identify the most recent log record containing X’s data. I guess an offset table won’t take too much RAM, so it would only be necessary to scan the log once. And I believe it is not even necessary to “vacuum” the log, since an average SD card is orders of magnitude larger than a good old fashioned floppy.
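The commenter's log-structured idea can be sketched as follows. The RAM arrays here stand in for SD blocks and are far too large for an AVR, but the offset table itself (1600 sectors at 2 bytes each) would fit in a 16K part:

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  512
#define NUM_SECTORS  1600  /* an 800K floppy holds 1600 512-byte sectors */
#define LOG_CAPACITY 4096  /* log records available on the (much larger) card */

/* Append-only log: each floppy write appends a new record rather than
   updating a sector in place; the offset table remembers the newest
   record per sector, so reads never have to scan the log. */
static uint8_t log_data[LOG_CAPACITY][SECTOR_SIZE]; /* models SD blocks */
static int16_t latest[NUM_SECTORS]; /* sector -> log index, -1 = never written */
static int     log_end;

void log_init(void) {
    for (int i = 0; i < NUM_SECTORS; i++) latest[i] = -1;
    log_end = 0;
}

int log_write(uint16_t sector, const uint8_t *data) {
    if (sector >= NUM_SECTORS || log_end >= LOG_CAPACITY)
        return -1;  /* a real design would compact or wrap the log here */
    memcpy(log_data[log_end], data, SECTOR_SIZE);
    latest[sector] = (int16_t)log_end++;
    return 0;
}

const uint8_t *log_read(uint16_t sector) {
    if (sector >= NUM_SECTORS || latest[sector] < 0) return 0;
    return log_data[latest[sector]];
}
```

The appeal for the emulator is that every floppy write becomes a sequential append, the access pattern SD cards handle fastest.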
There must be a way to inform Mac OS that something went wrong while writing a sector.
Using SD cards in RAID configurations, log-structured filesystems, or processors with more memory than the original computer had, just to write a single 512 byte sector, is like using a cannon to kill a fly.
I recommend buffering up to a whole track at a time. Wait for all data to be written back to the SD card whenever the BIOS instructs the floppy controller to seek to another track. Hopefully the BIOS allows much more time after a seek (between tracks) than between sectors.
If that doesn’t work, patch the BIOS so it allows you to wait more between writes.
I think the answer to this could be the Raspberry Pi. You get 256 MB of RAM, enough to buffer the entire disk plus a lot more, and you could leave all the read-writes to non-volatile memory (e.g. a USB stick) until the floppy is ‘ejected’ or ‘inserted’. You also get a full Linux system to do file handling, etc.
I’d like to see the latency numbers of RPi vs the floppy drive 😀