Floppy Write Emulation, Continued
My apologies for another post full of abstract floppy emulation thoughts, with nothing concrete to discuss and no photos to show. As readers have no doubt noticed, I’m having a difficult time wrapping my head around the best way to approach this problem. Fortunately, it’s slowly becoming clearer how to proceed.
Current Prototype
First, a review of the current prototype that I’ve built: the emulator performs SD loads and saves on demand, at the instant they’re required by the Mac. There is minimal RAM buffering. This “on the fly” access method works very well for floppy read emulation, using any type of SD card, and could be used to make a nice read-only floppy emulator. Sadly, it doesn’t work as well for write emulation. With a high speed (class 10) SD card it works for sector-by-sector write emulation, but it fails with slower SD cards, or with any speed card when doing whole-track writes or floppy initialization.
The Problem With Writes
The fundamental problem with writes is that data comes in from the Mac faster than it can be saved to the SD card. The only solutions are to save to the SD card faster, or slow down the rate of data transmission from the Mac. I’ve recently spent a substantial amount of time searching for a way to slow down the incoming Mac data, and I’ve concluded that it’s just not possible. The Mac blindly blasts out sectors to be written. There is no feedback signal, no flow control, no ready flag, and no error mechanism that can be exploited to slow down the incoming data without causing the write operation to fail. The only remaining path is to somehow speed up SD saves, so the emulator can keep up.
Using a different or faster microcontroller wouldn’t help much. Yes, it would reduce the amount of time needed to send the data to the SD card during a save, but that’s only a small part of the total save time. The bulk of the time is spent waiting for SD card internal operations to complete (Flash page erase, etc), and that’s independent of what microcontroller is used. Using a faster SD card will help, but even a class 10 card (the fastest class) struggles to keep up.
RAM Buffering
After a lot of thought, I’ve decided to switch back to the ATMEGA1284 microcontroller that I’d originally planned to use, which provides 16K of internal RAM that can be used for buffering. That’s enough RAM to buffer both sides of an entire track, with 4K space remaining for other uses. While not a panacea, the additional RAM will help in three ways:
Reads from RAM – Once all the sectors for a track have been read into RAM, the emulator can continuously stream them to the Mac in an endless loop, with no further SD access necessary. This leaves 100% of the SD bandwidth free for writes, in contrast to the “on the fly” method, which is constantly loading data from the SD card.
Multi-block SD Transfers – When modified sectors in RAM must be saved to the SD card, the save can be performed using a multi-block SD save, which is substantially more efficient than doing many individual single-block SD saves.
Data Rate Smoothing – A large buffer can help smooth brief fluctuations in the incoming data rate, or the SD save speed. For example, if the SD card is capable of saving 50 sectors per second, but the incoming data rate briefly shoots up to 100 sectors per second, a system with no buffering will fail. But assuming the buffer is large relative to the duration of the data rate spike, the buffered system will continue to work. The same is true for brief spikes in the time per sector needed for SD saves. The buffer keeps the system working as long as the average incoming data rate is less than the average save rate, instead of requiring the fastest instantaneous incoming data rate to be less than the slowest instantaneous save rate.
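The data rate smoothing argument can be sketched as a tiny simulation. This is only an illustrative model, not the emulator firmware: the function name `survives`, the 100 ms time step, and the specific sector counts are all assumptions chosen to match the 50 and 100 sectors per second example above.

```python
def survives(arrivals, saves, capacity):
    """Return True if the backlog of unsaved sectors never exceeds capacity.

    arrivals[i] is the number of sectors received from the Mac during time
    step i, and saves[i] is the number of sectors the SD card can save during
    that same step. capacity is the buffer size in sectors (0 = no buffer).
    """
    backlog = 0
    for arrived, saved in zip(arrivals, saves):
        # Sectors that arrive but cannot be saved in this step must wait.
        backlog = max(0, backlog + arrived - saved)
        if backlog > capacity:
            return False  # buffer overflow: the write operation fails
    return True


# Hypothetical 100 ms steps: the card saves 5 sectors per step (50/s), the
# Mac normally sends 3 per step, with a brief spike to 10 per step (100/s).
spike = [3, 3, 10, 10, 3, 3, 3, 3, 3, 3]
steady_saves = [5] * len(spike)

print(survives(spike, steady_saves, capacity=0))       # unbuffered
print(survives(spike, steady_saves, capacity=24))      # one track of buffer
print(survives([6] * 100, [5] * 100, capacity=24))     # sustained overload
```

In this model the unbuffered system fails on the spike, the 24-sector buffer rides it out, and no finite buffer survives a sustained incoming rate above the save rate.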
Even with these advantages, I suspect there will still be some SD cards that aren’t fast enough for the fastest floppy write operations (floppy initialization). The slower class 4 PNY card I’ve been experimenting with probably won’t support initialization, because even using multi-block saves, it takes about 375 ms to save a track, but the Mac writes a new track roughly every 350 ms. This is a case where the average incoming data rate is greater than the average save rate, and there’s nothing more that can be done to help it. I expect this card will still work for disk copy operations and normal sector-by-sector write operations, however.
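The arithmetic behind that conclusion is short enough to write out. This is a back-of-the-envelope sketch: it assumes a track is 24 sectors of 512 bytes (12K), and the helper names are made up for illustration.

```python
SECTOR_BYTES = 512
SECTORS_PER_TRACK = 24  # assumed: both sides of one track, 24 * 512 B = 12K


def kib_per_second(track_ms):
    """Throughput needed (or achieved) when one 12K track takes track_ms."""
    track_kib = SECTORS_PER_TRACK * SECTOR_BYTES / 1024
    return track_kib / (track_ms / 1000)


def backlog_ms(tracks, save_ms=375, arrive_ms=350):
    """How far the SD card has fallen behind after a number of tracks."""
    return tracks * (save_ms - arrive_ms)


print(kib_per_second(375))  # what the class 4 card achieves
print(kib_per_second(350))  # what initialization demands
print(backlog_ms(10))       # milliseconds behind after 10 tracks
```

The card falls 25 ms further behind on every track, so a track-sized buffer only delays the failure rather than preventing it.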
The Other Shoe Drops
OK then, just add more RAM buffering and then everything will be great, right? Well, no. In order to take advantage of the RAM buffers while still meeting the floppy timing requirements, the emulator firmware will need to be considerably more complex. The current prototype is a single-tasking microcontroller program: at any given time, it can be doing an SD card transfer or a Macintosh sector read/write transfer, but not both. The program will need to be enhanced so that both interfaces can be active at the same time, by handling one of them entirely in an interrupt routine.
The strategies used to decide when to load and save data from SD must also be more complex. For example, after the Mac sends a sector to be written to the emulator, should the emulator save it to the SD card right away? That would provide the biggest head start. Or should it wait a while and see if more sectors are received, which could then be batched together into a multi-block SD save? Another example: when the Mac steps to a new track, should the emulator first save dirty sectors from the previous track, or should it first load the sectors from the new track, or perhaps interleave the two operations? A third example: while the emulator is still loading the sectors for a new track, the Mac may begin a write operation, and may attempt to write the very sector that is currently being loaded from the SD card. How should that be addressed? There’s a lot to think about, and it’s not clear what the right answers to these questions are.
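As one possible answer to the batching question, here is a sketch of how dirty sectors in a track buffer could be grouped into contiguous runs, each of which could then be saved with a single multi-block write. It is not a decision about what the firmware will do. The per-sector dirty flags and the name `dirty_runs` are assumptions for illustration.

```python
def dirty_runs(dirty):
    """Group dirty sector indexes into (start, count) runs.

    dirty is a list of booleans, one per sector in the track buffer.
    Each returned run is a stretch of consecutive dirty sectors that
    could be saved with one multi-block SD write.
    """
    runs = []
    start = None
    for index, is_dirty in enumerate(dirty):
        if is_dirty and start is None:
            start = index                      # a new run begins here
        elif not is_dirty and start is not None:
            runs.append((start, index - start))  # run ended at previous sector
            start = None
    if start is not None:
        runs.append((start, len(dirty) - start))  # run reaches end of track
    return runs


# Hypothetical 24-sector track buffer with a few modified sectors.
track = [False] * 24
for sector in (2, 3, 4, 10, 22, 23):
    track[sector] = True

print(dirty_runs(track))
```

Waiting longer before saving tends to produce fewer, longer runs, at the cost of giving up the head start that an immediate save would provide.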
Sanity Check
To make sure I’m not doing something ridiculously stupid, I looked at two other hardware floppy emulator projects that seem most similar to this one.
The HxC Floppy Emulator is the closest relative to Floppy Emu. It uses a PIC, 32K SRAM, and an SD card to emulate floppy read/write operations for a variety of computers using Shugart-style floppy interfaces. At a high level, HxC appears to use the same emulation method that I’ve proposed here. I don’t know if it supports high-speed write operations (initialization) on slow SD cards. The documentation doesn’t make any mention of a minimum required SD card speed, and the author hasn’t yet responded to my question about it in the forums. He did answer many of my general high-level questions, though, which was nice. It appears he’s quite concerned about other people cloning the HxC, and is reluctant to give out too much detailed implementation information. The firmware and hardware of HxC are a closed, proprietary system.
Semi-Virtual Diskette (SVD) is a simpler design, using a PIC and 256K of SRAM to store the disk image. There is no persistent storage, and disk images are loaded and saved to a host PC using a serial cable. As with the HxC emulator, computers using Shugart-type floppy interfaces are supported. The 256K RAM size limits the possible floppy types to low-capacity ones. All the hardware schematics and firmware source files are available to download. The SVD project appears to be inactive: the author didn’t respond to my email, and the web site hasn’t been updated in several years.
There are also a couple of commercial floppy emulators, but they lack any real implementation info. At any rate, it appears that I’m headed down a reasonable path by introducing track-sized RAM buffering, even if it will introduce a whole mess of new complications to the emulator software. The next step is to build a real hardware prototype with an ATMEGA1284, and start working on the software revisions. Woohoo!
15 Comments so far
I find it odd that performance of your SD cards is so dismal. Even a cheap Class 4 card should be guaranteed to write at least 4MB/s average, and this is usually borne out (and often exceeded x2 or x3) when used with a PC and card reader. Such a card shouldn’t have much trouble writing the entire 800K image in 350ms, let alone a single track.
There are obviously complications here with write patterns and the wear levelling (or whatever it is) that is causing inconsistent write timing, but I think there must be something in your SD card access methods that is slowing it down substantially. With buffered track writes I think it should be possible to make your original approach work fine, though perhaps too much fiddling with timing problems would be involved.
Anyway, elm-chan’s FAT library seems to be able to get to > 300K/s with a 10MHz clock on ATmega, average with 2K writes. I am not sure if this is better than you’re getting out of sdfatlib, but it seems like it should be good enough for your buffered-track approach, if there aren’t any timing issues (but writing 24 blocks I think the average should dominate…) – and you’re running double the clock. http://elm-chan.org/fsw/ff/00index_e.html
tl;dr it seems like your SD performance is too bad to be limited by the card, I think investigation of improving performance there is worthwhile.
Have you considered cheating?
E.g., a couple of cables with alligator clips should let you directly halt the 68k on the Mac when you need it to slow down.
It does seem surprisingly slow, but it roughly fits with the numbers that SdFatLib’s author gave me. I also get similar numbers with a simple test app that just writes sample data to the card, without any floppy emulation stuff.
I think the 4MB/s rating is for the native SD interface, and probably for “large” transfers like 1MB+. My guess is that for small transfers, the one-time setup costs of the write dominate all else. Even a full-track write is still only 12K of data and so is very small by any modern yardstick.
I’ll try elm-chan’s lib and see if it makes a difference. Thanks for the link.
For comparison, I’m seeing the equivalent of ~32K/s write performance with the class 4 card, and 300K/s with the class 10 card, using multi-block writes of 12K.
Looking at the elm-chan SD performance data with AVR, 2K writes are 309K/s with one card, 88K/s with another, and 27K/s with a third. So that’s almost exactly the same range I’m seeing with the two cards I’ve tested with.
Have you considered running multiple SD cards simultaneously? It would be very impractical, but also very cool.
Is there any way to pre-erase sectors on the card? For example, writing all 0xFF or something, or some other command so that the SD card knows that the sector you are writing to is blank and doesn’t have to erase it at the time of writing?
@Steve: Interesting that performance is quite so bad for small writes. Too bad. Guess that means bigger buffers 🙁
I think multiple SD cards would be a little impractical. If I can’t make it work reliably with any SD card, I’d rather just put a minimum requirement for the card speed.
The timing numbers I mentioned are for pre-erased blocks. At the start of the multi-block transfer, you tell the SD card how many blocks to pre-erase, which is 24 in my case (24 * 512B = 12K).
How large must the floppy be? Seems like it would be easy to use a few MB to buffer the entire disk, or am I missing something?
It looks like one reason that HxC may not have such difficulties with write performance is that the floppy drive it emulates uses an index hole to mark the start of a track. So when the emulator is busy saving the data from a previous track, it can withhold the virtual index pulse for the new track by up to 1 second, effectively pausing the incoming data from the computer. The Mac floppy drive doesn’t use the index hole and so write operations can’t be paused that way.
@William – A 1MB external RAM would do it, but I dislike that solution for several reasons, discussed in the previous posts. Maybe I’ll take one more look at it, though…
How about abandoning the SD card altogether and using a USB 2.0 thumb drive?
Most people are going to have greater access to thumb drives anyway. I’m interested in this device, it would be nice to be able to use my existing supply of thumb drives 🙂
I haven’t a clue, but could two micro-controllers be used? So one could handle the communications with the Macintosh and primary buffer, the other solely for dealing with SD reads and writes and secondary buffer?
Is the 375ms/track number when going through SdFatLib? How much FAT housekeeping does it do during those operations?
Can you exploit multi-block transfers by always preallocating an entire disk or track’s worth of space and then aborting or filling with 0xFF when necessary?
You might have to forgo SdFatLib or write everything into a single large file and then reconstruct or write metadata in the dead time.
Regards,
@ndy
There’s no FAT overhead: FAT is only used to locate the disk image file on the card, but thereafter everything is done using raw SD block reads and writes. My newer prototype has enough buffer space that a whole side of a track can be written at once using a multi-block write, which should help. I just need to get the board made now!