Implementing Floppy Emu Writes
I’ve yet to implement write support for my SD card Macintosh floppy emulator, but my rough plan was:
- Perform GCR decoding on the fly, and store decoded sectors in RAM
- When the Macintosh steps to the next floppy track, use the delay to flush the sector data to the SD card
- Need to buffer as much as both sides of a 12-sector track, for 24 total sectors, or 12288 bytes
- Use a microcontroller with at least 12K RAM for buffering
Too Slow
For this to work, the microcontroller and the SD card must be fast enough to write 12288 bytes during the track step time. On a real floppy drive, the step time is about 4 ms, but in my tests it can be as long as 12 ms before the Macintosh aborts with an error. 12288 bytes in 12 ms is 8192000 bits/second, so the SPI clock used for SD card communication must be at least 8.192 MHz. On the ATMEGA series, the SPI clock can be at most half the CPU clock speed, so the minimum CPU clock speed would appear to be 16.384 MHz.
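The arithmetic above can be reproduced with a small sketch. The function names `min_spi_hz` and `min_cpu_hz` are made up for illustration, and the calculation assumes one bit per SPI clock with no overhead at all.

```c
#include <assert.h>
#include <stdint.h>

/* Track buffer from the plan: 2 sides x 12 sectors x 512 bytes = 12288. */
enum { SIDES = 2, SECTORS_PER_TRACK = 12, SECTOR_BYTES = 512 };

/* Minimum SPI clock in Hz needed to move `bytes` within `window_ms`
   milliseconds, assuming one bit per clock and zero overhead. */
uint32_t min_spi_hz(uint32_t bytes, uint32_t window_ms)
{
    assert(window_ms > 0);
    return (uint32_t)(((uint64_t)bytes * 8u * 1000u) / window_ms);
}

/* On the ATMEGA the SPI clock is at most half the CPU clock, so the
   CPU must run at least twice as fast as the SPI clock. */
uint32_t min_cpu_hz(uint32_t spi_hz)
{
    return spi_hz * 2u;
}
```

Calling `min_spi_hz(SIDES * SECTORS_PER_TRACK * SECTOR_BYTES, 12)` gives 8192000, and `min_cpu_hz` of that gives 16384000, matching the figures above. With the 4 ms step time of a real drive, the same calculation gives a 24.576 MHz SPI clock.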
But wait, it’s worse than that, because SPI communication isn’t 100% efficient. There’s a delay between each byte, while the microcontroller checks to see if it’s time to send a new byte, and then queues it up. There’s also a delay between each 512-byte block. And if the disk image file being written doesn’t occupy consecutive blocks on the SD card, there will be additional delays between each block, as the SdFatLib code uses the FAT info to locate each block of the file. My rough guesstimate is that an actual transfer would take 50% to 100% longer than the SPI clock speed predicts, and that matches the numbers I’ve seen from other people using SdFatLib. To compensate, the CPU clock would need to be 50-100% faster, around 25 MHz to 33 MHz.
A further complication is that the entire 12 ms step window can’t be used for SD writes. Some of that time is needed to update the LCD display and perform other housekeeping tasks when stepping tracks. To compensate, the CPU clock would need to be higher still.
In short, this approach to writes is simply not going to work on an ATMEGA microcontroller with a maximum clock speed of 20 MHz.
Go Faster
One solution would be to use a different microcontroller that supports higher clock speeds, like an ARM series mcu that was suggested by commenters in the previous post. That would probably work, although I’m reluctant to do it, since it would entail redoing much of the design, learning the details of a new architecture, porting the code to it, and getting programming hardware for the new mcu.
I’m also uncertain how fast the SD card can actually go over SPI, and I haven’t found any definitive answer. The number 25 Mbps appears in a few places, but I think that’s using the multi-bit native SD interface rather than the 1-bit SPI interface. Regardless of the SD card’s capabilities, if I push the SPI clock speed higher, I’ll need to design a circuit board that works well at high clock speeds, which means paying attention to all the board layout details I don’t fully understand and normally ignore. I think I should probably be okay at speeds of 10-20 MHz, but I’m really not sure.
Background Writes
A more complex solution that doesn’t rely on increased clock speeds is to perform SD card writes in the background, while data is being transferred from the Macintosh, instead of trying to squeeze the SD writes into the track step interval. This was suggested by a commenter in an earlier post, and while it would be trickier to implement, it has many advantages.
This method would only require two sector buffers, for 1024 total bytes of RAM. As the Mac sent the data for the first sector to be written, the microcontroller would decode it and store it in RAM buffer 0. After the last byte of the sector’s data was received, the mcu would immediately call an SdFatLib function to write the sector data to the SD card, but it would also install an interrupt handler to be invoked when bytes were received for the next sector. The interrupt handler would store these in RAM buffer 1. The SD write of the first sector would complete well before the last byte of the second sector was received. SdFatLib would then be called to write buffer 1 to the SD card, while the next sector’s data was being stored in buffer 0 by the interrupt routine. In this way, the mcu would always be writing one buffer to the SD card while the interrupt routine filled the other buffer with data for the next sector.
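The two-buffer handoff can be sketched in plain C that runs on a desktop machine. Everything here is an assumption for illustration: the names `on_decoded_byte`, `flush_ready_buffer`, and `record_sector` are hypothetical, the callback stands in for whatever SdFatLib call would actually write the sector, GCR decoding is skipped, and on real hardware the first function would be the body of the interrupt handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_BYTES 512

/* Hypothetical callback standing in for the SD card sector write. */
typedef void (*sector_writer)(const uint8_t *data, size_t len);

static uint8_t buffers[2][SECTOR_BYTES];      /* 1024 bytes in total */
static volatile bool ready[2] = { false, false };
static volatile uint8_t fill_index = 0;       /* buffer being filled */
static volatile uint16_t fill_count = 0;      /* bytes stored so far */
static uint8_t write_index = 0;               /* next buffer to write */

/* Store one decoded byte. Returns false on overrun, meaning the buffer
   that should be filled next has not been written to the card yet. */
bool on_decoded_byte(uint8_t byte)
{
    if (ready[fill_index])
        return false;
    buffers[fill_index][fill_count++] = byte;
    if (fill_count == SECTOR_BYTES) {
        ready[fill_index] = true;   /* hand the full buffer over */
        fill_index ^= 1u;           /* continue in the other buffer */
        fill_count = 0;
    }
    return true;
}

/* Main-program side: write out the oldest full buffer, if there is one.
   Returns true if a sector was written. */
bool flush_ready_buffer(sector_writer write_sector)
{
    assert(write_sector != NULL);
    if (!ready[write_index])
        return false;
    write_sector(buffers[write_index], SECTOR_BYTES);
    ready[write_index] = false;
    write_index ^= 1u;
    return true;
}

/* Stand-in writer for desktop testing: records what was "written". */
size_t sectors_written = 0;
uint8_t last_first_byte = 0;

void record_sector(const uint8_t *data, size_t len)
{
    (void)len;
    sectors_written++;
    last_first_byte = data[0];
}
```

The overrun return value marks the failure case described above: if the SD write of one buffer has not finished by the time the other buffer fills, there is nowhere to put the next byte.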
This approach is appealing because it doesn’t require especially fast clock speeds, nor does it require a microcontroller with a large amount of RAM. In fact, it would probably work with the ATMEGA32u4 I’ve been using for breadboard prototyping, which has just 2K of RAM.
In order for this method to work, there must be sufficient time between each “new byte” interrupt to do the following:
- store the processor state and invoke the interrupt handler
- perform GCR decoding on the byte
- store the byte in the RAM buffer
- return to the main program, which is executing an SdFatLib write
- make sufficient progress on the SdFatLib write before the next interrupt so that the write finishes before the last byte of the next sector is received
The interrupt rate is fixed by the Macintosh’s floppy data rate, so the interrupt will be invoked every 16 microseconds. At ATMEGA clock speeds of 8 MHz to 20 MHz, that’s 128 to 320 clock cycles between interrupts. Is that enough to accomplish all of the above? Probably, but it might be a little tight.
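The cycle budget is just the CPU clock multiplied by the 16 microsecond byte period. A minimal sketch, where `cycles_per_byte` is a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>

#define BYTE_PERIOD_US 16u  /* one byte from the Mac every 16 microseconds */

/* CPU clock cycles available between two "new byte" interrupts. */
uint32_t cycles_per_byte(uint32_t cpu_hz)
{
    assert(cpu_hz > 0);
    return (uint32_t)(((uint64_t)cpu_hz * BYTE_PERIOD_US) / 1000000u);
}
```

This gives 128 cycles at 8 MHz, 256 at 16 MHz, and 320 at 20 MHz. All five steps in the list have to fit inside that budget, with enough left over for the SD write to keep up.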
8 Comments so far
If worst comes to worst, assembler will work.
However, you could try the following:
a) Non-interrupt code sets up the “put” pointer for the buffer at an absolute address (i.e. volatile char * volatile putptr)
b) Interrupt handler:
read byte, write to absolute address (*(putptr++) = new_byte)
if (bytecount++ > 0x1ff) OVERFLOW
The register save/restore code should take more cycles than the actual interrupt code itself (unless the code to read the incoming byte is complex)
If you compile the code and get working code, you can take the disassembly of the routine and strip out any unnecessary register saves/restores (the interrupt support code). Doing so, the entire interrupt routine should take less than 40 cycles (including interrupt setup and return).
That leaves at least 3/4 of the MCU for the foreground task – which should be more than ample.
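The 3/4 estimate can be checked against the cycle budgets from the post. A minimal sketch, assuming the 40-cycle interrupt cost quoted above; `foreground_percent` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of CPU cycles left for the main program when the interrupt
   routine uses isr_cycles out of every budget_cycles (rounded down). */
uint32_t foreground_percent(uint32_t budget_cycles, uint32_t isr_cycles)
{
    assert(budget_cycles > 0 && isr_cycles <= budget_cycles);
    return 100u * (budget_cycles - isr_cycles) / budget_cycles;
}
```

With 40 cycles per interrupt, this gives 68% at 128 cycles per byte (8 MHz), 75% at 160 cycles (10 MHz), and 87% at 320 cycles (20 MHz), so the 3/4 figure holds from roughly 10 MHz upward.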
I know you are reluctant to switch to an ARM micro; however, some of them have proper SD support built in, so you could use the 4-bit SD interface. NXP only has last-gen ARM micros with SD/MMC support, but ST has some newer micros with support, such as the STM32F103RC. Some ARM microcontrollers also support USB host (the LPC1754, for example), so you could just use an off-the-shelf flash drive as storage.
Is there any easy way to see the disassembly of compiled code from AVR-GCC?
The AVR objdump can disassemble the ELF file that is in turn converted to a hex file. AFAIK you can’t get just the disassembly, though; you’re stuck with the addresses and hex values alongside it.
gcc -S foo.c (avr-gcc -S foo.c with the AVR toolchain) will create foo.s, which will contain the generated assembly source.
Awesome, thanks!
Well, what about just using XRAM or something of the like alone, something that will store all tracks? Do they make a serial RAM? It might be too slow, though. Or use an ATmega128, which has enough port pins to use XRAM and paging via another port. That way you have 64K of full XRAM, plus the ability to page multiple 64K RAM blocks.
This way you can read the ENTIRE image on the SD card to RAM during a file-open period.
While the Mac sits idle and isn’t accessing the floppy, you could have a timer run out and do an autosave to the SD card, or when you eject the image from the Mac, it’ll auto-save to the SD card then.
poof. no overhead.
Yeah, that could definitely work if the other approach doesn’t pan out. And now that I understand the write process a little better, I think a slightly modified version of my original plan (writing a whole track during the step delay) might actually work too. I now see several different possibilities, with various cost, complexity, and performance tradeoffs. It’s good to have options!