Violating Setup Times With Floppy Writes
I’m working on adding write support for the Floppy Emu emulated Macintosh floppy drive. Data coming from the Macintosh to be written to the floppy is encoded in an interesting way. There’s no clock signal, just a single WR data signal. The incoming WR data is divided into bit cells of “about 2 microseconds” duration, according to the IWM datasheet. At each bit cell boundary, a high-low or low-high transition indicates a logical 1 bit, and no transition indicates a logical 0 bit.
This technique presents some challenges to the device that’s decoding the WR data. Without a clock, how does it know when to sample the data for the next bit? And without some kind of framing reference, how does it identify the boundaries between bytes?
Instead of sampling bits at some fixed frequency, my solution uses 16x oversampling to measure the duration between WR transitions. A measured duration of about 2 microseconds (with some error tolerance) is interpreted as a 1, about 4 microseconds is 01, and about 6 microseconds is 001. Durations longer than 6 microseconds should never appear, since the GCR encoding method forbids having more than two consecutive 0 bits.
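Here is a minimal sketch of that classification, assuming an 8 MHz sample clock (16 clocks per bit cell) and a tolerance of half a bit cell either way. The module name, port names, and exact window edges are hypothetical, chosen for illustration rather than taken from the actual design:

```verilog
// Hypothetical helper: maps a measured interval between WR transitions
// to the number of bit cells it spans. Window edges are an assumption
// (half a bit cell of tolerance on either side of 16, 32, and 48 clocks).
module wr_interval_classifier (
    input  wire [6:0] ticks,  // 8 MHz clocks counted since the previous WR transition
    output reg  [1:0] cells,  // 1 -> "1", 2 -> "01", 3 -> "001", 0 -> not decodable
    output reg        valid   // low when the interval is too short or too long
);
    always @* begin
        valid = 1'b1;
        if (ticks < 7'd8) begin
            cells = 2'd0;     // shorter than half a bit cell: treat as a glitch
            valid = 1'b0;
        end
        else if (ticks < 7'd24)
            cells = 2'd1;     // about 2 microseconds
        else if (ticks < 7'd40)
            cells = 2'd2;     // about 4 microseconds
        else if (ticks < 7'd56)
            cells = 2'd3;     // about 6 microseconds
        else begin
            cells = 2'd0;     // more than two consecutive 0 bits: not valid GCR
            valid = 1'b0;
        end
    end
endmodule
```

The circuit described later in this post doesn’t compute the interval as a separate step. It folds the same thresholds into a running timer, which takes fewer macrocells.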
To identify the boundaries between bytes, the circuit uses the fact that all valid GCR bytes have a most significant bit of 1. If the MSB of the shift register is 1, it saves the completed byte, and clears the shift register. Assuming it starts at a random location in the bit sequence, the circuit will eventually sync up with the byte boundaries correctly, but it may take many bytes before it syncs correctly. Fortunately the Apple designers planned for this, and each sector begins with a string of 10-bit sync bytes 1111111100. No matter where it starts in the sequence, a shift register using this byte identification technique will get in sync after no more than five sync bytes.
The waveform above shows a simulation of the start of a sector, consisting of five sync bytes followed by D5 AA 96, the sector header identifier. The top trace is the WR signal, and the bottom trace is the output of the shifter/decoder circuit. Here’s my first version of the Verilog code, using an 8 MHz input clock, where 16 clocks equals 2 microseconds.
reg [7:0] shifter;
reg [7:0] wrData;
reg [4:0] bitTimer;
reg wrPrev;

always @(posedge clk) begin
    // was there a transition on the wr line?
    if (wr != wrPrev) begin
        // has at least half a bit cell time elapsed since the last cell boundary?
        if (bitTimer >= 8) begin
            shifter <= { shifter[6:0], 1'b1 };
        end
        // do nothing if the clock count was less than 8
        bitTimer <= 0;
    end
    else begin
        // have one and a half bit cell times elapsed?
        if (bitTimer == 24) begin
            shifter <= { shifter[6:0], 1'b0 };
            bitTimer <= 8;
        end
        else begin
            // has a complete byte been shifted in?
            if (shifter[7] == 1) begin
                wrData <= shifter; // store the byte for the mcu
                shifter <= 0; // clear the byte from the shifter
            end
            bitTimer <= bitTimer + 1'b1;
        end
    end
    wrPrev <= wr;
end
I implemented the circuit as above, and it mostly worked. The output was recognizably close to what was expected, but with lots of seemingly random bit errors. The errors weren’t consistent, and comparing the output to the expected values, the errors didn’t appear to be systematic either. I was hoping that they might all be cases of a 0 turning into a 1, or all cases of a 1 turning into a zero, or all cases of a single bit being added or removed in the sequence, but it was nothing like that. I couldn’t find any identifiable pattern to the errors at all.
A day passed. I chased after theories involving voltage levels, bus contention, poor wiring, and others.
Finally I got to thinking about the timing relationship between the WR signal and the 8 MHz clock: there is none. I should have realized this earlier, since it’s nearly the same problem I had a few weeks back with the LSTRB signal when I was implementing read support. WR might transition right at an 8 MHz clock edge, so that its sampled value is neither a clean logical 0 nor a clean logical 1, but somewhere in between. What happens then?
Naively, I had thought it would either do the 0 behavior, or the 1 behavior. In this example, it would either do the first if block and add a 1 to the shifter, or else it would do the second if block, and check the timer to see if it should add a 0 to the shifter. It wouldn’t really matter which behavior it did: a transition on WR would add a 1 to the shifter on either clock cycle N or N+1, but it would still get added. The test for bitTimer >= 8 would make sure that an apparent double-transition of WR didn’t accidentally add two 1’s. Everything would work great.
If only it were so simple. The registers bitTimer, shifter, and wrData are composed of many individual macrocells in the CPLD, one macrocell per bit. Each macrocell will decide independently if wr != wrPrev at the clock edge. What happens if they don’t all agree, and some macrocells think there was a transition, and others don’t? You get a big mess of random errors, which is exactly what I was seeing. This is why a synchronous system would impose a setup time on WR, to make sure its value was established long enough before the clock edge to ensure that every macrocell saw the same value. This isn’t a synchronous system, though, and there’s no way to guarantee that WR won’t change states at a bad time.
Fortunately the solution is simple: just send WR to a register, then use the register value in the circuit instead of WR. That means the circuit will be using a value of WR that’s delayed one clock from the “live” value, but that’s not a problem here. Because the value of the register will only change at a clock edge, the circuit that uses the value won’t see it change states at a bad time, and setup time requirements will be met. This technique is probably second nature to many readers, who’ve been shouting at their monitors for the past six paragraphs, but it took me a while to figure out. The code changes look like this:
reg [1:0] wrHistory;

always @(posedge clk) begin
    // was there a transition on the wr line?
    if (wrHistory[1] != wrHistory[0]) begin
    // ... remaining code is the same as before
    // ...
    wrHistory <= { wrHistory[0], wr };
end
With that change, I’m now able to reliably parse floppy write data coming from the Mac. Next up: reading the data with an interrupt routine, and saving it to the SD card.
6 Comments so far
Whoops, here’s a new complication. Sectors are divided into an address block that identifies the sector, and a data block with the actual sector contents. It appears that when writing to an already-formatted disk, the Mac only overwrites the data block and leaves the address block untouched. So when Floppy Emu is watching data as it arrives from the Mac, it won’t know what sector it’s for since there’s no address block. I guess it will need to keep track of which sector was being read at the time the Mac switched to write mode.
Awesome work as usual!
One flip flop is not actually enough to prevent metastability. You really need at least two to have a reliable circuit. This is known as a two flip-flop synchronizer. See http://digitalelectronics.blogspot.com/2005/12/asynchronous-in-synchronous-world-part.html for more info.
The circuit you have described has two flip flops, but the comparison “if (wrHistory[1] != wrHistory[0])” is still fed by the first flip flop. If the signal is transitioning when this first flip flop samples it, it is possible for the metastable state to pass to its output, which would cause issues as before.
Something like this should avoid this issue:
reg [2:0] wrHistory;
always @(posedge clk) begin
// was there a transition on the wr line?
if (wrHistory[2] != wrHistory[1]) begin
// … remaining code is the same as before
// …
wrHistory <= { wrHistory[1], wrHistory[0], wr} ;
end
Thanks, that makes sense. I imagine this is a probabilistic thing? If the chance of a metastability problem with one synchronizing register is 1/N, then two registers has probability 1/N^2, three registers 1/N^3, and so on?
Correct. This is one area of research that has gotten a lot of attention now that there are SOCs with many different clock domains with signals crossing between them. There are some pretty good tools out there for the ASIC/large FPGAs, but they are expensive. A good rule of thumb is to use a 2 flip flop synchronizer anytime a signal is received from a different clock (including the case of async signals like this).
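For anyone who wants to reuse the idea from this thread, here is one way the synchronizer and transition detector could be pulled out into a module of their own. This is only a sketch: the module name and the wrEdge output are hypothetical and don’t appear in the code above, and the register initializer is there for simulation (whether it carries over to hardware depends on the synthesis tool and device).

```verilog
// Hypothetical wrapper around the three-bit wrHistory approach from the
// comment above. Not taken from the Floppy Emu source.
module wr_edge_sync (
    input  wire clk,    // 8 MHz sampling clock
    input  wire wr,     // asynchronous WR line from the Mac
    output wire wrEdge  // high for one clk period per synchronized WR transition
);
    // wrHistory[0] samples the raw input and may go metastable.
    // wrHistory[1] and wrHistory[2] are fed only by other registers.
    reg [2:0] wrHistory = 3'b000;

    always @(posedge clk)
        wrHistory <= { wrHistory[1:0], wr };

    // Compare only the two older samples, never the first flip flop.
    assign wrEdge = wrHistory[2] ^ wrHistory[1];
endmodule
```

The rest of the decoder would then test wrEdge in place of wr != wrPrev. The cost is two clocks of extra latency on every transition, which doesn’t matter here because only the time between transitions is measured.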