FPGA In-System Programming: Beginnings

July 9th, 2021 | Category: Yellowstone | Author: Steve

I’ve written several times about Yellowstone’s need for in-system FPGA reprogramming. Once the Yellowstone card is in the user’s hands, the user needs a way to update the FPGA firmware when new versions are released, without requiring a JTAG programmer or some kind of USB interface built into the card. The current Yellowstone prototype was designed to support in-system FPGA programming, but it had never been tested until today. The photo here shows a BASIC program on an Apple IIe successfully communicating with the FPGA’s SysConfig port and reading its device ID. I’ve verified that 01 2B A0 43 is the expected ID for the Lattice MachXO2-1200HC FPGA on the Yellowstone board, so everything looks correct. It will still be a long road from here to reach the point where the FPGA is actually reprogrammed, but this proves that the basic communication mechanism works, which was the part I was most concerned about.

Yellowstone’s in-system programming support doesn’t require any extra hardware on the board, which is great! It’s all implemented through PCB routing, connecting the right signals to the right pins. The MachXO2 FPGA supports communication with its built-in SysConfig hardware through many different interfaces including JTAG, I2C, and a Wishbone interface implemented in the FPGA logic, but I’ve chosen to use the SPI interface. This is a “hard” SPI port, so reconfiguration and reprogramming via SPI should work even if the FPGA is in a fubared state or is completely blank.

How do you connect the Apple II peripheral bus to the SPI port, in a way that avoids accidental transitions on the SPI I/O signals, but allows for full bidirectional SPI I/O when needed? Here’s how I did it.

CS: The SPI SysConfig port has an active-low chip select input pin, which is normally high, leaving the SPI port disabled. As long as CS remains high, the FPGA ignores whatever is happening on the other SPI I/O signals. The CS input is connected to another FPGA pin that’s normally in a Hi-Z state, but that can be driven low by FPGA logic. This makes it possible to enable the SPI port programmatically: the Apple II CPU writes a magic value to a special address, and the FPGA logic, which watches for that write, responds by driving CS low. If the FPGA is blank or in a bad state, the programmatic enable may not work, so there’s also a hardware jumper on the PCB that can force CS low.
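To make the enable logic concrete, here is a small Python model of the CS behavior described above. It is a sketch, not Yellowstone’s FPGA logic: the magic address and value are hypothetical placeholders (the real ones aren’t given here), and the model simply treats CS as high whenever neither the FPGA nor the jumper is pulling it low.

```python
class SpiChipSelect:
    """Model of the SysConfig SPI CS input (0 = SPI port enabled)."""

    MAGIC_ADDR = 0xC0E5   # hypothetical placeholder, not Yellowstone's real address
    MAGIC_VALUE = 0xA5    # hypothetical placeholder, not Yellowstone's real value

    def __init__(self, fpga_configured=True, jumper_installed=False):
        self.fpga_configured = fpga_configured
        self.jumper_installed = jumper_installed
        self.enabled = False  # set once the FPGA logic sees the magic write

    def cpu_write(self, addr, value):
        # Only a correctly configured FPGA can watch the bus for the magic write.
        if self.fpga_configured and addr == self.MAGIC_ADDR and value == self.MAGIC_VALUE:
            self.enabled = True

    def cs_level(self):
        if self.jumper_installed:
            return 0  # recovery jumper forces CS low regardless of FPGA state
        if self.fpga_configured and self.enabled:
            return 0  # FPGA pin leaves Hi-Z and drives CS low
        return 1      # FPGA pin is Hi-Z, CS stays at its normal high level


blank = SpiChipSelect(fpga_configured=False)
blank.cpu_write(SpiChipSelect.MAGIC_ADDR, SpiChipSelect.MAGIC_VALUE)
print(blank.cs_level())  # 1: a blank FPGA can't respond to the magic write
```

The blank-FPGA case at the end is exactly why the jumper exists: in this model only `jumper_installed=True` can bring CS low when the FPGA isn’t configured.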

DI: The data input is address line A0.

CLK: The SPI clock is the Apple II /DEVSEL signal. This means an SPI data bit will be clocked into the FPGA from A0 whenever there’s a low-to-high transition on /DEVSEL. This signal is normally high, but it goes low for about 500 ns whenever the Apple II CPU makes a memory reference to the “device” region of the peripheral card’s memory. For Yellowstone, this region is where the virtual IWM lives, in address range $C0E0 to $C0EF (assuming the card is in slot 6).
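Taken together, the DI and CLK wiring mean that every CPU reference into the device range shifts in exactly one bit, taken from A0. The Python sketch below models that under simple assumptions: CS is already low, each reference produces one clean /DEVSEL pulse, and the slot’s device range runs from $C080 + slot * $10 to $C08F + slot * $10 (which gives $C0E0 to $C0EF for slot 6). The function names are hypothetical.

```python
def devsel_range(slot):
    """Address range that asserts /DEVSEL for a given slot."""
    base = 0xC080 + slot * 0x10
    return base, base + 0x0F


def clocked_bits(addresses, slot=6):
    """DI bits the SysConfig port would clock in for a sequence of CPU
    memory references, assuming CS is low and one /DEVSEL pulse per
    reference into the slot's device range."""
    lo, hi = devsel_range(slot)
    bits = []
    for addr in addresses:
        if lo <= addr <= hi:
            # /DEVSEL goes low, then returns high; the rising edge samples A0.
            bits.append(addr & 1)
        # Any other address leaves /DEVSEL high: no clock edge, no bit.
    return bits


print(clocked_bits([0xC0EB, 0x0800, 0xC0EA, 0xC0E1, 0xC000]))  # [1, 0, 1]
```

One consequence of this model is that any stray reference to the device range, including a dummy read, clocks in an extra bit, so the software has to control exactly which addresses it touches.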

DO: The data output from the SPI port is connected to another FPGA input pin. Yellowstone’s FPGA logic includes a special behavior that makes use of this. If the SPI port is enabled and the CPU reads from address $C0EA or $C0EB, the FPGA will return the SPI DO bit that it sampled on its other pin, rather than the normal return value from the IWM.
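A rough Python model of that read path is below. The function and parameter names are hypothetical, and since the text only says D7 carries DO, passing the other seven bits through from the normal IWM value is an assumption.

```python
SPI_DO_ADDRS = (0xC0EA, 0xC0EB)


def fpga_read(addr, spi_enabled, do_pin, iwm_value):
    """Byte the FPGA returns for a CPU read in the slot 6 device range.

    iwm_value stands in for whatever the virtual IWM would normally return.
    """
    if spi_enabled and addr in SPI_DO_ADDRS:
        # Replace D7 with the SPI DO bit sampled on the FPGA's input pin.
        return (iwm_value & 0x7F) | ((do_pin & 1) << 7)
    return iwm_value  # normal IWM behavior
```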

If this last piece sounds somewhat complicated, it is. The DO bit needs to get on the data bus somehow, so that the CPU can read it. But DO can’t be directly connected to the 5V data bus – it must connect somewhere on the 3.3V side in the Yellowstone logic, and then the ‘245 bus driver must be enabled at the proper time to drive the DO value onto the 5V bus. The ‘245 enable timing is under control of the FPGA logic. That means when the FPGA is blank or in a bad state, there’s no ‘245 enable, so there’s no way for the CPU to read DO. In short, a correctly configured FPGA is a requirement for reading DO. Without this, the program running on the Apple II will need to proceed with blind SPI communication in which it can send data but not receive it. Fortunately this seems to be sufficient to perform FPGA reprogramming. I don’t see any way around this inability to read DO when the FPGA is blank without adding extra hardware to Yellowstone, which I’m loath to do for such a rare situation. So be it.

When all of this is put together, the SPI communication looks like this: The CPU writes a magic value to a special address to force CS low. Then it begins a series of reads from $C0EA or $C0EB. Every read from $C0EA transmits a 0 bit because A0 is 0, and every read from $C0EB transmits a 1 bit. In either case, D7 of the byte that’s read is the reply bit from the SPI port. The rest is all software to transmit the correct bit sequences and make sense of the replies.
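The whole exchange can be sketched in Python with a mock standing in for the FPGA. This is a hypothetical illustration, not the actual updater utility: the bit order (MSB first) is assumed, CS framing is left out, and `MockSysConfig` just shifts out a canned reply (here the device ID 01 2B A0 43 mentioned earlier) without decoding any real SysConfig command.

```python
SPI_READ_0 = 0xC0EA  # A0 = 0: each read clocks a 0 bit into DI
SPI_READ_1 = 0xC0EB  # A0 = 1: each read clocks a 1 bit into DI


def spi_transfer(bus_read, tx):
    """Send the bytes in tx, MSB first (assumed), one bus read per bit.
    D7 of every byte read is collected as the reply bit."""
    rx = bytearray()
    for byte in tx:
        reply = 0
        for shift in range(7, -1, -1):
            addr = SPI_READ_1 if (byte >> shift) & 1 else SPI_READ_0
            reply = (reply << 1) | ((bus_read(addr) >> 7) & 1)
        rx.append(reply)
    return bytes(rx)


class MockSysConfig:
    """Hypothetical stand-in for the FPGA: records DI bits and shifts out
    a canned reply on D7. It does not decode any real SysConfig command."""

    def __init__(self, reply):
        self.out_bits = [(b >> s) & 1 for b in reply for s in range(7, -1, -1)]
        self.in_bits = []

    def read(self, addr):
        self.in_bits.append(addr & 1)                       # DI comes from A0
        do = self.out_bits.pop(0) if self.out_bits else 1   # idle DO assumed high
        return (do << 7) | 0x3F                             # low bits are don't-care


dev = MockSysConfig(bytes([0x01, 0x2B, 0xA0, 0x43]))
print(spi_transfer(dev.read, bytes(4)).hex())  # 012ba043
```

Because each bus read both sends and receives one bit, the same loop serves for blind transmission when DO can’t be read; the reply bytes are simply ignored.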

Now onward to the FPGA datasheet, where I can learn what sorts of magic SPI incantations are required to actually reprogram this thing.

10 Comments so far

John Payson - July 9th, 2021 11:41 am

Would it be possible to use a resistor to connect the output enable of the bus driver to /DEVSEL so that if the FPGA is disabled, all reads from mapped I/O space will return data?

As for chip-select, how about having a jumper connect it to A13? Code to access the chip would need to be run from addresses in the range $0000-$1FFF, $4000-$5FFF, or $8000-$9FFF, and refrain from accessing RAM outside those ranges when it isn’t deliberately trying to release CS, but code could perform a read of $2000 to momentarily release CS, which could be used for frame synchronization. /CS could be disabled for a longer period by switching to hires page 1 and running code from $6000-$7FFF; otherwise, code would need to either use text modes or hi-res page 2.
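A quick way to check which regions keep A13, and therefore a jumpered /CS, low is to enumerate the 8 KB blocks of the 64 KB address space. This small Python sketch only illustrates the address arithmetic; the function name is hypothetical.

```python
A13 = 1 << 13  # 0x2000


def a13_low_blocks():
    """8 KB blocks of the 64 KB address space in which A13 is 0."""
    return [(start, start + 0x1FFF)
            for start in range(0, 0x10000, 0x2000)
            if not (start & A13)]


print(", ".join(f"${lo:04X}-${hi:04X}" for lo, hi in a13_low_blocks()))
# $0000-$1FFF, $4000-$5FFF, $8000-$9FFF, $C000-$DFFF
```

The fourth block, $C000-$DFFF, holds the I/O space (including $C0EA and $C0EB, so the SPI reads themselves leave /CS low) and, by default, ROM, which is presumably why it isn’t listed as a place to run code.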
Steve - July 10th, 2021 7:33 am

I think tying /CS low will work fine for “recovery” mode when the FPGA is buggy or blank. It needs more testing, but there shouldn’t be any need to bring /CS high during the programming operation.

That’s a clever idea to put a resistor between /DEVSEL and the ‘245 output enable, so /DEVSEL weakly drives the output enable if the FPGA is inactive, but it can be overridden by a strong high/low signal from the FPGA under normal conditions. This wouldn’t help if the FPGA is buggy and generating the wrong output enable signal, but it could help when the FPGA is blank.

There’s one potential problem: the output enable timing is surprisingly tricky. If you enable the ‘245 output immediately when /DEVSEL is asserted (or /IOSEL or /IOSTROBE), it can actually cause contention on the Apple II data bus, because the Apple II’s own bus driver doesn’t turn off until at least 30ns later. This is the reason that my 2018 efforts with Yellowstone failed, and why I abandoned the project for two years until I realized what was going wrong. You can read more about my struggle with that issue at https://www.bigmessowires.com/2020/11/20/yellowstone-back-from-the-dead/. My solution is to configure the FPGA to delay the ‘245 output enable until roughly 140ns after the slot enable signal is asserted. The sneaky /DEVSEL resistor wouldn’t have any delay, though, so it might reintroduce that bus fighting problem.

This output enable delay is an interesting and frustrating part of Apple II peripheral card design. Apparently the earliest peripheral cards were able to ignore the whole issue, because they used 74LS logic with slow propagation delays and weak high drive currents. They couldn’t respond very quickly when the slot enable signal was asserted even if they wanted to, and if by chance there was brief bus contention, it was mostly harmless and the chip driving low would simply override the chip driving high. But when cards started using faster CMOS logic with higher drive currents, it became a problem, and it was necessary to introduce intentional output delays. This Apple Tech Note discusses the problem under the section heading “Avoiding Bus Fights”. It says the Apple IIGS turns around its bus driver in 30ns, but in my tests with an Apple IIe, the turn-around time is longer, somewhere above 70ns. http://www.1000bit.it/support/manuali/apple/technotes/iigs/tn.iigs.068.html
Steve - July 10th, 2021 7:47 am

Here’s another Tech Note with even more detailed information about the data bus output enable timing, under the section heading “During DMA”. Yellowstone isn’t doing DMA, but the timing diagrams are still helpful. http://www.1000bit.it/support/manuali/apple/technotes/aiie/tn.aiie.02.html
Steve - July 10th, 2021 5:48 pm

Oops, what was I thinking? Of course it needs to bring /CS high during the programming operation, after every SPI command.
John Payson - July 11th, 2021 7:00 pm

If you add a tiny amount of capacitance to go with the resistor, you could use that to delay the enable to the ‘245. Since the ‘245’s logic threshold is probably much less than 2.5 volts, the switch-on delay would probably be about 2-4x the switch-off delay, which would be perfect for this application.
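The switch-on/switch-off asymmetry can be estimated with the usual exponential RC formulas. A minimal Python sketch, assuming an ideal series resistor R, a capacitor C to ground, and a switching threshold given as a fraction f of the supply: enabling (the ‘245 output enable is active low) waits for the node to fall from the rail to the threshold, t = RC * ln(1/f), while disabling waits for it to rise from 0 V, t = RC * ln(1/(1 - f)). The resistor and capacitor values below are hypothetical, and the model ignores the FPGA’s own drive and the gate’s input capacitance.

```python
import math


def rc_delays(r_ohms, c_farads, vth_fraction):
    """Return (enable_delay, disable_delay) in seconds for an active-low
    enable fed through a series resistor with a capacitor to ground."""
    tau = r_ohms * c_farads
    enable = tau * math.log(1.0 / vth_fraction)           # fall from rail to threshold
    disable = tau * math.log(1.0 / (1.0 - vth_fraction))  # rise from 0 V to threshold
    return enable, disable


R, C = 4.7e3, 22e-12  # hypothetical component values
for vth in (0.25, 0.30, 0.40):  # hypothetical thresholds, fraction of supply
    on, off = rc_delays(R, C, vth)
    print(f"threshold {vth:.0%}: enable {on * 1e9:.0f} ns, "
          f"disable {off * 1e9:.0f} ns, ratio {on / off:.1f}")
```

With these assumed values the ratio comes out at roughly 4.8, 3.4 and 1.8 for thresholds of 25, 30 and 40 percent of the supply, and the enable delay at a 30 percent threshold is about 125 ns, in the same neighborhood as the 140ns delay mentioned above.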

I didn’t know if releasing /CS between commands was necessary with this particular chip, but it’s a common enough requirement I thought it worth mentioning. Not sure if using A13 would be a good approach, but it would be simple and I think it should work if you don’t need to use HGR page 1, don’t need to deassert /CS for very long, and don’t mind being limited in where you can keep code and data that will be accessed during a command.
Steve - July 12th, 2021 7:16 am

A13 may be the only workable choice. It needs to be a signal on the Apple II peripheral bus that the updater program can control somehow, and that can be kept continuously low for a long period during SPI commands, and that can be forced high when necessary. That seems to rule out everything but the address signals. It also needs to be low during access to $C0EA/B, which rules out A15 and A14. So A13 is the most significant address signal that could possibly work. Some of the lower address signals could also work, but they carve up the address space into smaller pieces, so they would be less convenient.
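The elimination argument above can be checked mechanically by listing the address bits that are 0 for both $C0EA and $C0EB. A short Python sketch (the function name is hypothetical):

```python
SPI_ADDRS = (0xC0EA, 0xC0EB)


def usable_cs_bits(addrs=SPI_ADDRS):
    """Address bits that are 0 for every SPI access, highest first.
    A bit wired to SPI /CS must be low during these reads."""
    return [bit for bit in range(15, -1, -1)
            if all(((a >> bit) & 1) == 0 for a in addrs)]


print([f"A{b}" for b in usable_cs_bits()])
# ['A13', 'A12', 'A11', 'A10', 'A9', 'A8', 'A4', 'A2']
```

The highest surviving bit is A13; the others would work electrically but split memory into 4 KB or smaller pieces.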

The updater program is purely text-based. The main body of the program can fit between $800 and $1FFF where A13 and therefore /CS will be low. The data will be in addresses above the main program body, some of it where A13 is low and some where it’s high. The only big obstacle I see is that the program must copy blocks of data from their original location in memory to a temp buffer in the lower address range before sending Program commands via SPI. Otherwise /CS might go high during transmission of the Program command data, if the data were being read directly from a higher address range. The copying to a temp buffer will make the whole process slower, but otherwise shouldn’t be a problem.
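To see why the temp buffer matters, here is a Python sketch that treats /CS as a copy of A13 and checks every bus address touched while one chunk of command data is sent. The code location, buffer address and chunk size are hypothetical, and the access list is simplified (it leaves out stack and zero-page accesses, which also have A13 low).

```python
A13 = 0x2000

CODE_ADDRS = range(0x0900, 0x0940)  # hypothetical location of the send loop
TEMP_BUFFER = 0x1F00                # hypothetical temp buffer below $2000
CHUNK = 16                          # hypothetical bytes sent per Program command
SPI_READS = (0xC0EA, 0xC0EB)


def cs_stays_low(addresses):
    """True if /CS, tied to A13, stays low for every address on the bus."""
    return all((addr & A13) == 0 for addr in addresses)


def send_chunk_accesses(src):
    """Simplified set of bus addresses touched while one chunk is sent."""
    return [*CODE_ADDRS, *range(src, src + CHUNK), *SPI_READS]


direct = send_chunk_accesses(0x6000)         # data read straight from above $2000
buffered = send_chunk_accesses(TEMP_BUFFER)  # data copied to the buffer first
print(cs_stays_low(direct), cs_stays_low(buffered))  # False True
```

The copy itself can happen between commands, when /CS is supposed to go high anyway, so reading the source data from above $2000 at that point does no harm.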

I’m still on the fence about how to handle the case of a blank or buggy FPGA, where the FPGA can’t be relied on to help manage the SPI data output DO. This should be very rare, as it means you’ve somehow bricked the device. My original thought for this case was that you’d perform the FPGA reprogramming blindly, without access to SPI DO, then you would verify that it worked afterwards when the FPGA should be running normally again. I think that’s probably fine for this rare case, but if DO is desired, I’m leaning towards doing it right with a small amount of extra fixed logic.

This could be a 1-bit unidirectional driver for SPI DO onto D7 of the Apple II data bus, with a large series resistor to prevent any harm in the event of a brief bus fight. This driver would be enabled whenever DEVSEL=0 and RW=1 (CPU is performing a read from DEVICE address range). Or it could even just be enabled by DEVSEL=0, and rely on the resistor to avoid problems if there’s a write while DEVSEL=0, which should never happen in the updater program. The logic would also need to force the normal ‘245 bus driver off whenever this second driver is on. I think this could be accomplished with a single NAND gate. I’m not sure what the best choice for a 5V-friendly 1-bit tristate buffer might be. This scheme would also require two hardware jumpers instead of one: a jumper to tie SPICS to A13, and another to tie the 1-bit driver’s OE to DEVSEL. Or it would need some additional logic to do both of those things indirectly, as a result of a single jumper.
John Payson - July 12th, 2021 7:55 am

Are there any cases where one would want to avoid driving the data bus when R/W and phi2 are high and either /DEVSEL or /IOSEL is low? Are there any where you would want to drive the bus when R/W or phi2 is low, or when /DEVSEL, /IOSEL, and /IOSTRB are all high? If you use a triple 3-input NAND gate chip where the first NAND receives /DEVSEL, /IOSEL, and /IOSTRB, but the latter signal is fed through a resistor and can be overdriven by the CPLD, and the second NAND gate receives the output from the first along with R/W and phi2, I think the output of that second gate could be used to drive the ‘245 and the worst thing the FPGA could do, even if misprogrammed, would be to stomp on $C800-bank accesses intended for other boards.
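The gating proposed in the comment above can be checked exhaustively with a truth table. This Python sketch models the two NAND gates (a triple 3-input part such as a 74HC10 would provide them); the names are hypothetical, and it models the FPGA only as overriding the resistor-fed /IOSTRB input high, which is an assumption.

```python
from itertools import product


def nand(*inputs):
    return 0 if all(inputs) else 1


def bus_oe_n(devsel_n, iosel_n, iostrb_n, rw, phi2, fpga_blocks_iostrb=False):
    """Active-low ‘245 output enable from the two-NAND arrangement."""
    iostrb_in = 1 if fpga_blocks_iostrb else iostrb_n  # FPGA overdrives the resistor
    any_select = nand(devsel_n, iosel_n, iostrb_in)    # 1 if any select is low
    return nand(any_select, rw, phi2)                  # 0 = drive the data bus


# Self-check: the bus is driven only during phi2-high reads with a select active.
for devsel_n, iosel_n, iostrb_n, rw, phi2, block in product((0, 1), repeat=6):
    driving = bus_oe_n(devsel_n, iosel_n, iostrb_n, rw, phi2, block) == 0
    selected = not devsel_n or not iosel_n or (not iostrb_n and not block)
    assert driving == (selected and rw == 1 and phi2 == 1)
print("truth table OK")
```

In this model, if the FPGA could instead pull that input low, the first gate’s output would be stuck at 1 and the ‘245 would be enabled on every read with phi2 high, so the containment argument depends on how the FPGA is allowed to drive that input.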

I suppose that for software compatibility, there may be situations where one would want to have /IOSEL reads of odd addresses yield whatever was last put on the bus by the video scanning circuitry, but that could be accommodated by having the FPGA poll the state of the bus while phi2 is low, and output that data when phi2 goes high.
Steve - July 12th, 2021 4:35 pm

Good news, bad news, good news – The FPGA in-system programming is working, when the FPGA is already programmed with an older firmware version. But the A13 recovery mechanism doesn’t work if the FPGA is blank. The recovery mechanism *does* work if the FPGA is programmed but the programmatic control of SPI /CS is removed. That means the whole scheme with A13 is OK, but there’s some other problem caused by having a blank FPGA. The default for an unprogrammed device is to make every pin an output with a weak pulldown, and this must be causing a problem somewhere.

I was able to read back the FPGA internal flash memory using JTAG, after trying and failing to program a blank device with my in-system programming utility. The data looks 99 percent correct, but every now and then an extra 1 bit somehow got inserted into the programmed bitstream. It’s always an extra bit inserted, rather than an existing bit with the wrong value, which makes me think the SPI clock (DEVSEL signal) must be glitching for some reason when the FPGA is blank.
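The inference above, an extra bit rather than a wrong bit, can be checked mechanically on a JTAG readback. A minimal sketch, assuming the expected and read-back bitstreams are available as strings of '0'/'1' characters (a hypothetical format). Note that the reported index is where the shift first becomes visible, which can be later than the actual insertion point when the inserted bit lands in a run of identical bits.

```python
def classify_first_error(expected, actual, window=32):
    """Classify the first mismatch between two bit strings as an inserted
    bit, a flipped bit, or something else. Returns None if they agree."""
    i = next((k for k, (e, a) in enumerate(zip(expected, actual)) if e != a), None)
    if i is None:
        return None
    # Inserted bit: everything after it matches the expected stream shifted by one.
    if actual[i + 1:i + 1 + window] == expected[i:i + window]:
        return "inserted", i
    # Flipped bit: everything after it matches the expected stream directly.
    if actual[i + 1:i + 1 + window] == expected[i + 1:i + 1 + window]:
        return "flipped", i
    return "other", i


expected = "0010110011110000" * 4
inserted = expected[:5] + "1" + expected[5:]
flipped = expected[:3] + "1" + expected[4:]
print(classify_first_error(expected, inserted))  # ('inserted', 6)
print(classify_first_error(expected, flipped))   # ('flipped', 3)
```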
John Payson - July 13th, 2021 9:31 am

If the FPGA is outputting a weak pull-down on the ‘245 enable pin, would that cause the ‘245 to drive the bus constantly? What if you add a 4.7K or similar pull-up?
Steve - July 13th, 2021 2:15 pm

It’s fixed now, and in-system programming of a blank FPGA is working. The extra 1 bits were caused by a bug in my utility program, related to random-looking values being read from the blank FPGA. All of the important FPGA outputs on the PCB have external pull-up resistors to ensure everything stays in a well-defined state when the FPGA is blank or in the midst of being reprogrammed. There are still a few small loose ends to resolve, but the in-system programming task is essentially done.
