Archive for September, 2011
Sad Mac
I’m working with my Xilinx Spartan 3A FPGA board for the moment, and I’ve finally made some visible progress. I’ve never been so happy to see a Sad Mac! A boot failure may not seem very exciting, but I’m thrilled that it’s actually doing something recognizably Macintosh-like. That means it’s actually running 68000 code from the Mac ROM, which is drawing stuff to the screen buffer, which is getting read by the video module and displayed to the VGA screen. From here it will be a long, slow road of implementing replacements for the VIA, SCC, IWM, and other components.
Using an FPGA dev board for initial development makes it much easier to get started than it would be with a pile of discrete ICs. All the other “hardware” is actually synthesized inside the FPGA: a 68000 soft-CPU (TG68 core from opencores.org), 32K RAM, and 8K ROM. The synthetic RAM/ROM sizes are much too small for a Macintosh, but are all that would fit inside the FPGA. They’re enough to create a screen buffer and run the initial boot code from ROM, anyway.
I have an Altera DE1 dev board on the way, which has real external SRAM and Flash ROM that I can use instead of the synthetic RAM/ROM. (The Spartan 3A board has DDR RAM that I could never figure out, and ROM that can only be programmed through a serial port.) I haven’t yet decided whether to add a real 68000 to the DE1 on an expansion card, but given how easy it was to get the TG68 soft-CPU working, I’ll probably stick with that.
Eventually I’ll need to construct an expansion card for the Altera DE1 board, containing a microcontroller and some other things, but I should be able to get pretty far with just the DE1. The DE1 is an “educational” board with all kinds of miscellaneous gizmos. My long-term goal is to make an all-custom Plus Too board that contains only the parts actually needed, as well as vintage connectors for a Mac keyboard, mouse, and floppy.
The Sad Mac is appearing because the ROM checksum test failed. That’s not surprising, considering I only implemented the first 8K of the 128K ROM. It’s trying to play a sound, too, by streaming some data through the sound buffer. With some more work to pull bytes from that buffer at 22 kHz, I could hear the glorious boot beep!
Hardware Simplification
Today I’ve been thinking about ways to simplify the Plus Too hardware. Anything I can do to reduce the part count will make the final board easier to build, and also help with the inevitable debugging work. The nice thing about some of these ideas is that they dovetail, making further simplifications possible.
Soft-68000
My original plan was to use a real 68EC000 CPU. My motivations were a desire for “design purity”, and a wish to avoid having to learn how to work with a 68000 FPGA core and its possible bugs. After experimenting a bit with the free TG68 core, however, I’ve completely changed my mind. It took me just 30 minutes from downloading the TG68 files to having a working 68000 executing instructions from the Macintosh ROM inside my FPGA. The design is much easier to understand than I’d expected, and my fears of configuration voodoo and weird bugs seem unfounded. Using TG68 will not only allow me to eliminate a chip, but will also make it possible to directly inspect CPU registers and perform other live debugging tricks that wouldn’t be possible with a physical 68000 CPU.
One FPGA
In my earlier Too Many Pins entry, I counted the number of required FPGA I/O pins, and concluded I either needed to use two separate FPGAs, or learn how to solder BGA components. It turns out that most of the I/O pin count was related to connections to the physical 68000 CPU. I recounted the I/O pin requirements assuming the use of TG68, and it’s only 58 pins, plus or minus a few that I probably forgot. That’s well within the user I/O count of a single FPGA in a TQFP package, which I have some hope of hand-soldering. So there’s another chip gone.
Configuration by Microcontroller
Unlike the CPLDs I’ve used before, FPGAs don’t maintain their configuration when the power is turned off. They must be reloaded with the configuration bitstream each time they’re powered up. Although I didn’t call it out specifically, I had assumed the FPGA configuration data would be stored in a standard configuration Flash ROM. These ROMs are programmable through JTAG, and have all the smarts needed to configure the FPGA.
Some reconfigurable computers like Minimig use a different technique, and store the bitstream in an alternate medium, using a microcontroller to read the bitstream data and configure the FPGA. Because Plus Too will already have a microcontroller for floppy disk support, and an SD memory card loaded with floppy disk images, it should be possible to use the microcontroller to load the FPGA configuration bitstream from the SD card. I haven’t investigated the details yet, but with some good examples already out there, I’m hoping that it won’t be too difficult to get working.
ROM image in RAM
Taking this same idea one step further, the Macintosh ROM data could be stored in another medium, and loaded into a section of RAM at initialization time. The transfer of the ROM image could probably be done by the microcontroller, or else by a module in the FPGA that runs before the TG68 soft-CPU module starts. Assuming a large enough RAM, both the “real RAM” and the ROM image could exist side-by-side. The address decoder would direct accesses in the ROM address space into an alternate section of RAM, and ensure that all accesses to this section were read-only.
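A minimal sketch of that address decoding, under assumptions of my own that are not part of the actual design: a 1 MB physical RAM, the Mac’s RAM in its lower 512 KB, the 128 KB ROM image copied to physical offset 0x80000, and a Mac Plus memory map with ROM at 0x400000. The module and signal names are hypothetical, and both the boot-time ROM overlay and the path used to load the image into RAM are left out.

```verilog
// Hypothetical address decoder: maps the Mac ROM address space into a
// write-protected region of a shared physical RAM.
module romInRamDecoder (
    input  wire [23:1] cpuAddr,   // 68000 address lines A23..A1 (word address)
    input  wire        cpuASn,    // address strobe, active low
    input  wire        cpuRWn,    // 1 = read, 0 = write
    output wire [19:1] ramAddr,   // physical RAM word address (1 MB assumed)
    output wire        ramOEn,    // RAM output enable, active low
    output wire        ramWEn     // RAM write enable, active low
);
    // Assumed Mac Plus map: RAM at 0x000000-0x3FFFFF, ROM at 0x400000-0x4FFFFF.
    wire selectRAM = (cpuAddr[23:22] == 2'b00);
    wire selectROM = (cpuAddr[23:20] == 4'h4);

    // ROM accesses are redirected to physical 0x80000-0x9FFFF (128 KB image).
    // RAM accesses use the lower 512 KB, so higher Mac addresses wrap around.
    assign ramAddr = selectROM ? {1'b1, 2'b00, cpuAddr[16:1]}
                               : {1'b0, cpuAddr[18:1]};

    wire busActive = ~cpuASn;

    // Reads are allowed from both regions.
    assign ramOEn = ~(busActive & cpuRWn & (selectRAM | selectROM));

    // Writes are allowed only in the RAM region, which keeps the ROM image read-only.
    assign ramWEn = ~(busActive & ~cpuRWn & selectRAM);
endmodule
```

Because the write enable is gated by selectRAM alone, a stray CPU write into the ROM address range simply never reaches the RAM chip, which is the read-only behavior described above.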
And Then There Were Three
If you put all these ideas together, only three chips remain: an FPGA, a microcontroller, and a RAM. Three main ICs to solder sounds very manageable. The other components are miscellaneous parts like the clock oscillator, buttons and LEDs, and some discrete components for video and audio output. Of course there will still be quite a few connectors to solder: the SD card holder, PS/2 jacks, VGA port, JTAG connector, serial connector, and maybe some others I’m forgetting.
Three’s a Crowd?
To simplify things still further, it might be possible to eliminate the microcontroller, and implement a microcontroller soft-core in the FPGA instead. Free cores for the ATmega instruction set and other microcontrollers already exist. This raises the question of where the microcontroller program and data are stored, however, as both are too big to be implemented inside the FPGA. Perhaps they could live inside the RAM alongside the CPU’s RAM and Macintosh ROM image, but then the microcontroller would contend with the CPU for memory access. Eliminating the microcontroller would also require reintroducing some other part to configure the FPGA at initialization time, so there would be no net component savings. For all these reasons, it doesn’t seem that eliminating the microcontroller would help.
Understanding Verilog Warnings
Those of you who’ve followed the blog for a while know about my many frustrations with Verilog. Because it feels sort of like a procedural programming language, but very definitely isn’t one, I keep expecting to be far more competent at Verilog design than I actually am. While working on Plus Too, the Xilinx synthesis tool reported many, many warnings that I didn’t understand. The warning list grew to at least 100, and was so long that I just stopped reading it. That was dangerous, as most of the warnings were likely problems that needed to be addressed.
I’ve been writing C and C++ programs for years, and I’m very comfortable with the language, its details, and the compiler warnings and errors produced by various mistakes. I normally find the warnings easy to understand, because they reference a specific file and line number, and use well-known terminology to describe the problem. Sure, some more obscure errors like “not an lvalue” would probably flummox a beginner, but at least he’d know what line to scrutinize.
Most Verilog warnings I see are non-localized, and do not reference a specific file or line number. They are design-wide warnings, resulting from an analysis of all the modules in all the .v files. This can make it unclear where to even begin looking for the cause of a warning. A typical example is something like:
Xst:647 – Input <vblank> is never used. This port will be preserved and left unconnected if it belongs to a top-level block or it belongs to a sub-block and the hierarchy of this sub-block is preserved.
OK, there’s an unused input named vblank. But where? The vblank signal is routed through half a dozen different modules in the design, so how do I know which one I messed up? The only solution I’ve found is to search the whole project for all references to vblank, and verify each one. I also find that error message much too wordy.
Another example:
Xst:646 – Signal <ramAddr<0>> is assigned but never used. This unconnected signal will be trimmed during the optimization process.
This is basically the same as the first example, but has a totally different warning message. Why? Because one is a single combinatorial output, and one is a bit in a register? Then there’s this:
Xst:2677 – Node <ac0/vt/videoAddr_17> of sequential type is unconnected in block <plusToo_top>
It’s essentially the same issue again, but yet another totally different warning message. This time it gives the name of the offending module, so it should be easier to track down.
The general meaning of all these warnings is fairly clear: some expected signal connections are missing. Find the problem, and either add the missing connection, or suppress the warning if the unconnected signal is intentional. There were two other warnings I saw frequently whose meanings were definitely not clear to me, however:
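For illustration, here is a small made-up module containing the two kinds of mistake behind these warnings. The module and signal names are hypothetical, and the exact message and wording that the tool reports for each case may vary between versions; I would expect vblank to be reported as an input that is never used, and scratch as a signal that is assigned but never used (or as an unconnected node).

```verilog
// Hypothetical module showing two ways a signal can end up unused.
module warningDemo (
    input  wire       clk,
    input  wire       vblank,    // declared as a port but never referenced below
    input  wire [7:0] dataIn,
    output reg  [7:0] dataOut
);
    reg [7:0] scratch;           // written every clock, but nothing ever reads it

    always @(posedge clk) begin
        scratch <= dataIn + 8'd1;
        dataOut <= dataIn;
    end
endmodule
```

When a signal like vblank passes through several levels of hierarchy, checking each module’s port list for a declaration with no matching use inside the module body is one way to narrow the search.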
Xst:2042 – Unit dataController_top: 34 internal tristates are replaced by logic (pull-up yes): cpuData<0>, cpuData<10>, cpuData<11>, cpuData<12>, cpuData<13>, cpuData<14>, cpuData<15>, cpuData<1>, cpuData<2>, cpuData<3>, cpuData<4>, cpuData<5>, cpuData<6>, cpuData<7>, cpuData<8>, cpuData<9>, mouseClk, mouseData, ramData<0>, ramData<10>, ramData<11>, ramData<12>, ramData<13>, ramData<14>, ramData<15>, ramData<1>, ramData<2>, ramData<3>, ramData<4>, ramData<5>, ramData<6>, ramData<7>, ramData<8>, ramData<9>.
Um, what? This meant nothing to me. I wasn’t even sure if replacing internal tristates with logic was good or bad. The Xilinx tool shows each warning as a link you can click to get more info, but sadly it doesn’t work. Clicking the link just opens a web browser and does a search on the Xilinx site for “Xst:2042”, which returns no results. In fact, none of the synthesis warning links work. If a warning doesn’t make sense to you, you’re on your own.
After a lot of searching around on other web sites, I finally found a decent explanation. It seems that some (or all?) Xilinx devices do not support tristate logic (a signal with an output enable) anywhere but on the actual I/O pins. Signals internal to the FPGA cannot be tristate. Tristate logic is typically used to enable multiple drivers to operate on a single shared bus, one at a time. So instead of using internal tristates, you need to construct your design using additional logic to select which module’s data should appear on the shared internal bus, using a mux or similar method.
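A minimal sketch of the mux approach, assuming three data sources and one-hot select signals from an address decoder. All of the names here are hypothetical and are not taken from the Plus Too source.

```verilog
// Hypothetical replacement for an internal tristate bus: each source has its
// own output bus, and a priority mux picks the one that feeds the CPU.
module cpuDataInMux (
    input  wire [15:0] ramDataOut,
    input  wire [15:0] viaDataOut,
    input  wire [15:0] sccDataOut,
    input  wire        selectRAM,
    input  wire        selectVIA,
    input  wire        selectSCC,
    output reg  [15:0] cpuDataIn
);
    always @(*) begin
        if (selectRAM)      cpuDataIn = ramDataOut;
        else if (selectVIA) cpuDataIn = viaDataOut;
        else if (selectSCC) cpuDataIn = sccDataOut;
        else                cpuDataIn = 16'hFFFF; // idle bus reads as all ones, like a pulled-up bus
    end
endmodule
```

The final else branch plays the role of the pull-up: when nothing is selected, the bus reads as all ones instead of floating.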
That mostly makes sense, but I’m using the FPGA to simulate a system of separate parts (address controller, data controller, CPU, RAM, etc) that will eventually be physically separate chips communicating with tristate logic on shared busses. I don’t want to rewrite my design to eliminate tristate logic, because tristate logic is what will be used for these chips. For now I’ve left the logic as is, and I’m ignoring the warnings, and it seems to be working OK. I’m unclear exactly what the synthesis tool has substituted for the internal tristates, though: “logic (pull-up yes)”? What is that, and what problems might it cause?
The other confusing warning that’s been plaguing the design is:
Xst:2170 – Unit plusToo_top : the following signal(s) form a combinatorial loop: ramData<0>, ramData<0>LogicTrst20.
Xst:2170 – Unit plusToo_top : the following signal(s) form a combinatorial loop: ramData<1>, ramData<1>LogicTrst20.
…
…and so on, for every bit of ramData. This stems from my attempt to specify a bidirectional bus driver akin to a 74LS245:
assign ramData = (dataBusDriverEnable == 1'b1 && cpuRWn == 1'b0) ? cpuData : 16'hZZZZ;
assign cpuData = (dataBusDriverEnable == 1'b1 && cpuRWn == 1'b1) ? ramData : 16'hZZZZ;
This driver has ramData on one side, and cpuData on the other. When it’s enabled, it drives data from one side to the other. The direction in which data is driven is determined by the cpu read/write line. So why does this form a combinatorial loop? I’d expect to see that warning for something like:
assign a = b & c;
assign b = a & d;
but my bus driver code looks OK to me. I still haven’t found an explanation for this one, but I think it’s related to the previous issue about internal tristates. The synthesis tool is probably replacing my bidirectional bus driver tristates with some other logic, which then forms a combinatorial loop. I’m not sure how to fix this one without rewriting the design to use a different method than tristates. But again the final project will see ramData and cpuData on I/O pins connected to other chips using tristates, so I don’t want to rewrite the design.
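One possible explanation, offered as a guess rather than something confirmed by the Xilinx documentation: once each tristate assignment is turned into ordinary logic, ramData becomes a function of cpuData and cpuData becomes a function of ramData. The two enables can never be active together, but the tool sees only the structure, which is a loop. The sketch below shows one way to avoid it by splitting each bidirectional bus into separate read and write paths, keeping a real tristate only at the physical RAM pins. It assumes the CPU core exposes separate data-in and data-out ports; every name other than ramData, dataBusDriverEnable, and cpuRWn is hypothetical.

```verilog
// Hypothetical loop-free version of the bus driver: data flows one way on
// each signal, so no net depends on itself.
module dataBusBridge (
    input  wire [15:0] cpuDataOut,          // data driven by the CPU on writes
    input  wire [15:0] ramDataIn,           // data sampled from the RAM pins
    input  wire        dataBusDriverEnable,
    input  wire        cpuRWn,              // 1 = read, 0 = write
    output wire [15:0] cpuDataIn,           // data returned to the CPU on reads
    output wire [15:0] ramDataOut,          // value to drive onto the RAM pins
    output wire        ramDataDrive         // 1 = FPGA drives the RAM data pins
);
    assign ramDataDrive = dataBusDriverEnable & ~cpuRWn;
    assign ramDataOut   = cpuDataOut;
    assign cpuDataIn    = (dataBusDriverEnable & cpuRWn) ? ramDataIn : 16'hFFFF;
endmodule

// The only tristate lives at the top level, on pins that leave the FPGA.
module ramPins (
    inout  wire [15:0] ramData,             // physical RAM data pins
    input  wire [15:0] ramDataOut,
    input  wire        ramDataDrive,
    output wire [15:0] ramDataIn
);
    assign ramData   = ramDataDrive ? ramDataOut : 16'hZZZZ;
    assign ramDataIn = ramData;
endmodule
```

This does change the structure of the design, so it trades fidelity to the eventual multi-chip layout for a warning-free build; whether that trade is worthwhile depends on how closely the FPGA prototype needs to mirror the final board.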
Emulation, or Replication?
Is there any essential difference between an emulator and a hardware clone? What should it mean to “clone” a computer system? These questions have been on my mind a lot recently, as I work on my Mac Plus clone, somewhat calling into question the whole point of the project.
The retrocomputing world is full of emulators for popular computers of the past. These emulators are software programs that run on a modern PC, providing the user with the same experience they’d get on the real computer. While some emulators may require a ROM data file from the original machine, they are still pure software solutions, requiring no special hardware. In the Mac world, programs like Mini vMac, Basilisk II, and Sheepshaver fall into this category.
Less common are hardware replicas or clones of classic computers. These are physical pieces of hardware that you need to build or buy, and that function just like the classic computer they’re based on. This category can be broken down further into what I’ll call physical replicas and functional replicas. A physical replica uses most or all of the same hardware as the original machine, and provides all the same I/O options, allowing for the attachment of vintage peripherals. The Replica 1 copy of the Apple I is a good example. A functional replica, on the other hand, works like the original machine but is not built like one. It probably contains an FPGA or microcontroller, and uses modern I/O devices like USB or PS/2 mice/keyboards, VGA monitors, and memory cards. Plus Too falls into this category, as does the Minimig Amiga replica.
Traditionally it wasn’t possible to emulate another computer at full speed, because the overhead imposed by emulation demanded a host computer that was several times faster than the computer being emulated. However, given the tremendous power of modern computers compared to 1980s-vintage machines, full speed emulation of classic computers is now common. Physical replicas still have their place too, where you want to work with real vintage peripherals.
What then is the place for functional replicas? I’m not sure there is a good one. If I finish Plus Too and put it in a nice box, it will look like a Mac Plus on a VGA monitor, with a PS/2 keyboard and mouse. But if I put a small form-factor Windows machine in a nice box and run Mini vMac on it in full-screen mode, it will look identical. So what if one is a “hardware clone” and one is an emulator? From outside the box, there will be no way to tell the difference. Is the hardware clone cheaper? Doubtful. More compatible? Possibly, but Mini vMac is compatible with virtually all classic Mac software. More portable? More hackable? Perhaps.
It’s all a little discouraging to think about. I don’t think it’ll stop me from working on the project, because it’s been a fascinating learning experience so far. But it does make me wish for some ways I could distinguish Plus Too from a pure software emulator. Maybe I need to reconsider the use of vintage Mac peripherals, although that would certainly make the project a lot more challenging. Or maybe there’s another interesting way to emphasize the hardware aspect. If you’ve got a good idea, please post it in the comments. Thanks!
Frame Buffer Test
I’ve made some progress on Plus Too, but don’t get overly excited by the photo until you understand what it’s doing. I’ve finished the Verilog implementation of the video timing and pixel shifting modules, as well as some portions of the address decoder and the interleaved memory controller that I described earlier. I synthesized those modules, added a fake 32K ROM implemented inside the FPGA, and mapped it into the portion of the address space where the screen buffer is supposed to reside. I filled the ROM with a random 512 x 342 Mac desktop screenshot that I grabbed from the web. Using my Spartan 3A development board, I downloaded this design to the FPGA, and connected a standard VGA monitor. The result is the photo you see here. It’s just a static image, and there’s no interactivity, no software, and no CPU.
So what does this prove? Not too much yet, but it demonstrates that my video timing module generates the correct memory addresses and load enable signals, and with the correct timing. It also demonstrates that my pixel shifter module retrieves data from memory correctly, using the unconventional “double pumping” technique that was described in the memory controller blog entry. This technique performs two 16-bit wide loads every eight cycles of the 8 MHz CPU clock, on the 5th and 7th cycles. Because the intervals between loads aren’t constant, the pixel shifter module must load the data into a different portion of the shift register for 5th cycle loads vs 7th cycle loads.
Long story short: some encouraging pictures to look at while I continue to work on the meat of the design for my Mac Plus clone.
An SD-card Floppy Emulator for Classic Compact Macs
While working on the “Plus Too” Mac Plus clone, I’ve started thinking further about a semi-related side project: a floppy drive emulator that works with actual classic compact Mac hardware (the Mac 128K, 512K/e, and Plus). These machines all have 400K or 800K floppy drives, and modern floppy drives are physically incapable of using disks in 400K/800K format. That means if you’ve got one of these classic Macs, you also need a second, slightly newer Mac with a high density floppy drive (Apple called them FDHD) so you can copy data back and forth between standard 1.4MB disks and 400K/800K disks. To get Mac software onto your Mac Plus, you need to download it from the web using a modern PC, copy it to a 1.4MB floppy, move that floppy to the FDHD-equipped Mac, use that Mac to copy the software to a 400K or 800K floppy, then finally move that floppy to the Plus. What a pain.
You could also use a modem, null-modem connection, or LocalTalk networking to get software onto the Plus, but the average hobbyist is even less likely to have the equipment necessary for those methods than for the floppy disk chain transfer.
The idea of a floppy emulator for compact Macs (using SD cards or similar media) has been discussed before in the Mac hobbyist community, but as far as I know, none exists. Maybe that means I’m the guy who should design one. I’ve spent a fair bit of time studying the details of the IWM controller chip and the floppy disk data encoding, and I understand enough to think that the project is feasible. Here are a couple of thoughts on what such an emulator would look like and how it would work.
DB-19
The emulator would be a small PCB with a DB-19 connector that plugs directly into the floppy port. No cables required. It looks like finding a source for DB-19 connectors will be very difficult, though, so it might be necessary to make one somehow. None of the usual electronics suppliers like Digi-Key have DB-19 connectors, and the few places on the web that do advertise them only have solder cup terminated connectors intended for making cables. And even those places look old and out of date, making me question whether they actually still have DB-19 connectors in stock.
Storage
My original idea was to put an SD card socket on the emulator, so you could fill the card with disk images using your PC, then put the card into the emulator. The main drawback is that not everyone has an SD card reader on their PC. Better would be a USB connector, and when connected to the PC the emulator would appear as a generic mass storage device. In that case, the actual storage still might be an SD card, or it could be generic flash ROM, or battery-backed RAM. I’m unsure if this would require worrying about wear-leveling and making a flash driver, though.
Supported Formats
The emulator would support 400K and 800K disk images in raw or DiskCopy 4.2 format. Maybe later it could also support 1.4MB formats, but that would require studying the SWIM design instead of IWM. And anyway if your Mac supports 1.4MB disks, it’s probably easier to just make them on a modern PC or Mac. The emulator would not support “super disks” larger than a real physical disk, because the floppy driver in the Mac’s ROM would not be able to use them. Although maybe this could be worked around with some kind of custom init that replaces the ROM floppy driver…
Number of Disks
The emulator would only emulate a single external disk drive. This is a bummer, but the floppy connector is only designed to connect to a single drive, and there are no pins for the Mac to select a specific drive or give a unique ID to a drive. Again, maybe this could be worked around with a custom floppy driver replacement and some non-standard use of the floppy data lines…
Read/Write
Both read and write operations would be supported. Read would probably be a lot easier to implement first, so the initial prototype would likely be read-only.
Variable Speed
The Mac 400K/800K drive was a variable speed drive, unlike PC floppy drives. This is why PC floppy drives are physically unable to read/write 400K or 800K floppies. For the purposes of emulation, though, I don’t think this matters at all. The emulator would ignore the drive RPM control signal coming from the Mac. The actual data rate is still constant, I believe. And even if it’s not constant, I think I can still work with it.
Implementation
The emulator would consist of an Atmel AVR microcontroller and SD card socket. The AVR would need about 12KB of internal RAM. The ATmega1284P looks good. A pre-existing SD-card FAT-reader library would be used to search the card for files with the .dsk extension, and read data chunks from them.
The AVR would use the 9 control/data lines on the floppy connector (documented by Apple) to communicate with the Mac, acting like a normal floppy drive. It would internally maintain the position of a virtual disk head, defined by the current track number and rotational position within the track. Track-to-track in/out movement of the virtual head would be performed by the Mac, using the control lines. When the emulator was instructed to step to a new track, it would load the data for the corresponding sectors from the SD card into a RAM buffer. This data would be GCR-encoded on the fly, and appropriate sector headers, lead-ins, lead-outs, checksums, and sync bytes would also be generated. The result would be a byte-for-byte replica of the low-level data format for that track on a real floppy disk.
Virtual rotational movement through the track would happen automatically, at a fixed rate (I think it’s 1 bit per 2 microseconds). When the Mac requested a read, the emulator would return the data at whatever rotational position was current. The Mac software would keep reading data until the sector it wanted came into position, just like for a real floppy. When the Mac requested a write, the emulator would overwrite data at whatever rotational position was current. This data would be GCR-decoded on the fly, headers/checksums/etc. thrown away, and the actual sector data written back to the SD card.