Atari 2600 Hardware Acceleration
Atari 2600 programming techniques are fascinating. The hardware is so very limited, and programmers must use every possible trick to scrimp and save the last CPU cycle and byte of RAM. For a refresher on the 2600 hardware, check out my previous Atari overview. The console’s longevity is remarkable, with new games and demos still being produced today – over 100 new titles last year alone. The quest for the ultimate Atari 2600 programming techniques has continued all this time, in order to wring out maximum performance.
Atari programming requires racing the beam, updating graphics registers just before the instant they’ll be used to paint the next pixels on the current scan line. With only 76 CPU cycles per scan line, there just isn’t enough time for the poor 6502 to do very much. Want to update the foreground color multiple times at different horizontal positions? OK, but there might not be enough time remaining to also update the pixel bitmap during the scan line, or set the sprite positions. It’s a series of difficult tradeoffs and code optimization puzzles.
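To make the cycle budget concrete, here is a minimal sketch of a display kernel that changes only the background color once per scan line. The register addresses are the standard TIA ones; the 192-line count is the usual NTSC convention rather than anything stated above, and the label name is made up for illustration.

```asm
WSYNC  = $02        ; TIA: any write halts the CPU until the next scan line starts
COLUBK = $09        ; TIA: background color

    ldy #192        ; typical number of visible NTSC scan lines (assumption)
Kernel:
    sta WSYNC       ; 3 - wait for the start of the next scan line
    sty COLUBK      ; 3 - use the line counter as this line's background color
    dey             ; 2
    bne Kernel      ; 3 when taken (assuming the branch does not cross a page)
```

Even this trivial loop spends 11 of the 76 available cycles per line, before any sprite or playfield work has been done.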
More Hardware Makes Things Better?
To create better-performing Atari games, programmers may need to think outside the box – literally. The 2600 console consists of a 6502 CPU (actually a 6507), a 6532 RIOT, and an Atari custom graphics chip called TIA. But what about the game cartridge? What’s inside that? The Atari’s designers envisioned cartridges as simple 4KB ROMs in a protective plastic shell, but there’s no reason aside from cost why there couldn’t be other hardware inside a cartridge too.
Within a few years after the Atari’s 1977 release, game publishers began including a small amount of 7400-type glue logic or a simple PLD inside game cartridges, in order to assist with bank switching. The Atari only provides 4KB of address space for game cartridges, but with this extra hardware inside, publishers were able to make 8KB and 16KB games using custom bank-switching schemes. Some later cartridges also included additional RAM to augment the paltry 128 bytes available in the Atari. This provided more storage space to create larger games, but didn’t help improve the Atari’s CPU or graphics performance.
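As an illustration only, here is roughly what a bank switch looks like from the program's side. This sketch assumes the common "F8" 8KB scheme, which the text above does not specifically name, where any access to $1FF8 selects bank 0 and any access to $1FF9 selects bank 1. The label Bank1Entry is hypothetical.

```asm
BANK0 = $1FF8       ; F8 hotspot: touching this address selects bank 0
BANK1 = $1FF9       ; F8 hotspot: touching this address selects bank 1

; This stub has to be assembled at the same address in both banks,
; because the CPU keeps fetching from the next address after the
; ROM underneath it has been swapped.
CallBank1:
    bit BANK1       ; 4 - read the hotspot; the cartridge logic swaps in bank 1
    jmp Bank1Entry  ; 3 - this instruction is fetched from bank 1's copy of the stub
```

The CPU has no idea anything changed. The glue logic in the cartridge simply watches the address bus and flips which half of the ROM is visible.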
Pitfall II and the DPC Coprocessor
Activision’s Pitfall was one of the most popular games for the Atari 2600. For the sequel Pitfall II, expectations were high, and the game’s designer David Crane did something that had never been done before: put a full-blown coprocessor inside the game cartridge. Crane’s DPC (Display Processor Chip) added two additional hardware sound channels, a music sequencing capability, a hardware random number generator, and a graphics streaming capability with built-in options for masking, reversing, or swizzling the bits.
That’s an appealing collection of features, but the graphics streaming capability is probably the most interesting. To understand how it can help improve graphics performance, let’s first look at some hypothetical code without DPC. Imagine that the game code wants to read bytes from a graphics table in ROM and stuff them into the TIA’s GRP0 register (the pixel bitmap for sprite 0) as quickly as possible, in order to create different patterns at points on the scan line. The code might look like this:
; setup
lda #<DrawData ; low byte of an address in ROM
sta GfxPtr ; a pointer in RAM page 0 (low byte first)
lda #>DrawData ; high byte of the address
sta GfxPtr+1
ldy #0 ; index into the table
; kernel
lda (GfxPtr),Y ; 5
sta GRP0 ; 3
iny ; 2
lda (GfxPtr),Y ; 5
sta GRP0 ; 3
iny ; 2
lda (GfxPtr),Y ; 5
sta GRP0 ; 3
This requires a total of 10 CPU cycles for every repetition of read, write, and increment of the index register. The TIA draws three pixels per CPU cycle, so during those 10 CPU cycles the electron beam will move 30 pixels horizontally, and in this example it would only be possible to update the pixel bitmap every 30 pixels. This limits the number of objects or patterns that can be displayed in the same horizontal row of the screen.
The DPC provides some extensions to speed up this process. From my study of the chip, it works like this:
- Writing to a special pair of addresses in the cartridge’s address space will set up a graphics stream pointer
- Reading from a special address will return the next byte from the stream pointer, and increment the pointer
Using the DPC, the previous example could be rewritten something like this:
; setup
lda #<DrawData ; low byte of an address in ROM
sta BytePtr ; special addr in cartridge address space
lda #>DrawData ; high byte of the address
sta BytePtr+1
; kernel
lda NextByte ; 4 - special addr in cartridge address space
sta GRP0 ; 3
lda NextByte ; 4
sta GRP0 ; 3
lda NextByte ; 4
sta GRP0 ; 3
This reduces the total number of CPU cycles for every repetition from 10 down to 7, enabling the pixel bitmap to be updated every 21 pixels instead of only every 30 pixels – quite a nice improvement.
Pitfall II was released in 1984. Despite the major effort that doubtless went into designing the DPC chip, Pitfall II was the only game to ever use it. No other Atari 2600 games from the original Atari era 1977-1992 ever included a DPC or other coprocessor. The DPC was an impressive one-off, and remained the pinnacle of Atari 2600 hardware acceleration tech for 25 years, until rising interest in retrogaming eventually rekindled interest in the Atari console.
21st-Century Hardware
The Harmony Cartridge for Atari 2600 was first released in 2009. Harmony is a bit like my BMOW Floppy Emu Disk Emulator, but for Atari game cartridges instead of Apple floppy disks. Inside the cartridge is an ARM microcontroller and an SD card reader, and the hardware loads Atari game ROMs from the SD card and then emulates a standard ROM chip, including any necessary bank switching logic or extra RAM that may have existed in the original game cartridge.
The original function of Harmony was simply to be a multi-ROM cartridge, but for Pitfall II, Harmony’s designer was also able to use the ARM microcontroller to emulate David Crane’s DPC chip. Before long, he and a few collaborators began to share ideas for a new coprocessor design, like DPC but even more powerful, and emulated by Harmony’s ARM chip. While this wouldn’t do anything to benefit the library of original Atari games, it would open up new possibilities for the active community of Atari 2600 homebrew game developers. The coprocessor design they eventually created is called DPC+.
I haven’t looked at DPC+ in detail, so I may be misstating the specifics, but I’ve seen three features that look particularly interesting.
FastFetch – Atari programs often include time-critical sequences that load a byte from zero page RAM (3 CPU cycles), write the byte to a TIA register (3 CPU cycles), and repeat, for six total CPU cycles per iteration. For drawing fixed graphics patterns, the load from zero page RAM can be replaced with a load of an immediate value into a register, which reduces the total to five CPU cycles but means the graphics data must be a constant. That constant value is part of the program, which is stored in ROM, which is emulated by the Harmony cartridge. Harmony is able to perform a bit of trickery here, so when the program performs an immediate load of the constant value, Harmony actually supplies data from the graphics stream instead. This makes it possible to draw dynamic graphics data while still enjoying the five-cycle performance of drawing constant graphics data.
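The three variants described in the FastFetch paragraph can be sketched side by side. This is only an illustration: Gfx0 and STREAM0 are hypothetical names, the placeholder value is arbitrary and not taken from any DPC+ documentation, and how the cartridge recognizes the placeholder is not covered here.

```asm
GRP0    = $1B       ; TIA: pixel bitmap for sprite 0
Gfx0    = $80       ; hypothetical zero page variable holding graphics data
STREAM0 = $08       ; hypothetical placeholder operand (assumption)

; Variant 1: dynamic data from zero page RAM, 6 cycles per update
    lda Gfx0        ; 3
    sta GRP0        ; 3

; Variant 2: immediate constant, 5 cycles per update, but the data is fixed
    lda #%00111100  ; 2
    sta GRP0        ; 3

; Variant 3: FastFetch-style, 5 cycles per update with dynamic data.
; The CPU believes it is loading a constant, but the cartridge answers
; the operand fetch with the next byte of the graphics stream.
    lda #STREAM0    ; 2
    sta GRP0        ; 3
```

Variants 2 and 3 are the same instructions as far as the CPU is concerned. The only difference is what the cartridge puts on the data bus during the operand fetch.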
Bus Stuffing – The data bus output drivers on the NMOS 6502 (and 6507) chip are non-symmetric. They drive logical 0 bits strongly to near 0 volts, but logical 1 bits use a weak pull-up to 5 volts. If the CPU is outputting a 1 on the data bus, external hardware can safely overdrive the bus with a 0 without damaging the CPU – something that would be dangerous with a symmetric push-pull output driver. For time-critical code loops, Harmony uses this technique to eliminate CPU graphics data reads entirely. The CPU program just performs a long series of write instructions, repeatedly writing the value $FF (binary 11111111) to TIA registers, while Harmony pulls down the appropriate data bus lines to create the next byte of the graphics stream. This reduces the number of CPU cycles needed per iteration to a mere three cycles.
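Based on the bus stuffing description, the CPU side of such a kernel might look something like the sketch below. The code itself is ordinary 6502; the interesting part happens on the bus, where the value that reaches the TIA is effectively $FF with some bits pulled low by the cartridge.

```asm
GRP0 = $1B          ; TIA: pixel bitmap for sprite 0

    lda #$FF        ; 2 - done once, outside the time-critical run
; kernel
    sta GRP0        ; 3 - CPU weakly drives $FF, cartridge pulls selected bits to 0
    sta GRP0        ; 3 - same instruction, but the TIA sees the next stream byte
    sta GRP0        ; 3
```

With the load removed from the loop entirely, the pixel bitmap could in principle be updated every 9 pixels, compared with every 30 pixels in the original indirect-load example.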
ARM Code Execution (ACE) – Still not fast enough? Harmony also enables the Atari’s 6502 program to upload and execute arbitrary code on Harmony’s 32-bit 70 MHz ARM microcontroller. Compared to the 8-bit 6502 running at roughly 1.19 MHz, that’s a dramatic performance improvement for compute-heavy code.
Aside from DPC+, Harmony also supports other coprocessor models called CDF and CDFJ, which are further extensions of DPC+. I haven’t looked at these.
Harmony is no longer the only player in this space, and since 2018 some interesting alternatives have appeared. UnoCart 2600 is conceptually very similar to Harmony, but is open source and uses a more capable microcontroller. To my understanding, UnoCart 2600 supports bus stuffing and ACE, although ACE code for Harmony isn’t directly compatible with the UnoCart 2600 due to the differing memory maps of the microcontrollers on the two cartridges. UnoCart 2600 does not support other DPC+ extensions, which I think is because Harmony is a closed source design and its maintainers have chosen not to share the tech details needed for DPC+ emulation. Most recently and intriguingly, PlusCart has appeared as an UnoCart 2600 spin-off that loads game ROMs over WiFi instead of from an SD card. It’s the golden age of retro-Atari products.
When is an Atari no longer an Atari?
Thanks to Harmony and similar devices, within roughly the past 10 years, there has emerged a large and growing library of “hardware accelerated” homebrew Atari games. With the extra support of the coprocessor in the Harmony or UnoCart cartridge, these games are able to create graphics and sound that are much more impressive than anything that was possible back in the 1980s.
This all leads to the question I’ve been grappling with. When an Atari 2600 game console from 1977 is hot-rodded with a 32-bit coprocessor running at 100+ MHz, is it still an Atari 2600? Should games that target this augmented hardware still be considered Atari 2600 games?
Here’s my small editorial. There are no right or wrong answers about this, and everyone should be encouraged to do whatever they find to be most fun. From a technical standpoint I find Harmony and its siblings to be impressively clever, fascinating to study, and I wish I’d thought of these ideas myself. And the multi-ROM cartridge capability is extremely convenient for any modern game collector who doesn’t want to store hundreds of physical game cartridges.
As for the DPC+ coprocessor and similar hardware acceleration extensions, they make me feel… conflicted? Uneasy? When it comes to retrocomputing, I’m something of a hardware purist. For me, the appeal of the Atari 2600 comes from trying to make the most out of its limited hardware capabilities. By adding new 21st-century hardware with new performance-enhancing capabilities, it’s effectively moving the goalposts. I can compare two Atari games that target the original hardware, and if the first game looks visually more impressive than the second, I’ll know the first programmer did a better job. But if one of the games uses a coprocessor with extra hardware capabilities, then it’s no longer a fair comparison. Eventually I hope to try writing my own Atari game, and I’ll probably target the original hardware without any coprocessor extensions, so that I can fairly compare my efforts against the classic Atari games of the 1980s.
6 Comments so far
I agree with trying to stay close to hardware-at-the-time. I was impressed by the new Lode Runner port for the 2600 that was just released until I found out it runs a lot of code on the ARM chip in the cartridge.
While it’s fun to try to fit in 4k, sometimes games are so much better if you can use the 16k ROM / 2k RAM in something like the E7 mapper, and that was a real mapper used on real games (like Burgertime) back in the day.
If you wanted an interesting project, it might be making a cartridge that supports things like the E7 mapper with just discrete logic, rather than a microcontroller. It’s fun to make custom cartridges, which if you have a ROM burner is trivial for 2k and 4k games (and sometimes 8k games), but beyond that your only real option is having to get one of the fancy/expensive microcontroller cartridges, when in some cases you probably could just get by with a static RAM chip and a few flip-flops.
Good idea about making a discrete logic cartridge. I haven’t yet looked at the details of E7 switching, but I’m guessing any of the bank switching techniques that were used in the 1980s could be implemented with a ROM and SRAM plus a few 7400-series logic chips, or maybe a PAL. But the size and cost would probably be greater than emulating the RAM and ROM in a microcontroller.
A possible compromise might be using discrete RAM and ROM chips in the cartridge, along with a low-end microcontroller handling the bank switching instead of the 7400-series parts. The microcontroller firmware would be the same for any E7 bank switched cartridge, or any other particular switching scheme. But this approach would still be more expensive and more complex than emulating the whole thing in a high-powered microcontroller.
Here in 2023, it’s interesting how many of us set self-imposed limits over the ways we use retrocomputer or retro-gaming hardware. Everyone has their own rules and point of view, of course. But we all enjoy tinkering with these machines specifically because they’re old and have limited capabilities, and we recognize that some attempts to modernize the hardware might spoil the charm.
When I was talking to David Crane, he asked if I knew what the DPC chip stood for. I guessed Data Processing Chip, and he corrected me: DPC stands for David Patrick Crane.
I developed and built a game cartridge for the 2600 which supported 32K of RAM and 64K of ROM (flash, but not reprogrammable in circuit) with a rather nice banking scheme. I think Harmony’s the way to go, though, from a practicality standpoint even if one programs it to mirror something that would have been practical back in the day. Most cartridges for the 2600 that used an off-the-shelf OTPROM and 74xx chips would have needed a minimum of three 74xx chips to support bank switching, but a 74LS157 can do the job with one chip and an RC deglitching circuit, which I developed for Toyshop Trouble.
“Should games that target this augmented hardware still be considered Atari 2600 games?”
This is ultimately up to each person to decide for themself, but I look at it the same way I look at how mapper chips enhanced the NES during its lifetime. Nobody talked about how those games shouldn’t be considered NES games, despite adding tons of functionality that wasn’t included in the base console.
The latest retro hardware you describe basically makes the Atari a low-spec GPU if you’re running the code on the ARM. I think the interesting extensions are ones that could have been implemented back in the day. The bus-stuffing really speeds things up yet would have been possible.
I fully agree with the part: “For me, the appeal of the Atari 2600 comes from trying to make the most out of its limited hardware capabilities. By adding new 21st-century hardware with new performance-enhancing capabilities, it’s effectively moving the goalposts. I can compare two Atari games that target the original hardware, and if the first game looks visually more impressive than the second, I’ll know the first programmer did a better job.”
In this sense even 8K games moved the goalposts, even if just by adding more RAM. Comparisons should be made among games with 4K of ROM and 128 bytes of RAM, and in this sense Pitfall!, Video Chess, more recently PacMan 4K, and other marvels like these have to be compared among themselves. Likewise, 8K games are in a different category, and so are 32K games, etc.