
Nibbler 4 Bit CPU

Summary

Nibbler is a 4 bit CPU built from standard 7400 series logic chips – individual counters, registers, buffers, and gates. It’s an educational example of a simple CPU that’s easy to understand and build, but still capable of running games and other interesting programs. Nibbler is built using wire-wrapping on a 5 x 4 inch (127 x 102 mm) perfboard. The CPU contains 17 chips in total.

Download the Nibbler file archive containing all the design files and tools.

Follow the development history of Nibbler to learn more about the process that led to its construction.

 

Specs

  • 4 bit custom-made CPU
  • 12 bit addressing, 4K address space size
  • Harvard architecture – separate program and data memory spaces
  • 2.46 MHz clock speed, 2 clocks per instruction = 1.23 million instructions/sec
  • Thirteen 7400-series chips in the CPU data and control paths
  • Two 28C16 EEPROMs store microcode for generating internal control signals
  • 4K x 4 SRAM (CY7C168A)
  • 28C64 EEPROM stores programs
  • Four pushbutton inputs
  • 16 x 2 character LCD
  • Audio speaker
  • About 100 mA current draw at 5V

 

Demos

For a small pile of 7400 logic chips, Nibbler can do some cool stuff! Check out these videos of Nibbler running the games Mastermind and Frogger, and playing the Blue Danube Waltz. For lots more information about how these demos were made, see this post.

 

Architecture

The primary design goal was to make everything easy to understand, to encourage other people to learn from Nibbler and get inspired to build their own CPU. In several cases, obvious performance optimizations were intentionally omitted in order to keep things simple. However, Nibbler is not a true minimal CPU. With a 12-bit address space and a 16 x 2 character LCD display, it’s capable of much more than just blinking a few LEDs.

 

The architecture of Nibbler is shown above. The program counter (PC) provides the address for the program ROM. The program byte is combined with other status bits to provide the address for the microcode ROMs. These ROMs generate the internal control signals needed to load, enable, and increment the other chips in the CPU at the appropriate times. Each ROM outputs a different set of eight of the sixteen total control signals. The program byte is also used to construct addresses for RAM and for jump destinations, as well as to put immediate literal values on the data bus.

In contrast to most modern CPUs, the Nibbler design uses a Harvard architecture. That means programs and data are stored in separate address spaces, and travel on separate buses. The data bus is 4 bits wide, as one should expect for a 4-bit CPU. The program bus is 8 bits wide: 4 bits for the instruction opcode, and 4 bits for the immediate operand or high bits of the target address.

Because instruction opcodes are 4 bits wide, there are 16 possible types of instructions. All instructions require exactly two clock cycles to execute. During the first clock cycle, called phase 0, the instruction opcode and operand are retrieved from memory and stored in a register called Fetch. The second clock cycle, called phase 1, performs the calculation or operation needed to execute the instruction.

Nibbler is an accumulator machine: all arithmetic and logical computation results are stored into the single 4-bit accumulator register named “A”. LD and ST move data between A and RAM, and ADDM, CMPM, and NORM can read one operand directly from RAM, but every arithmetic or logical result is written back to A, except for compares, which update only the flags.

Program and data addresses are both 12 bits wide, resulting in total addressable storage of 4096 8-bit bytes for programs and 4096 4-bit nibbles for data. Since instruction opcodes are 4 bits wide, instructions involving absolute memory addresses are 4 + 12 = 16 bits in size, or two program bytes. Instructions involving only a 4 bit literal value are 4 + 4 = 8 bits in size, or one program byte.
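To make the encoding concrete, here is a minimal C sketch of how one or two program bytes split into fields. It assumes, as described for the Fetch register below, that the opcode sits in the high nibble of the first byte and the low nibble holds either the immediate operand or address bits 11..8. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of the first program byte of any instruction. */
typedef struct {
    uint8_t opcode;   /* high nibble: one of 16 instruction types */
    uint8_t operand;  /* low nibble: immediate value, or address bits 11..8 */
} nib_insn;

static nib_insn decode_first_byte(uint8_t b) {
    nib_insn i = { (uint8_t)(b >> 4), (uint8_t)(b & 0xFu) };
    return i;
}

/* For addressed instructions, the second program byte supplies address bits 7..0,
   giving a 12-bit target in program space (jumps) or data space (RAM accesses). */
static uint16_t target_address(nib_insn i, uint8_t second_byte) {
    return (uint16_t)((i.operand << 8) | second_byte);
}
```

For example, the bytes 0x81 0x23 decode as opcode $8 (JZ) with target address $123, while the single byte 0xA5 decodes as opcode $A (ADDI) with immediate operand 5.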

Up to thirty-two distinct I/O devices can be supported by the design: sixteen inputs and sixteen outputs. However, Nibbler uses just one IN port and two OUT ports. The input comes from four momentary pushbuttons, arranged in a left/right/down/up configuration, and connected to the lone IN port. The two OUT ports drive a programmer LED used for debugging, a piezo speaker for software-controlled sound, and a 16 x 2 character-based LCD display.

Limitations

Nibbler is notable for a few things it does NOT have. There’s no address decoder, because each address space contains only a single chip. Program ROM occupies all of the program address space, and RAM occupies all of the data address space. I/O peripherals aren’t memory-mapped, but instead use port-specific IN and OUT instructions to transfer data.

Nibbler also lacks any address registers, which means it can’t support any form of indirect addressing, nor a hardware-controlled stack. All memory references must use absolute addresses. That’s a significant limitation, but it’s in keeping with the project’s K.I.S.S. design goals. With the use of jump tables and dedicated memory locations, Nibbler programs can implement a simple call/return mechanism without a true stack.
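Here is a rough C model of that call/return idea. It is not Nibbler code: the caller stores a small return-site ID in a dedicated RAM nibble before jumping to a shared subroutine, and the subroutine “returns” by comparing that ID and jumping back to the matching site. The slot address RETURN_SLOT, the IDs, and the labels are all hypothetical, and the comments only suggest roughly which Nibbler instructions each step would use.

```c
#include <stdint.h>

#define RETURN_SLOT 0x0FFu   /* hypothetical RAM nibble reserved for the return ID */

enum call_site { RET_TO_A = 0, RET_TO_B = 1 };

/* Program locations in this toy model, standing in for ROM labels. */
enum label { CALL_A, AFTER_A, CALL_B, AFTER_B, SUBROUTINE, HALT };

static uint8_t ram[4096];
static char trace[8];        /* records which blocks ran, in order */

static void run_call_return_demo(void) {
    enum label pc = CALL_A;
    int n = 0;
    while (pc != HALT) {
        switch (pc) {
        case CALL_A:                      /* LIT #RET_TO_A; ST slot; JMP sub */
            ram[RETURN_SLOT] = RET_TO_A;
            pc = SUBROUTINE;
            break;
        case AFTER_A:
            trace[n++] = 'A';
            pc = CALL_B;
            break;
        case CALL_B:                      /* LIT #RET_TO_B; ST slot; JMP sub */
            ram[RETURN_SLOT] = RET_TO_B;
            pc = SUBROUTINE;
            break;
        case AFTER_B:
            trace[n++] = 'B';
            pc = HALT;
            break;
        case SUBROUTINE:
            trace[n++] = 'S';
            /* "Return" through a jump table: LD slot; CMPI #RET_TO_A; JZ afterA; JMP afterB */
            pc = (ram[RETURN_SLOT] == RET_TO_A) ? AFTER_A : AFTER_B;
            break;
        default:
            pc = HALT;
            break;
        }
    }
    trace[n] = '\0';
}
```

Running it records “SASB”: the subroutine runs twice and returns to a different caller each time. In this sketch each subroutine has one return slot, so a subroutine can’t call itself, and nesting only goes as deep as the slots you reserve.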

 

Schematic

Click the schematic image to view the full-sized version. The whole system fits on a single page, including the CPU itself and the I/O devices, so it’s easy to wrap your head around.

Except for RAM and ROM, all the chips shown here are common 7400 series parts. I used the 74HCT logic family for most parts, which requires less power while offering similar speed to the more common 74LS family.

Program Data

The parts on the schematic are arranged in the same relative positions as in the architecture diagram. At the middle-right is the program ROM, where the currently running program is stored. This is an 8Kx8 EEPROM, but Nibbler’s address size only allows for 4K programs, so one of the address inputs is unused and is hard-wired to 0. Program memory is 8 bits wide, and so all 8 of the ROM’s I/O lines are used. Depending on the type of instruction, these may be 4 bits of instruction opcode and 4 bits of immediate operand, or 4 bits of instruction opcode and 4 bits of address, followed by 8 more bits of address. At the start of execution of each instruction, this program byte is loaded into the Fetch register.

The address of the program instruction that’s currently being executed is stored in the program counter. The PC consists of three ’163 4-bit counters, chained together to make a 12 bit logical register. After most instructions, the PC will increment to point to the next instruction. For jump instructions, the PC can also be loaded with a new address. The address comes from the Fetch register operand value (highest 4 bits) and the program ROM byte (lowest 8 bits).
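The following C sketch models a 12-bit PC built from three 4-bit counter stages. It assumes the usual arrangement for chained synchronous counters like the ’163, where each stage’s count enable comes from the ripple-carry output of the stage below. A stage therefore advances only when every lower stage reads 15. The pc12 type and function names are illustrative.

```c
#include <stdint.h>

/* Three 4-bit counter stages; stage[0] holds PC bits 3..0. */
typedef struct { uint8_t stage[3]; } pc12;

static void pc_load(pc12 *pc, uint16_t addr) {
    for (int i = 0; i < 3; i++)
        pc->stage[i] = (uint8_t)((addr >> (4 * i)) & 0xFu);
}

static uint16_t pc_value(const pc12 *pc) {
    return (uint16_t)(pc->stage[0] | (pc->stage[1] << 4) | (pc->stage[2] << 8));
}

/* One clock edge with incPC asserted. All stages share the clock, so each
   stage's enable is computed from the values before the edge. */
static void pc_increment(pc12 *pc) {
    int enable[3];
    int carry = 1;                                   /* stage 0 is enabled by incPC */
    for (int i = 0; i < 3; i++) {
        enable[i] = carry;
        carry = carry && pc->stage[i] == 0xF;        /* ripple carry out of stage i */
    }
    for (int i = 0; i < 3; i++)
        if (enable[i])
            pc->stage[i] = (uint8_t)((pc->stage[i] + 1) & 0xFu);
}
```

Incrementing $0FF carries through both lower stages and gives $100, and $FFF wraps around to $000.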

Control and Microcode

At the top left of the schematic are the three chips pertaining to the execution of the current instruction. The Fetch register is a ’377, an 8-bit register that holds the current instruction opcode in the high 4 bits and the immediate operand or high address bits in the low 4 bits. ALU flags are stored in the 4-bit Flags register, a ’173. There are only two flags, carry and zero, so two of the four bits are unused. The last chip in this group is a ’74, a dual flip-flop. One flip-flop is used to synchronize the reset signal, and the other is the Phase bit, which constantly toggles between 0 and 1 to indicate which of the two clock cycles of an instruction’s execution is currently underway. Fetch is loaded at the end of the clock cycle when Phase is 0.

The instruction opcode, ALU flags, and phase are combined to form a 7-bit address for the two microcode ROMs, shown at the mid-left. The output of the two ROMs constitutes the 16 control signals needed to orchestrate the behavior of all the other chips. The microcode is stored in two 2Kx8 EEPROMs, so four of the eleven address inputs on each ROM are unused and hard-wired to 0.
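This C sketch shows how 128 control words could end up in two 2Kx8 ROMs. It forms the 7-bit microcode address with the same bit layout used by the Microcode Builder code in the Software Tools section (opcode in bits 6..3, then C, Z, and phase), and splits each 16-bit control word into one byte per ROM. Two details are assumptions: that ROM 0 receives the low byte, and the 0xFF fill for the unused upper region.

```c
#include <stdint.h>
#include <string.h>

#define UCODE_WORDS 128   /* 2^7 addresses: opcode (4 bits) + C + Z + phase */
#define UROM_SIZE  2048   /* 2Kx8 EEPROM with 11 address lines */

static unsigned ucode_addr(unsigned opcode, unsigned c, unsigned z, unsigned phase) {
    return ((opcode & 0xFu) << 3) | ((c & 1u) << 2) | ((z & 1u) << 1) | (phase & 1u);
}

/* Assumption: ROM 0 holds the low byte of each control word, ROM 1 the high byte. */
static void split_microcode(const uint16_t words[UCODE_WORDS],
                            uint8_t rom0[UROM_SIZE], uint8_t rom1[UROM_SIZE]) {
    /* A7..A10 are tied to 0, so bytes 128..2047 are never read; the fill is arbitrary. */
    memset(rom0, 0xFF, UROM_SIZE);
    memset(rom1, 0xFF, UROM_SIZE);
    for (unsigned a = 0; a < UCODE_WORDS; a++) {
        rom0[a] = (uint8_t)(words[a] & 0xFFu);
        rom1[a] = (uint8_t)(words[a] >> 8);
    }
}
```

For example, JNZ ($9) in phase 1 with C = 0 and Z = 0 lands at address 73, the word the Microcode Builder sets to 0xF03F.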

ALU Datapath

At the bottom-left of the schematic are the ’181 ALU and the ’173 accumulator register “A”. The ALU (arithmetic and logic unit) can perform any common arithmetic or logical operation on its two inputs. In this case, one input always comes from the accumulator, while the other is supplied from the data bus. The ALU result is stored back into the accumulator. The ALU, accumulator, and data bus are all 4 bits wide, which is what makes Nibbler a 4 bit CPU.

Carry-In and Carry Flag

If you look carefully, you’ll see that the ALU’s carry-in bit is a control signal provided by microcode, not the carry flag from the Flags register. This is a subtle but important point: the carry flag is an output from an arithmetic instruction, and can be used to make a conditional jump if the carry flag is/isn’t set, but it doesn’t feed back into the ALU to affect later calculations. This means that when performing multi-nibble pair-wise additions, the program must check the carry flag after each nibble addition, and add an extra 1 into the next addition if it’s set.

This was a conscious design choice. If the carry flag did connect to the ALU’s carry-in bit, then the program would need to clear it before performing any single-nibble additions, and those are much more common than multi-nibble additions. Also the carry-in bit can’t simply be hard-wired to 0, because the CMP (compare) instruction requires carry-in to be 1 in order to work properly. So carry-in must be provided by the microcode.
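Here is a minimal C model of the multi-nibble addition described above. It assumes numbers are stored least-significant nibble first. Each nibble add forces carry-in to 0, as ADDI and ADDM do. A carry saved from the previous nibble is then applied as a separate +1, which mirrors the check-and-add pattern a Nibbler program would need. The function names are hypothetical.

```c
#include <stdint.h>

/* One Nibbler-style nibble add: carry-in is always 0, carry flag = carry-out. */
static uint8_t add_nibble(uint8_t a, uint8_t b, int *carry_out) {
    unsigned sum = (a & 0xFu) + (b & 0xFu);
    *carry_out = sum > 0xFu;
    return (uint8_t)(sum & 0xFu);
}

/* Adds two n-nibble numbers (least significant nibble first) into out.
   Returns the final carry. */
static int add_multi(const uint8_t *x, const uint8_t *y, uint8_t *out, int n) {
    int carry = 0;                           /* carry saved from the previous nibble */
    for (int i = 0; i < n; i++) {
        int c_sum, c_inc = 0;
        uint8_t r = add_nibble(x[i], y[i], &c_sum);
        if (carry)                           /* program tested the previous carry flag... */
            r = add_nibble(r, 1, &c_inc);    /* ...and adds the extra 1 separately */
        out[i] = r;
        carry = c_sum || c_inc;              /* at most one of these can be set */
    }
    return carry;
}
```

At most one of the two additions in a step can carry out. If x + y overflows, its 4-bit result is at most 14, so adding 1 afterward cannot overflow again.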

RAM

RAM is shown at the bottom-center. Its I/O lines are connected to the data bus, and the address comes from the Fetch register operand value (highest 4 bits) and the program ROM byte (lowest 8 bits). It’s a 4Kx4 SRAM, to match Nibbler’s address size and data width.

Notice that the CLK signal is OR’d with the /CSRAM control signal, with the result connected to the RAM’s /CS (chip select) input. This means the RAM will only be enabled during the second half of each clock cycle, and even then only when the microcode specifically enables it. This is a simple way of preventing erroneous writes to RAM during the early part of the clock cycle, when the /WE (write enable) signal and RAM address may not yet be valid.
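The gating can be written as a one-line C function. This assumes CLK is high during the first half of the cycle and low during the second, which is what the description above implies. Signals are 0/1 levels, and /CSRAM and /CS are active low.

```c
/* Models the OR gate feeding the RAM's /CS input (0 = RAM selected). */
static int ram_cs_n(int clk, int csram_n) {
    return (clk | csram_n) & 1;
}
```

Only the combination CLK = 0, /CSRAM = 0 produces /CS = 0. The RAM is therefore selected only in the second half of a cycle, and only in cycles where the microcode has asserted /CSRAM.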

IN and OUT Ports

The IN and OUT ports are also connected to the data bus, and are shown on the schematic at bottom-right. IN0 is a ’125 4-bit bus driver, which outputs the state of four pushbuttons connected to pull-up resistors. Because there’s only a single IN port, no decoding of the port number is done, and this ’125 will actually respond to any port number with the IN instruction. If more IN ports were added, then additional port number decoding logic would be needed.

The two OUT ports are ’173 4 bit registers. OUT1 connects to databus[4..7] of a 16×2 character LCD display using the common HD44780 controller. Although this LCD controller has an 8 bit interface, it can also operate in 4 bit mode, in which case only the highest 4 LCD databus lines are used. OUT0 connects two more lines to the LCD, for the RS and E signals needed to control LCD data transfers. The other two lines from OUT0 connect to an LED, which can be toggled on/off as a basic debugging aid, and to a speaker, which can be bit-banged in software to generate simple square-wave tones at different frequencies.

Notice that the ’173s have two load enable inputs, /G1 and /G2, and both must be low in order to load data to the chip. /G1 of both chips is connected to the /LOADOUT control signal. But as with the IN port, the OUT port number is not fully decoded, in order to avoid needing extra decoding logic. Instead, bit 0 of the port number is connected to OUT0 /G2, and bit 1 to OUT1 /G2. This means that OUT0 will actually respond to any port number where bit 0 is 0, and OUT1 to any port number where bit 1 is 0. It would even be possible to load both OUT ports simultaneously by using a port number where both bits 1 and 0 were 0, although that probably wouldn’t be useful.
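The partial port decoding, and one way a program might use it to drive the LCD in 4-bit mode, can be sketched in C. Because of the partial decoding, port 2 (binary 0010) reaches only OUT0 and port 1 (binary 0001) reaches only OUT1. The OUT0 bit positions for RS and E below are hypothetical; check the schematic for the real wiring. This sketch also writes 0 to the LED and speaker bits that share OUT0, which a real program would want to preserve, and it omits the LCD initialization sequence and timing delays.

```c
#include <stdint.h>

#define PORT_OUT0 2u   /* binary 0010: bit 0 low, bit 1 high, so only OUT0 loads */
#define PORT_OUT1 1u   /* binary 0001: bit 1 low, bit 0 high, so only OUT1 loads */

/* Hypothetical OUT0 bit positions for the LCD control lines. */
#define OUT0_RS 0x1u
#define OUT0_E  0x2u

/* Which OUT registers load for a 4-bit port number, assuming /LOADOUT is asserted.
   Result bit 0 = OUT0 loads, result bit 1 = OUT1 loads. */
static unsigned out_ports_loaded(unsigned port) {
    unsigned mask = 0;
    if ((port & 1u) == 0) mask |= 1u;   /* port bit 0 drives OUT0 /G2 (active low) */
    if ((port & 2u) == 0) mask |= 2u;   /* port bit 1 drives OUT1 /G2 (active low) */
    return mask;
}

typedef struct { uint8_t port; uint8_t value; } out_write;

/* Records the OUT writes that send one byte to an HD44780 in 4-bit mode:
   high nibble first, then low nibble, each latched on the falling edge of E. */
static int lcd_write_byte(uint8_t byte, int is_data, out_write writes[6]) {
    uint8_t rs = (uint8_t)(is_data ? OUT0_RS : 0u);
    uint8_t nib[2] = { (uint8_t)(byte >> 4), (uint8_t)(byte & 0xFu) };
    int n = 0;
    for (int i = 0; i < 2; i++) {
        writes[n++] = (out_write){ PORT_OUT1, nib[i] };                 /* OUT1 -> LCD DB4..DB7 */
        writes[n++] = (out_write){ PORT_OUT0, (uint8_t)(rs | OUT0_E) }; /* raise E */
        writes[n++] = (out_write){ PORT_OUT0, rs };                     /* lower E: LCD latches */
    }
    return n;
}
```

Port 0 would load both registers at once, which is why the sketch avoids it.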

Bus Drivers

The last two components on the data bus are a pair of 4-bit bus drivers, shown at the center and at the bottom-center of the schematic. These are two halves of a single ’244 octal driver. One drives the ALU result onto the data bus, which is necessary when storing data to RAM or an OUT port. The other drives the operand value from the Fetch register onto the data bus, which is necessary for instructions that involve an immediate constant value.

Combinatorial Feedback Loops

If you looked carefully at the architecture diagram and the schematics, you may have noticed there’s a combinatorial feedback loop from the ALU output, through bus driver B, and back to the ALU’s B input. If the ALU function were set to something like A + B while bus driver B was enabled, it’s obvious that a feedback problem would be created, as the ALU’s output would also be its input. What’s less obvious is that a feedback loop can be created even if the ALU function is set to A, and the value at input B should be logically irrelevant. Given a closer look at how logic gates are constructed from transistors, you can demonstrate that logically irrelevant doesn’t always mean electrically irrelevant, and a transient invalid voltage on the data bus can lock the ALU into a vicious cycle where invalid input results in invalid output which causes more invalid input. This was a rare problem with the first version of Nibbler, and was discussed in more detail here and here, and finally resolved here.

The feedback loop fix was made in microcode, and relies on knowledge of the internal logic of the SN74LS181, taken from the chip’s datasheet. While unlikely, it’s theoretically possible that a ’181 from a different logic family or different manufacturer might have different internal logic for which the microcode-based fix doesn’t work. If you’re building a Nibbler clone and run into this problem, a Schmitt-trigger buffer like the 74HCT7541 can be placed between the data bus and the ALU’s B input.

 

Microcode and Instruction Set

Instructions for the Nibbler 4-bit CPU come in two types: immediate and addressed. Both types begin with 4 bits of instruction opcode, to identify the specific instruction. Immediate instructions like ADDI include a 4-bit operand value embedded in the instruction, and so are 8 bits in total size. Addressed instructions like JMP include a 12-bit address in data space (RAM) or program space (ROM), and are 16 bits in total size. The instruction encoding is very simple: the first byte holds the opcode in its high nibble and either the immediate operand or address bits 11..8 in its low nibble, and an addressed instruction’s second byte holds address bits 7..0.

The four instruction opcode bits i[0..3] are combined with the ALU carry flag C and zero flag Z, as well as the Phase bit, in order to form a 7-bit address for the two microcode ROMs. The ROMs’ outputs form the 16 control signals needed to orchestrate the behavior of all the other chips. The contents of the microcode ROMs are shown in the table below.

The first line of the table shows that phase 0 is the same for all instructions. That’s good, because at this point the instruction hasn’t been fetched from memory yet, so the CPU doesn’t even know what instruction it’s executing! The microcode drives the contents of the accumulator onto the data bus, even though they’re not needed – CMOS chips can exhibit problems if the bus is left to float. Also during phase 0, hardware loads the Fetch register with the program ROM byte, and the microcode advances the program counter to the next byte.

Phase 1 is where all the interesting work happens. ADDI is a good example of a typical instruction. Its instruction opcode is 1010 binary, or $A hex. For this instruction, the same control signals will be asserted regardless of the C or Z flags. /oeOprnd will be 0, enabling the bus driver to drive the immediate operand value onto the data bus, where it connects to one of the ALU inputs. The other ALU input is always connected to the accumulator register. The /carryIn, M, and S[0..3] control signals will be set to put the ALU into addition mode, with no carry-in. /loadA will be 0, so the ALU result will be stored to the accumulator. /loadFlags will also be 0, so the carry and zero flags will be updated with the results from the addition.

JZ (jump if zero) is another good example. Its instruction opcode is 1000 binary, or $8 hex. If the Z flag is 0 (meaning true), then incPC will be 0 (PC will not be incremented) and /loadPC will be 0 (PC will be loaded with the new address). So if Z is 0, the jump will be taken. If the Z flag is 1 (meaning false), then incPC will be 1 (PC will be incremented) and /loadPC will be 1 (PC will not be loaded). So if the Z flag is 1, the CPU just skips over the destination address byte and advances to the next instruction, without taking the jump. Jump instructions don’t need to involve the ALU, but they drive the accumulator value onto the data bus, so the bus won’t be left floating.

Note that some ALU control signals are irrelevant for certain instructions, and are shown in gray and labeled “dc” for don’t care.
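The JZ phase-1 behavior described above reduces to a small C function. It assumes pc already points at the instruction’s second byte, because phase 0 advanced past the opcode byte. z_n is the stored Z flag with its active-low sense (0 means the last result was zero). The function name is hypothetical.

```c
#include <stdint.h>

/* Phase-1 PC update for JZ.
   pc:      address of the instruction's second byte
   operand: low nibble of the Fetch register = target address bits 11..8
   addr_lo: program ROM byte at pc = target address bits 7..0
   z_n:     stored Z flag, active low (0 = last result was zero) */
static uint16_t jz_next_pc(uint16_t pc, uint8_t operand, uint8_t addr_lo, int z_n) {
    if (z_n == 0)                                       /* incPC = 0, /loadPC = 0: jump */
        return (uint16_t)(((operand & 0xFu) << 8) | addr_lo);
    return (uint16_t)((pc + 1u) & 0xFFFu);              /* incPC = 1: skip the address byte */
}
```

JNZ is the same with the test on z_n inverted.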

Instruction Set

With only 4 bits for the instruction opcode, there’s a maximum of 16 different instructions, so the choice of which instructions to implement is challenging. It would be possible to have more than 16 instructions, but only by shrinking the address size or immediate operand size. Since this is a 4-bit CPU, a 4-bit opcode just makes sense.

There are five types of jump instructions:

JMP – unconditional jump
JC – jump if carry
JNC – jump if no carry
JZ – jump if zero
JNZ – jump if not zero

This set of jump instructions makes programming convenient, but the negative jumps JNC and JNZ aren’t strictly necessary, because each one can be rewritten as a positive jump followed by an unconditional jump:

    JNC noCarry
    carry:
       do stuff
    noCarry:
       do other stuff

becomes

    JC carry
    JMP noCarry
    carry:
       do stuff
    noCarry:
       do other stuff

LD and ST load a nibble from data space (RAM) to the accumulator, or store a nibble from the accumulator to data space.

LIT loads a literal immediate nibble value to the accumulator. It might also have been named LDI.

OUT stores a nibble from the accumulator to one of the sixteen output ports, while IN loads a nibble from one of the sixteen input ports to the accumulator. These instructions are intended to be used for I/O.

ADDI adds a literal immediate nibble value to the accumulator, while ADDM adds a nibble in data space to the accumulator. In either case, the carry-in for the addition is 0, and the carry-out and zero flags are set by the results of the addition.

CMPI compares a literal immediate nibble value to the accumulator (subtracts the value from the accumulator), while CMPM compares a nibble in data space to the accumulator. The accumulator value is not modified, but the carry-out and zero flags are set by the results of the subtraction. Because the ALU is performing a subtraction, the carry-in is always 1 in order to make the math work out correctly.

NORI performs a NOR with an immediate value, NORM performs a NOR with a data space value. The carry-out and zero flags are set by the results of the NOR.
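Here is a minimal C sketch of the flag results of CMPI and CMPM. It models the subtraction as A + ~B + 1, where the extra 1 is the forced carry-in. The flags here are plain logical values. The hardware stores them in its own polarity (Z, for example, is active low), and this sketch makes no attempt to model the ’181’s internal carry conventions.

```c
#include <stdint.h>

typedef struct { int carry; int zero; } cmp_flags;

/* Compare accumulator a with b; a itself is not modified. */
static cmp_flags cmp_nibble(uint8_t a, uint8_t b) {
    unsigned diff = (a & 0xFu) + (~b & 0xFu) + 1u;   /* A + ~B + 1 == A - B (mod 16) */
    cmp_flags f = { diff > 0xFu, (diff & 0xFu) == 0 };
    return f;
}
```

With unsigned nibbles, carry comes out 1 exactly when A >= B (no borrow), and zero is 1 exactly when A == B. Without the forced carry-in of 1, the result would be off by one.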

Synthetic Instructions

What about the instructions that aren’t here, like subtraction, AND, OR, NOT? Happily, all of these can be synthesized from the existing instructions. The assembler I’ve written includes some of these as macros, so they can be used as if they were built-in instructions.

NOT: NORI #0
ORI x: NORI x; NORI #0
ORM x: NORM x; NORI #0
ANDI x: NORI #0; NORI ~x
SUBI x: ADDI (~x+1)
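The identities above can be checked exhaustively over all 256 combinations of accumulator and operand nibbles. This C sketch models NORI as a 4-bit NOR and ADDI as a 4-bit add, then compares each synthetic sequence with the operation it stands in for. It checks results only, not flag side effects.

```c
#include <stdint.h>

static uint8_t nor4(uint8_t a, uint8_t b) { return (uint8_t)(~(a | b) & 0xFu); }
static uint8_t add4(uint8_t a, uint8_t b) { return (uint8_t)((a + b) & 0xFu); }

static uint8_t syn_not(uint8_t a)             { return nor4(a, 0); }                          /* NORI #0 */
static uint8_t syn_ori(uint8_t a, uint8_t x)  { return nor4(nor4(a, x), 0); }                 /* NORI x; NORI #0 */
static uint8_t syn_andi(uint8_t a, uint8_t x) { return nor4(nor4(a, 0), (uint8_t)(~x & 0xFu)); } /* NORI #0; NORI ~x */
static uint8_t syn_subi(uint8_t a, uint8_t x) { return add4(a, (uint8_t)((~x + 1) & 0xFu)); }    /* ADDI (~x+1) */

/* Returns 1 if every synthetic sequence matches the real operation for all nibbles. */
static int synthetics_ok(void) {
    for (unsigned a = 0; a < 16; a++)
        for (unsigned x = 0; x < 16; x++) {
            if (syn_not((uint8_t)a) != (~a & 0xFu)) return 0;
            if (syn_ori((uint8_t)a, (uint8_t)x) != (a | x)) return 0;
            if (syn_andi((uint8_t)a, (uint8_t)x) != (a & x)) return 0;
            if (syn_subi((uint8_t)a, (uint8_t)x) != ((a - x) & 0xFu)) return 0;
        }
    return 1;
}
```

The ANDI identity is De Morgan’s law: NOR(~A, ~x) = ~(~A | ~x) = A & x.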

The only common instructions that can’t be trivially synthesized are ANDM and SUBM, because they require the use of a temporary register to hold the complement of the data space value. A dedicated location in data space could be used for this purpose, since Nibbler only has one register.
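Here is a sketch of ANDM built that way, using a tiny C model of the accumulator and RAM. The temp location is a hypothetical reserved RAM nibble. The five-instruction sequence (NORI #0; ST temp; LD x; NORI #0; NORM temp) is one possible expansion, not something the assembler provides. Flags are not modeled.

```c
#include <stdint.h>

typedef struct {
    uint8_t a;           /* accumulator */
    uint8_t ram[4096];   /* 4K x 4 data space, one nibble per byte */
} nib_cpu;

static void op_nori(nib_cpu *c, uint8_t k)  { c->a = (uint8_t)(~(c->a | k) & 0xFu); }
static void op_norm(nib_cpu *c, uint16_t m) { c->a = (uint8_t)(~(c->a | c->ram[m]) & 0xFu); }
static void op_ld(nib_cpu *c, uint16_t m)   { c->a = c->ram[m]; }
static void op_st(nib_cpu *c, uint16_t m)   { c->ram[m] = c->a; }

/* A = A & ram[x], using ram[temp] as scratch (its old contents are lost). */
static void synth_andm(nib_cpu *c, uint16_t x, uint16_t temp) {
    op_nori(c, 0);      /* NORI #0   : A = ~A                        */
    op_st(c, temp);     /* ST temp   : temp = ~A                     */
    op_ld(c, x);        /* LD x      : A = M[x]                      */
    op_nori(c, 0);      /* NORI #0   : A = ~M[x]                     */
    op_norm(c, temp);   /* NORM temp : A = ~(~M[x] | ~A) = A & M[x]  */
}
```

A SUBM expansion would follow the same pattern: complement and increment the data-space value, park it in a temporary location, then add it with ADDM.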

In many cases, with some thought you can eliminate the need for these synthetic instructions entirely. Take the common case of checking a specific bit in an input. On a CPU with an ANDI instruction, this might look like:

    #define BIT2 $4 ; $4 = 0100
    IN #0 ; load accumulator with nibble from port 0
    ANDI BIT2
    JNZ bit2is1

With Nibbler, you might think to rewrite this using the synthetic ANDI mentioned above:

    #define NOT_BIT2 $B ; $B = 1011
    IN #0 ; load accumulator with nibble from port 0
    NORI #0
    NORI NOT_BIT2
    JNZ bit2is1

But this can be shortened by folding the two negatives, NORI #0 and JNZ, into a single positive JZ:

    #define NOT_BIT2 $B ; $B = 1011
    IN #0 ; load accumulator with nibble from port 0
    NORI NOT_BIT2
    JZ bit2is1
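To confirm that the two versions branch identically, this C sketch compares the ANDI/JNZ condition with the NORI/JZ condition for every possible 4-bit input. The function names are illustrative.

```c
#include <stdint.h>

#define BIT2     0x4u   /* 0100 */
#define NOT_BIT2 0xBu   /* 1011 */

/* IN #0; ANDI BIT2; JNZ bit2is1 (on a CPU with ANDI) */
static int andi_version_jumps(uint8_t in) {
    return (in & BIT2) != 0;
}

/* IN #0; NORI NOT_BIT2; JZ bit2is1 (Nibbler) */
static int nori_version_jumps(uint8_t in) {
    uint8_t a = (uint8_t)(~(in | NOT_BIT2) & 0xFu);   /* only bit 2 can survive, inverted */
    return a == 0;
}

static int bit_tests_agree(void) {
    for (unsigned in = 0; in < 16; in++)
        if (andi_version_jumps((uint8_t)in) != nori_version_jumps((uint8_t)in))
            return 0;
    return 1;
}
```

ORing with 1011 forces every bit except bit 2 to 1, so after the NOR only the inverted bit 2 can remain. The result is zero exactly when bit 2 of the input was 1.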

 

Software Tools

I created several software tools to help make Nibbler development easier. Most were written in .NET languages like C#, using the free Visual Studio Express IDE for Windows.

Microcode Builder

The simplest tool is the Microcode Builder, which generates two binary files for the contents of the two microcode ROMs. In the beginning I intended to describe the microcode operations in a text file, using register transfer language, and then the Microcode Builder would assemble it into binary. But when it became clear the microcode would be fairly simple, I dropped the idea of using RTL, and just wrote some C code to generate the required binary data directly. The program contains lots of bit-shifting fun like:

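    // 9 JNZ: phase 1 entries; carry doesn't matter, so both values of c get the same word
    // microcode address = opcode << 3 | C << 2 | Z << 1 | phase
    for (int c = 0; c < 2; c++)
    {
        microcode[(9 << 3) | (c << 2) | (0 << 1) | 1] = 0xF03F; // Z (flag is active low): don't jump
        microcode[(9 << 3) | (c << 2) | (1 << 1) | 1] = 0x303F; // NZ: jump
    }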

Those hexadecimal numbers are just rows from the microcode table, with the 16 control signals expressed as a 4-nibble hex value.

Assembler

The Nibbler assembler is used to write programs for the CPU, using symbolic instruction names and branch labels. It was originally written for my Tiny CPU project, then adapted for Nibbler’s purposes. The assembler runs as a command line program. It takes a single .asm file as input, and assembles the code into a .bin binary output file. It also generates a .sym symbol file, containing the values of all the labels and constants in the code, as well as the assembled address of each line of code. The symbol file is used later by the simulator, in order to perform source-level debugging.

The assembler doesn’t have any fancy features like user-defined macros or conditional compilation, but it does support:

  • Decimal constants
  • Hex constants, preceded by $
  • Character constants, contained in ‘ ’ single quotes
  • < and > operators to extract the high or low nibble of a byte constant
  • Named constants using #define (both data and address constants)
  • Named labels, and jumping to a named label
  • Unnamed + and - labels, for jumping forward and backward
  • Comments, starting with ;

For a good example of the kind of code the assembler can handle, look at the source for my guess the number program (old version).

Simulator

The most complex tool is the machine simulator. Originally developed for BMOW 1, this GUI-based tool simulates the data and control paths of the CPU. It supports source level symbolic debugging, disassembly, microcode debugging, code breakpoints, data breakpoints, memory inspection, and I/O simulation of the LCD and input buttons.

The simulator uses the program binary, the program source and symbol files (if available), and the microcode ROM binaries. In the image above, the simulator is doing source level debugging of a button watcher example program.

Execution is currently stopped at the CMPM PREV_BUTTON_STATE instruction, as shown by the yellow arrow. There’s a breakpoint two instructions further down, at the line with the red circle. If the simulator is started and a simulated button is pressed, the program will break out of the loop and stop at the breakpoint. (The JE instruction in the image was later replaced by JZ.)

 

Photos

 

77 comments 


  1. MichaelM - October 18th, 2013 12:47 pm

    Thanks for posting your project. I tend to work mostly with FPGAs, but I certainly appreciate the thought and effort you put into this project. It is a great demonstration of the power of small instruction set processors when applied with some ingenuity.

    You may be interested in the Windows tool to which I link below. I use it to create ROMs and such for my projects. I use it, in particular, to program my microcoded microprocessors and algorithmic state machines: https://github.com/MorrisMA/MAM65C02-Processor-Core/blob/master/Utils/SMRT_tool.zip

    In that GitHub repo you’ll also find the source code for my 65C02 processor’s microprogram and instruction decoder. In the Utils subdirectory is a short description of the capabilities/features of my microprogramming support tool. That description and the source code should be sufficient to get you started, if you find a need for an existing tool for future projects.

    Once again, thanks for the post. I look in on your site periodically, and I always learn something.

  2. Steve Chamberlin - October 19th, 2013 9:25 am

    Sounds interesting, I’ll check it out. Thanks!

  3. Bryan G. - October 14th, 2014 6:04 pm

    I really liked this post. I’m not ready to build a Nibbler yet, but I really want to.
    This site has been very useful for me.
    I have been learning a lot of things… Thanks

  4. William - May 23rd, 2015 4:03 pm

    I am attempting to build my own Nibbler. What part are you using for the speaker?

  5. Steve Chamberlin - May 23rd, 2015 4:16 pm

    It’s a generic piezo, similar to this one: http://www.adafruit.com/products/160

  6. Dietmar Rieger - October 12th, 2015 4:43 pm

    This is a super dream project. Back to the basics! This project is much better than Raspberry!

  7. Anonymous - October 13th, 2015 2:47 am

    I love that you wire wrapped it, those are beautiful.

  8. Hristo - October 13th, 2015 4:40 am

    Very cool!

  9. hardbreaker - October 13th, 2015 7:00 am

    Nice project. Inspiring. From the schematic, I see that you have designed the power plug with the center pin as ground. I know that there is no real standard, but in my designs I always try to have ground on the outside, as with shielded HF cables.

    Nearly all of my notebook power supplies have ground on the outside…

    Just to mention. Keep up the good projects 🙂

    /HB

  10. Steve Chamberlin - October 13th, 2015 7:11 am

    It’s been a while since I did the design, but I thought power was the inside surface and ground the outside surface of the plug, for exactly the reason you gave. I may have messed up the diagram.

  11. Newbie - November 18th, 2015 4:12 pm

    Hi Steve! Very interesting project. I’m thinking of building one of these, but I’m a real noob at low-level programming and I doubt I can build it. What background knowledge is necessary to assemble the Nibbler? I’m interested in how I can program the CPU. Serial communication?

  12. Steve Chamberlin - November 19th, 2015 7:37 am

    The programming method is very primitive. Remove the EEPROM chip, put it in an external programmer to update its contents, and then replace it on the Nibbler board. At least it’s better than using punched cards!

  13. Newbie - November 19th, 2015 8:11 am

    Many thanks Steve!

  14. MikeG - November 30th, 2015 3:16 pm

    Hi Steve,
    I wanted to build my own CPU out of logic chips for a long time, but was not ready to handle the complexity of the build. While I understood the theory, the practical aspects of putting it together seemed challenging. With your schematics and explanations of the design, it turned out to be very doable. It took me two weeks to build it and I learned a lot in the process. The next objective is designing and building my own CPU.
    I looked at a number of four bit CPU designs and I think yours is the best.
    Thank you for the effort you put into the design and the useful software and the demos!

    Here is the pic of my completed project which I managed to fit onto one breadboard.
    http://flic.kr/p/AKbAEK

  15. MikeG - December 14th, 2015 8:43 pm

    When I first ran the demo programs, I thought the Nibbler needed some word games.
    So I wrote a couple.
    The first game is “Word Scrambler”, the objective is to unscramble the secret word.
    The second game is “Hangman” where you guess the secret word in a limited number of guesses.
    I blatantly reused your code for LCD communication and small things like random number generation and button debounce.
    Both games rely on a word dictionary which is difficult to implement on the Nibbler due to lack of indirect addressing and/or dedicated data memory. To increase the dictionary size, I restricted the alphabet to the first 15 letters and used a simple suffix merge compression scheme. Also had to sacrifice sound effects.
    With this approach I can store 200 5-letter or 170 6-letter words (Word Scrambler uses 5-letter dictionary and Hangman uses 6-letter words). Don’t know if more efficient compression is feasible.

    Here is a link to sources, binaries, dictionary generator program and pictures:
    https://drive.google.com/open?id=0B-fnI9Vn7gNLdmF5Z1RwUzNhUGs

    Finally I want to mention that the assembler is really good and the simulator is excellent.

  16. Steve Chamberlin - December 14th, 2015 8:46 pm

    Wow, that’s fantastic MikeG! I will definitely try out those new demos.

  17. Anurag - December 16th, 2015 4:36 am

    Hey it is an awesome project can u just tell me how did u make ur vga output encoder and address and keyboard input decoder

  18. Steve Chamberlin - December 16th, 2015 7:50 am

    Nibbler doesn’t have VGA video nor a keyboard, just a 2-line LCD and 4 pushbuttons. Maybe you’re confusing it with http://www.bigmessowires.com/bmow1/ ? You can find the schematics for its video and keyboard systems on that page.

  19. Anurag - December 20th, 2015 3:34 am

    Hey Steve can u tell me the detailed part list for it pls
    And the bmow also
    Ps I’m only 12

  20. Mike.. - December 21st, 2015 5:25 pm

    Steve,

    When programming the bin files into the ROMs (like, say, the guess-the-number BIN file), do all 3 ROMs get programmed with the same bin to work correctly?

    Thanks…

  21. Steve Chamberlin - December 21st, 2015 5:57 pm

    The two microcode ROMs implement the CPU’s internal control system, and only need to be programmed once. The files are named microcode_0.bin and microcode_1.bin. Only the program ROM needs to be changed when switching between Frogger, Guess the Number, etc.

  22. Mike.. - December 21st, 2015 6:27 pm

    Thanks Steve.. I overlooked the two bin files for the micro roms in the other folder. Thanks for the quick reply…

  23. Anurag - December 22nd, 2015 6:27 am

    A last question what are the capacitors for in the schematic at the right side of the schematic

  24. Steve Chamberlin - December 22nd, 2015 8:12 am

    Those are decoupling capacitors. In the physical circuit, one is placed between the power and ground pins of each chip. It’s just easier to show them all in one spot for the schematic. http://electronics.stackexchange.com/questions/2272/what-is-a-decoupling-capacitor-and-how-do-i-know-if-i-need-one

  25. Mike.. - December 22nd, 2015 10:00 am

    Hello again Steve,

    I have a 20X4 character LCD display instead of the 16X2 you used. Does anything need to be changed for this display to work, or should it work without any changes?

    Thanks…

  26. Steve Chamberlin - December 22nd, 2015 11:52 am

    You’ll probably have to make some minor changes to the programs to accommodate the different size of display.

  27. MikeG - December 22nd, 2015 2:03 pm

    Decoupling capacitors were always a mystery to me. In the breadboard version of the Nibbler I did not use any bypass capacitors and everything worked just fine. I added them in the PCB version and can’t tell any difference.

  28. MikeG - December 22nd, 2015 2:30 pm

    Anurag, here are the parts I used for the Nibbler. Part numbers are from jameco.com.
    This does not include resistors and capacitors.
    I used 28C64 instead of 28C16 for the microcode and CY7C199 instead of CY7C168 for RAM.

    Part# | Description | Qty | Price
    2210677 | JACK,DC PWR,2.1×5.5mm,MALE | 1 | 0.69
    160882 | HEADER,ST MALE,1RW,40PIN | 1 | 0.75
    27924 | CRY,OSC,2MHz,100ppm,5V@30mA | 1 | 1.49
    74827 | IC,EEPROM,28C64A-15/25,5V | 3 | 3.75
    104003 | SOCKET,ZIF,28PIN,TIN CONTACT | 1 | 9.95
    242376 | IC,SRAM,CY7C199-35PC | 1 | 1.95
    913299 | IC,74HCT32N,14-PIN DIP PKG | 1 | 0.49
    46501 | IC,74LS125N,DIP-14 | 1 | 0.65
    45022 | IC,74HCT244N,DIP-20,TRI-STATE | 1 | 0.35
    46922 | IC,74LS173N,DIP-16,4BIT D-TYPE | 4 | 1.19
    46842 | IC,74LS163N,DIP-16 | 3 | 0.69
    46973 | IC,74LS181N,DIP-24,ARITHMETIC | 1 | 3.95
    47669 | IC,74LS377N,DIP-20 | 1 | 0.99
    45137 | IC,74HCT74N,DIP-14,DUAL D-TYPE | 1 | 0.39
    2118580 | DISPLAY,LCD,16×2,PARALLEL,5V | 1 | 9.95
    1874543 | TRANSDUCER:AUDIO | 1 | 0.79
    1157915 | Resistor Trimmer 10k Ohm 25% | 1 | 0.35
    155380 | SWITCH,PB,TACT,OFF(ON) | 5 | 0.45
    34761 | LED,GRN,GRN DIF,T1-3/4 | 2 | 0.15
    36038 | DIODE,SWITCH,1N4148 | 1 | 0.05

  29. Steve Chamberlin - December 22nd, 2015 5:17 pm

    Here’s another good article on decoupling capacitors: http://hackaday.com/2011/10/25/do-you-know-why-youre-supposed-to-use-decoupling-capacitors/ You won’t notice any difference in operation with/without the capacitors, until suddenly the circuit starts misbehaving with weird glitches when no capacitors are present. On an oscilloscope trace, you should clearly be able to see the difference when decoupling capacitors are present.

  30. MikeG - December 22nd, 2015 5:33 pm

    Thanks for the pointers Steve! The oscilloscope pics are very telling.

  31. dammy - March 22nd, 2016 2:59 pm

    Very cool project. I’ve embarked on making D flip-flops for RAM, which is coming along. My question is: can I make all of this from components like transistors and resistors, and program the machine code on my own with any IC?

  32. William - March 22nd, 2016 4:34 pm
  33. Michal - June 2nd, 2016 5:37 am

    Great project! I’ve built Nibbler; I like these uncommon designs. It was no problem to purchase parts except the Cypress RAM, which seems to be obsolete and is not available in any local parts stores. But I had a vague memory of some long, narrow chips somewhere. I checked old mainboards and voila! In the 486 era, compatible SRAMs were used as cache memory. Mine is marked UM61M256K-15 and works fine. The only difference when comparing the datasheets is the order of the address pins, but that doesn’t matter: the order in which address pins are labeled on a RAM chip is just a convention, not a technical requirement.

  34. Steve - June 2nd, 2016 6:36 am

    Yes, the 4-bit SRAM is difficult to find. I’ve been considering making a Nibbler kit, which would probably have an 8-bit SRAM and just leave the extra bits unused.

  35. Bone Doctor - June 16th, 2016 8:04 am

    Dear Steve, thank you for your excellent work and for sharing it. I built your machine and it works well. As you know, in this circuit the RAM works as working registers, not as real program memory,
    so I cannot implement serial programming or a bootloader.
    Do you have any idea how to add RAM to the program area? Better yet, how could we design a von Neumann machine?

    Thank you.

    Dr. Peker (M.D.)
    Sogapalag islands

  36. Steve - June 16th, 2016 9:13 am

    The simplest method for indirect RAM references would be to add 3 more OUT registers, to form the low/middle/high nibbles of a 12-bit address register for RAM. I’ve explored that idea a little, but the details would quickly grow complicated. To make it really practical, I think you’d also need to introduce additional instructions and microcode. That still wouldn’t work for a bootloader, however, since by design program code is in a separate memory space from data.
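    To make the three-register idea concrete, here is a minimal model of how three 4-bit registers could combine into one 12-bit RAM address, and how software on a 4-bit CPU would have to step that address one nibble at a time, propagating the carry by hand. The function names and register roles are hypothetical; no such registers exist in the current Nibbler design.

```python
def ram_address(high, middle, low):
    """Combine three hypothetical 4-bit OUT registers into a 12-bit address."""
    for nibble in (high, middle, low):
        if not 0 <= nibble <= 0xF:
            raise ValueError(f"not a 4-bit value: {nibble}")
    return (high << 8) | (middle << 4) | low


def increment_address(high, middle, low):
    """Advance the address by one, rippling the carry nibble by nibble,
    the way a 4-bit program would update each register in turn."""
    low = (low + 1) & 0xF
    if low == 0:                  # low nibble wrapped: carry into middle
        middle = (middle + 1) & 0xF
        if middle == 0:           # middle nibble wrapped: carry into high
            high = (high + 1) & 0xF
    return high, middle, low
```

    The carry handling hints at why Steve expects extra instructions and microcode: without them, every pointer increment costs several ordinary instructions.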

    If you want a von Neumann machine, it’s probably best to look at another CPU. My Tiny CPU (http://www.bigmessowires.com/cpu-in-a-cpld) is von Neumann, but is not very well documented. Maybe I will try creating a simple von Neumann design someday.

  37. Bone Doctor - June 16th, 2016 10:06 am

    I will start to design a 4-bit von Neumann machine if you can support me with the microcode and software.

  38. William - September 22nd, 2016 6:25 pm

    I want to try to write new tunes for Nibbler.

    Is Visual Studio Community 2015 with Update 3 – Free Edition what I need to run the source code for Nibbler Music Maker?

    https://www.visualstudio.com/downloads/download-visual-studio-vs

  39. Steve - September 22nd, 2016 8:24 pm

    I used an older version, but I think that should be fine.

  40. Sven - September 29th, 2016 12:13 am

    You could reduce the number of chips. The ALU and the decoding logic together take 5 chips. They could be replaced with a single ROM (or RAM) of 8 to 128 kilobytes. Bad idea? Sorry for my bad English.

  41. Steve - September 29th, 2016 6:48 am

    Which 5 chips do you mean specifically? There are no decoders used in the Nibbler design. Yes, you could replace the ALU with a ROM, but it wouldn’t reduce the total number of chips.
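    For anyone curious what "replacing the ALU with a ROM" means in practice: every combination of ALU inputs becomes a ROM address, and the ROM byte at that address holds the precomputed result. The sketch below builds such a lookup table for a hypothetical 4-operation ALU (add, subtract, AND, OR) with an assumed address layout of op[10:9], A[8:5], B[4:1], carry-in[0], which happens to fill exactly 2K bytes. It does not reproduce the 74181's full function set, and the operation encoding is an illustrative assumption.

```python
def alu(op, a, b, carry_in):
    """Return (4-bit result, carry_out) for one hypothetical ALU operation.

    op 0 = add, 1 = subtract, 2 = AND, 3 = OR (illustrative encoding only).
    """
    if op == 0:
        total = a + b + carry_in
        return total & 0xF, total >> 4
    if op == 1:
        # a - b as a + (~b) + carry_in; carry_in = 1 means "no borrow in"
        total = a + (~b & 0xF) + carry_in
        return total & 0xF, total >> 4
    if op == 2:
        return a & b, 0
    return a | b, 0


def build_alu_rom():
    """Assumed address layout: op[10:9] a[8:5] b[4:1] carry_in[0].

    Data byte: bits 3-0 = result, bit 4 = carry out.
    """
    rom = bytearray(1 << 11)  # 2048 entries
    for op in range(4):
        for a in range(16):
            for b in range(16):
                for cin in range(2):
                    addr = (op << 9) | (a << 5) | (b << 1) | cin
                    result, cout = alu(op, a, b, cin)
                    rom[addr] = result | (cout << 4)
    return bytes(rom)
```

    As Steve notes, this swaps one chip for another rather than reducing the count; the trade-off is flexibility (any function you can tabulate) versus the fixed 74181 function set.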

  42. Sven - September 30th, 2016 2:28 am

    28C16 (x2), 181:

    The ALU uses 6 control inputs. I would like to combine the three chips into a single chip. A ROM has 8 data outputs: 4 could drive control signals and 4 could carry data. If the ROM is larger than 8 KB, it has more than 12 address inputs, so 4 of them can be used as data inputs. If the remaining 4 control outputs are not enough, add a decoder (138 or 238).

    Contact bounce: you can do without the chips. I often use a filter made from a capacitor and two resistors. From experience, I would say that without the filter, a comparator performs worse than a D flip-flop.

  43. Mark - October 7th, 2016 6:41 pm

    This is probably the nicest ‘homemade CPU’ project I’ve seen yet.

    For anyone wanting to do their own version, I encourage you to make it as simple as possible, even simpler than above, for example by ALSO doing the following:

    -very minimal instruction set (like 5 instructions)

    By doing this, it should be possible to make a 15-TTL-chip version with no EPROMs required (other than one for program code)!

    I did a CPU this way, but cheated a bit with a PAL and the whole thing was only 7 chips. It was complete with I/O decoding for LCD and keypad just like the above and you could play a game on it.

  44. Pekkers - November 15th, 2016 1:53 pm

    I love your work but I must comment on your schematic, it is hard to read when your bus merges don’t have a dot or anything so it looks like they are just crossing. 🙁

  45. Damilare - November 18th, 2016 12:36 pm

    With the available instructions of this CPU, is it possible to make an as-simple-as-possible OS?

  46. Jason - December 20th, 2016 12:54 am

    What a beautiful piece of artwork! I am on a vintage computing nostalgia trip, and your site has re-invigorated me to try a 4-bit CPU in a CPLD. Congratulations!

  47. Damilare - September 26th, 2017 10:48 am

    I stumbled upon a .asm file on github containing different examples of what nibbler can do, but I couldn’t find the file any longer. Dave, please can you give me the link to the examples asm file?
    What I want to review is the part that deal with chain calculation.

  48. Bryan Chan - November 19th, 2017 8:48 pm

    Here’s my implementation of this project in SystemVerilog, in case you wanna play with it 😉

    https://github.com/bchangip/NibblerCPU

  49. Ray Allen - December 20th, 2017 10:50 am

    Great Project.

    Would pull-up resistors on the data bus be another way to solve the issue of the ALU’s B input seeing a floating bus?

  50. Ray Allen - December 20th, 2017 11:07 am

    Appears that RAM is not initialized by a reset.
    I guess you need to account for that when programming, right?

  51. Ray Allen - December 21st, 2017 10:35 am

    Upgrade options (not that I’d really do it)?

    I was just wondering (from a purely conceptual viewpoint) how one might improve on the “hardware limitations”. The page mentions the lack of a true stack and of indirect addressing, along with a way to work around the indirect addressing limitation.

    For the stack, could one just add in a few more sets of 74HCT163 chips, three for each stack depth? Guess you’d need one more Microcode ROM chip to control the enable lines of these chips.

    Or, is it more complicated than that?

  52. daniele - December 2nd, 2018 7:01 am

    Very nice work, well done!!
    Just one question: I don’t fully understand how you perform a memory jump. I mean, your data bus is 4 bits wide, but memory requires 12-bit addressing; how does it work?

  53. Steve - December 2nd, 2018 2:43 pm

    Take a look at the architecture diagram. Addresses for jumps travel on a separate 12-bit bus, not on the data bus.

  54. Daniele - December 5th, 2018 12:12 pm

    Yeah, I’m sorry, I misspoke! 🙁
    I’ll try to explain better.
    I’ve seen that all the jump instructions take up 2 different “lines” of memory: one for the opcode and the first nibble of the address, and another for the second and third nibbles. This means you have to read memory twice to get the whole address, doesn’t it?
    That’s what I’ve understood so far. However, reading the microcode of the JMP instruction, I see that loadPC is enabled immediately, after reading only the first nibble of the address… where am I going wrong?
    I’m a novice; I hope my question makes sense.
    Thanks for your patience!

  55. Steve - December 5th, 2018 12:46 pm

    I suggest downloading the simulator and stepping through an example program one clock cycle at a time, it will make things clearer.

    “reading the microcode of the JMP instruction i just see that the loadPC is immediately enabled, after reading only the first nibble of the address…”

    I’m not sure what you’re looking at, but that’s not correct. loadPC is asserted on the second clock cycle (phase = 1) of a jump instruction. The behavior during the first clock cycle is the same for all instructions, as shown by the first line of the microcode table above (phase = 0). During phase 0, hardware loads the Fetch register with the program ROM byte, and the microcode advances the program counter to the next byte.
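    The two-phase sequence Steve describes can be modeled in a few lines. This is a simplified sketch, not the actual microcode: it assumes the opcode sits in the high nibble of the first byte, the top 4 bits of a jump target sit in that byte’s low nibble, and the low 8 bits come from the next program byte. The opcode value JMP = 0xC is hypothetical, and every other opcode is treated as a one-byte no-op.

```python
JMP = 0xC  # hypothetical opcode value, for illustration only


def step(program, pc):
    """Execute one instruction in two phases and return the new PC."""
    # Phase 0 (identical for every instruction): latch the program byte
    # into the Fetch register and advance the program counter.
    fetch = program[pc]
    pc = (pc + 1) & 0xFFF

    # Phase 1: the opcode held in Fetch selects what the microcode does.
    opcode = fetch >> 4
    if opcode == JMP:
        # The ROM is now presenting the *second* byte, so the full 12-bit
        # target (Fetch low nibble + current byte) is available on the
        # 12-bit address bus; loadPC is asserted here, not in phase 0.
        return ((fetch & 0xF) << 8) | program[pc]
    # Sketch simplification: other opcodes are one-byte no-ops.
    return pc
```

    For example, step(bytes([0xC1, 0x23]), 0) returns 0x123: the jump only completes after both bytes have been seen, even though the Fetch register was loaded in phase 0.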

  56. Ahmed Gamal Fathalla - December 8th, 2018 11:30 am

    wth, this is much more complicated than i thought. i just looked at the first pic, which is some ICs tied together, and saved the link to use as my computer architecture project lol. now the project deadline is 1 week from now and i can’t understand shit from this 😀 oh my gosh

  57. Rado - February 6th, 2019 2:11 am

    Hello,
    how is it that during, for example, ADDI execution, the ALU output (which is immediately loaded into the accumulator) does not feed back through the accumulator to the ALU, changing its output and looping like this?

  58. Steve - February 6th, 2019 7:56 am

    The accumulator is not immediately loaded. It’s an edge-triggered register, not a latch.
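    A small behavioral model shows the difference. Both storage elements below sit in the same feedback loop (new value = stored value + 3, as with an ADDI of 3), and the clock or enable is held high while the loop "settles". The edge-triggered register captures the ALU output once, at the rising edge, and then ignores its input; the transparent latch keeps following its own output around the loop. The class names and the fixed iteration count are illustrative assumptions; real hardware with a transparent latch would race unpredictably rather than step neatly.

```python
class EdgeTriggeredRegister:
    """4-bit register that captures its input only on a rising clock edge."""

    def __init__(self):
        self.q = 0
        self._last_clk = 0

    def update(self, d, clk):
        if clk == 1 and self._last_clk == 0:  # rising edge only
            self.q = d & 0xF
        self._last_clk = clk
        return self.q


class TransparentLatch:
    """4-bit latch whose output follows its input while enable is high."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d & 0xF
        return self.q


def settle(storage, addend, level, iterations=8):
    """Hold clock/enable at `level` and feed (q + addend) back repeatedly,
    as a combinational ALU loop would."""
    for _ in range(iterations):
        storage.update(storage.q + addend, level)
    return storage.q


if __name__ == "__main__":
    print(settle(EdgeTriggeredRegister(), 3, 1))  # 3: one add, then stable
    print(settle(TransparentLatch(), 3, 1))       # 8: kept adding and wrapped
```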

  59. William Ahlgren - February 10th, 2019 11:45 am

    Has anyone developed a Nibbler using 4000-series CMOS components?

  60. William Ahlgren - February 10th, 2019 11:54 am

    I meant to ask: Has anyone developed a Nibbler using only CMOS components? Could be 4000-series or 74C, 74HC, etc.

  61. Damilare - February 11th, 2019 4:22 am

    I haven’t done that before, but you can do it with CMOS family too.
    You just have to use the appropriate substitute of the 74xx family.

  62. JAPENASA - June 7th, 2019 8:51 am

    excellent project

    OK.COLOMBIA

  63. Oliver - November 3rd, 2019 1:47 pm

    Why not use the A=B output of the 74181? It would give Z instead of /Z, but that would only affect the microcode, right?

  64. Abu Bakar Siddique - November 30th, 2020 6:27 am

    Thank you for putting the hard work first in designing the whole thing and then presenting it in a nice way with all the material. I want to build it myself.

  65. Copin - February 27th, 2021 7:56 am

    There’s no need for 4K of RAM.

  66. Copin - March 26th, 2021 9:52 am

    2Kx16 rom
    128×4 ram

    NOP,RTI,RTS LD,ST Ri,R+ LPM bnk,R+,R JMP,JSR XXX JEQ,JNE,JCC,JCS xXX

    SHL,ASR,LSR,INC,DEC,NEG,ROR,ROL,DA,COM,ADC0 Ri,xX xX LD,CMP,TST Ri,#X

    AND,OR,XOR,ADD,ADC,SUB,SBB Rd,Rs,#X MOV,OR,AND,TST,CMP #X,xX LD R,#XX

    OR,AND,XOR,MIN,MAX,ADD,ADC,SUB,SBB R1,xX,d LD,ST,CMP,TST,SWAP Ri,xX

  67. Copin - April 15th, 2021 10:30 pm

    4096B rom
    256×4 ram

    NOP,RTI,RTS LD,ST A1,A0+ LPM bnk,A+,xX JMP,JSR,JEQ,JNE XXX JCC,JCS xXX

    SHL,ASR,LSR,INC,DEC,NEG,ROR,ROL,DA,COM,ADC0,TST Ai LD Ai,#X MOV,TST #X,XX

    MOV,OR,AND,XOR,ADD,ADC,SUB,SBB Ai,XX,d TST,CMP,SWAP Ai,XX BTST An LDA #XX

  68. Copin - April 26th, 2021 10:59 pm

    NOP,RTS LD,ST,CMP X,XX MOV,TST #X,XX JMP,JSR,JEQ,JNE XXX

    SHL,ASR,LSR,INC,DEC,NEG,ROR,ROL,DA,COM,ADC0 XX LD X,#X

    OR,AND,XOR,ADD,ADC,SUB,SBB,TST,CMP X,#X X,X MOV,SWAP X,X

  69. Edson - December 8th, 2021 7:25 am

    Hi, very interesting project!

    Could you provide an example of how to implement a simple call/return mechanism without a true stack?
    I can’t figure out how to do it.
    Thank you.

  70. Steve - December 8th, 2021 8:22 am

    See the example programs. The code stores a return index number in a dedicated memory address before calling a subroutine, and the subroutine uses the number in a switch statement to determine where it should jump back to.
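    Here is the same pattern as a tiny state machine, with labels standing in for program addresses and assignments to label standing in for JMP. It is a sketch of the technique Steve describes, not code from the Nibbler examples; RETURN_SLOT and the label names are hypothetical.

```python
RETURN_SLOT = 0x000  # hypothetical RAM address reserved for the return index


def run():
    ram = [0] * 4096  # 4K nibbles of data memory
    trace = []
    label = "main"
    while label != "halt":
        if label == "main":
            trace.append("main: first call")
            ram[RETURN_SLOT] = 0   # remember which call site this is
            label = "subroutine"   # JMP subroutine
        elif label == "after_call_0":
            trace.append("main: second call")
            ram[RETURN_SLOT] = 1
            label = "subroutine"
        elif label == "after_call_1":
            trace.append("main: done")
            label = "halt"
        elif label == "subroutine":
            trace.append("subroutine body")
            # "Return": switch on the stored index instead of popping a stack.
            index = ram[RETURN_SLOT]
            if index == 0:
                label = "after_call_0"
            elif index == 1:
                label = "after_call_1"
            else:
                raise ValueError(f"unknown return index {index}")
    return trace
```

    One consequence of this design: each subroutine needs its own return slot, and a subroutine cannot call itself (or anything that reuses its slot) without overwriting the pending return index.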

  71. Edson - December 8th, 2021 5:16 pm

    I see, thank you for sharing, Steve!

  72. Paul Schmitt - January 1st, 2022 11:28 am

    Thought you might enjoy a post I recently made on LinkedIn about a 4-bit machine lab done in 1979.
    https://www.linkedin.com/in/paul-schmitt-mass/detail/recent-activity/shares/

  73. mikeh - December 5th, 2022 7:00 am

    Pretty sweet! What modifications would be needed to get it to run Doom?

  74. Simon - January 28th, 2023 8:38 pm

    Is it possible to simulate this project in Intel Quartus lite?

    Thanks.

  75. Jacob Lister - July 8th, 2024 7:34 pm

    @Simon – VHDL – someone’s done it here by the looks:

    https://gist.github.com/erincandescent/347577465129882abc97

  76. Bruce - August 18th, 2024 9:51 am

    Nibbler is a very nice project!
    Another good project I discovered is mikroleo, a 4-bit microcomputer.

  77. SAM - January 16th, 2025 8:11 am

    My friend, great effort and beautiful design! I’ve been following your amazing website. However, do you have an analog schematic with more details about BUS lines? Thank you for your cooperation!
