Optimization, or Distraction?
I think I may be worrying too much about potential optimizations to the hardware design before I’ve even built or simulated the initial design. A couple of possible optimizations occurred to me recently.
Improved CC Register: I described earlier how the condition codes are stored in a 4-bit shift register with parallel output, enabling the control circuitry to read all the flags simultaneously. Copying the condition codes to/from a register is done serially, requiring 4 clock cycles of bit shifting.
My latest improvement idea is to use a GAL to make a custom 4-bit register with two independent parallel load inputs, one connected to the ALU and one connected to the low 4 bits of the data bus. That would permit loading of all the condition codes from a stored value in a single clock cycle. The outputs of this GAL-register could also drive the data bus, either directly or through the ALU, making it possible to store all the condition codes in a single clock cycle as well.
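To make the cycle-count difference concrete, here is a minimal Python sketch of the two approaches. It is only a behavioral model under assumptions of my own: the class names, the `load_alu`/`load_bus` select signals, the MSB-first shift order, and the rule that the ALU input wins if both loads are asserted are all hypothetical, not taken from the actual design.

```python
class SerialCC:
    """Behavioral model of a 4-bit shift register with parallel output."""

    def __init__(self, flags=0):
        self.flags = flags & 0xF

    def shift_in(self, bit):
        # One clock: the top bit falls out, the new bit enters at the bottom.
        shifted_out = (self.flags >> 3) & 1
        self.flags = ((self.flags << 1) | (bit & 1)) & 0xF
        return shifted_out

    def load(self, value):
        # Loading a stored 4-bit value takes one clock per bit (MSB first,
        # an assumed ordering).
        clocks = 0
        for i in (3, 2, 1, 0):
            self.shift_in((value >> i) & 1)
            clocks += 1
        return clocks


class DualLoadCC:
    """Behavioral model of a 4-bit register with two parallel load inputs."""

    def __init__(self, flags=0):
        self.flags = flags & 0xF

    def clock(self, alu_flags, bus_low4, load_alu=False, load_bus=False):
        # One clock edge: take all four flags from whichever source is
        # selected. Giving the ALU priority is an assumption of this sketch.
        if load_alu:
            self.flags = alu_flags & 0xF
        elif load_bus:
            self.flags = bus_low4 & 0xF
        return 1  # always a single clock


serial = SerialCC()
dual = DualLoadCC()
print("serial load clocks:", serial.load(0b1010))
print("parallel load clocks:", dual.clock(0b0000, 0b1010, load_bus=True))
```

Both models end up holding the same flags; the only difference is how many clocks it takes to get there.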
These improvements would only benefit instructions that load/store the condition codes, though, which really only happens during interrupt processing, so maybe it’s not worth the effort. There are also a few problems regarding how to connect the GAL-register output to the data bus that I would need to resolve.
An Extra Data Register: A while ago, I considered adding a Y register to the machine, but ran into the limit of 8 possible load destinations. Now that I’ve increased the limit to 16, adding a Y register would be as simple as adding 2 more chips to store and drive the data. The required load enable and output enable lines are already there.
Unlike the A and X registers, the Y register would be connected to the right ALU input, which presents some problems. The T (temporary) register is connected to the right ALU input as well, which means it would be impossible to directly compute any functions of Y and T. For example, to add a constant value to X, the machine can load the constant into T, add X and T, and store the result where needed. But to add a constant value to Y, it would first need to copy X to T, load the constant into X, add X and Y, then restore the old value of X from T.
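The two sequences described above can be sketched as a toy Python model. Everything here is illustrative: the `Machine` class, the 8-bit register width, and the idea that each load, copy, or add counts as exactly one step are assumptions of the sketch, not real cycle counts for the machine. The only constraint taken from the text is that A and X feed the left ALU input while T and Y feed the right one.

```python
LEFT_INPUTS = {"A", "X"}    # registers wired to the left ALU input
RIGHT_INPUTS = {"T", "Y"}   # registers wired to the right ALU input


class Machine:
    def __init__(self, **regs):
        self.regs = {"A": 0, "X": 0, "Y": 0, "T": 0}
        self.regs.update(regs)
        self.steps = 0  # illustrative step counter, not clock cycles

    def load(self, dest, value):
        # Load a constant into a register (8-bit width is assumed).
        self.regs[dest] = value & 0xFF
        self.steps += 1

    def copy(self, dest, src):
        # Register-to-register copy, counted as one step (assumed).
        self.regs[dest] = self.regs[src]
        self.steps += 1

    def add(self, dest, left, right):
        # The ALU can only combine one left-side and one right-side register.
        if left not in LEFT_INPUTS or right not in RIGHT_INPUTS:
            raise ValueError(f"cannot route {left} and {right} to the ALU")
        self.regs[dest] = (self.regs[left] + self.regs[right]) & 0xFF
        self.steps += 1


def add_const_to_x(m, k):
    m.load("T", k)           # constant goes to the right input via T
    m.add("X", "X", "T")     # X + T, result stored back to X


def add_const_to_y(m, k):
    m.copy("T", "X")         # save X, since T cannot be paired with Y
    m.load("X", k)           # constant goes to the left input via X
    m.add("Y", "X", "Y")     # X + Y, result stored back to Y
    m.copy("X", "T")         # restore the old X


mx = Machine(X=5, Y=7)
add_const_to_x(mx, 3)
my = Machine(X=5, Y=7)
add_const_to_y(my, 3)
print("X path steps:", mx.steps, "Y path steps:", my.steps)
```

In this model the Y version takes twice as many steps as the X version, and trying `add("Y", "Y", "T")` directly raises an error because both operands sit on the same side of the ALU.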
That would give the machine the odd property that operations involving Y are slower than those involving X and A. To be most useful, the proposed Y register really needs to be connected to the left ALU input, but that input is already “full”, and I don’t think I can relocate or remove any of the existing left inputs.
Too Many GALs? In my drive to optimize the design and reduce chip count, I’m beginning to wonder if I’m using too many GALs. I keep finding more and more places where two or more 7400-series parts could be replaced by a single GAL. For example, every combination of a ’377 register and ’244 output driver could be replaced by a single GAL, and the 16-bit pointer registers could be built from two GALs each, instead of four ’569 counters. In fact, I could replace almost every chip with a GAL, except for the ALU, memory, field decoders, and a few drivers. In that case, looking at the schematic wouldn’t tell you anything at all about how the machine worked, and you’d have to read the GAL program data to understand anything.
I’m not sure I like that idea. Although it would still be a real hardware implementation, it would feel more like a simulation in many ways. It would also once again raise the question of why not just implement the bulk of the machine as a single FPGA?
For the sake of comparison, I estimate that using GALs as much as possible would reduce the component count to about 40, while not using GALs at all would increase the count to about 60.
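As a rough illustration of what one such merged part would have to do, here is a small Python behavioral model of a register with a load enable plus a tri-state output driver, which is the job a register and bus driver pair performs together. The class and method names are hypothetical, the enables are modeled as plain active-high booleans regardless of the real parts’ polarity, and `None` stands in for a high-impedance output.

```python
class TriStateRegister:
    """8-bit register with load enable and an output-enabled bus driver."""

    def __init__(self):
        self.q = 0

    def clock(self, d, load_enable):
        # On a clock edge, capture the input only when load is enabled.
        if load_enable:
            self.q = d & 0xFF

    def drive(self, output_enable):
        # Drive the stored value onto the bus, or float (None) when disabled.
        return self.q if output_enable else None


reg = TriStateRegister()
reg.clock(0x5A, load_enable=True)
reg.clock(0xFF, load_enable=False)   # ignored: load not enabled
print(reg.drive(output_enable=True), reg.drive(output_enable=False))
```

The behavior is the same either way; what changes is whether it is visible as two chips on the schematic or hidden in one chip’s program data.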