Verilog Headaches
I’m having some trouble finding the best way to structure the Verilog code for this CPU. In particular, I’ve encountered one small headache and one larger one.
The small headache relates to the best way to describe complex combinatorial logic that doesn’t involve any registers. Consider some hypothetical logic that determines the value of the incrementPC and loadA control signals, based on the current state. One way to do this would be:
wire incrementPC, loadA;
assign incrementPC = (state == s1) || (state == s3) || (state == s4);
assign loadA = (state == s0) || (state == s2) || (state == s4);
That works fine, and it’s pretty clear what it does. But for more complex designs, it’s clearer to use procedural assignment and a case statement, grouping all of the control signals for each state together:
reg incrementPC, loadA;
always @* begin
    // defaults, so neither signal infers a latch if state is outside s0..s4
    incrementPC = 1'b0;
    loadA = 1'b0;
    case (state)
        s0: begin
            incrementPC = 1'b0;
            loadA = 1'b1;
            // other control signals...
        end
        s1: begin
            incrementPC = 1'b1;
            loadA = 1'b0;
            // other control signals...
        end
        s2: begin
            incrementPC = 1'b0;
            loadA = 1'b1;
            // other control signals...
        end
        s3: begin
            incrementPC = 1'b1;
            loadA = 1'b0;
            // other control signals...
        end
        s4: begin
            incrementPC = 1'b1;
            loadA = 1'b1;
            // other control signals...
        end
    endcase
end
The problem with this approach is visible in the first line: incrementPC and loadA must be declared as type “reg”, even though they are not registers. During synthesis, no register will be created as long as your code is correct, but Verilog demands that the target of a procedural assignment like this always be type “reg”. So reg does not always mean that something is a register. I find this very confusing and misleading, because it means you can’t just look at the Verilog code to see which signals are registers and which are purely combinatorial.
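To make the point concrete, here is a minimal sketch that drives the same decode logic once from a wire with a continuous assignment and once from a reg inside always @*. The module name, port names, and 3-bit state encodings are assumptions made up for this illustration, not part of the CPU design. The small testbench is simulation-only and simply compares the two outputs for every possible state value.

```verilog
// Hypothetical demo module: the same decode logic written two ways.
module ctrl_decode_demo (
    input  wire [2:0] state,
    output wire       incrementPC_w,  // driven by a continuous assignment
    output reg        incrementPC_r   // driven from always @*, still combinational
);
    // State encodings are assumptions for this sketch.
    localparam s0 = 3'd0, s1 = 3'd1, s2 = 3'd2, s3 = 3'd3, s4 = 3'd4;

    assign incrementPC_w = (state == s1) || (state == s3) || (state == s4);

    always @* begin
        incrementPC_r = 1'b0;  // assigned on every path, so no latch is inferred
        case (state)
            s1, s3, s4: incrementPC_r = 1'b1;
            default:    incrementPC_r = 1'b0;
        endcase
    end
endmodule

// Simulation-only testbench: compares the two outputs for every state value.
module ctrl_decode_demo_tb;
    reg  [2:0] state;
    wire       w, r;
    integer    i;

    ctrl_decode_demo dut (.state(state), .incrementPC_w(w), .incrementPC_r(r));

    initial begin
        for (i = 0; i < 8; i = i + 1) begin
            state = i[2:0];
            #1;  // let the combinational logic settle
            if (w !== r)
                $display("MISMATCH at state %0d: wire=%b reg=%b", state, w, r);
        end
        $display("done");
        $finish;
    end
endmodule
```

Neither output has a clock anywhere near it, yet one of them has to be declared reg purely because of where it is assigned.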
My bigger problem is more subtle, and is about good HDL design practices rather than any quirk of the Verilog standard. I’m unsure how explicit I should be in defining the structure of the virtual hardware described by the Verilog code. At one extreme, I could write a high-level functional description of *what* the CPU does, ignoring *how* it does it, and leave the synthesis software to figure it out. Or at the other extreme, I could work out a block diagram of the CPU consisting of familiar real-world elements like registers, arithmetic unit, muxes, and busses, and then write Verilog code to describe these elements and how they’re all connected.
To help make this distinction clearer, here’s an example based on section 6.2.4 of the book FPGA Prototyping by Verilog Examples. Imagine a state-driven system that can add two input registers, and store the output in a third register. One way to describe this would be high-level, functional:
always @(posedge clk) begin
    case (state)
        s0:
            d0 <= a + b;
        s1:
            d1 <= b + c;
        s2:
            d2 <= a + c;
    endcase
end
Great, that's compact and clear. But what does the datapath of this hardware look like? Is there one adder unit, or three? Who knows? It's a black box, relying entirely on the synthesis software to do the right thing.
A second approach would be to explicitly define a single adder unit:
assign mout = in1 + in2;
always @* begin
    // default: maintain same values
    d0_next = d0;
    d1_next = d1;
    d2_next = d2;
    // default adder inputs, so in1 and in2 don't infer latches in other states
    in1 = a;
    in2 = b;
    case (state)
        s0:
            begin
                in1 = a;
                in2 = b;
                d0_next = mout;
            end
        s1:
            begin
                in1 = b;
                in2 = c;
                d1_next = mout;
            end
        s2:
            begin
                in1 = a;
                in2 = c;
                d2_next = mout;
            end
    endcase
end
always @(posedge clk) begin
    d0 <= d0_next;
    d1 <= d1_next;
    d2 <= d2_next;
end
That makes the hardware design clearer, so it's unambiguous that there's only one adder. Is this second approach better than the first, then? Maybe, maybe not. If you're optimizing for space, and don't trust the synthesis software to be as smart as you are, then the second example is probably better. But if you're optimizing for speed, having three separate adders (or at least the possibility of three) may actually be better.
Even this second design is somewhat ambiguous. Presumably there are some muxes at the input to the adder, and a mux or load enable at the input to each D register too. But the Verilog code leaves this all implied and unspecified. Here's a third example that spells everything out in full detail:
// assigned inside always @*, so these must be declared reg, not wire
reg [1:0] in1Select, in2Select;
assign in1 = (in1Select == 2'b00) ? a :
             (in1Select == 2'b01) ? b :
             (in1Select == 2'b10) ? c :
             d;
assign in2 = (in2Select == 2'b00) ? a :
             (in2Select == 2'b01) ? b :
             (in2Select == 2'b10) ? c :
             d;
assign mout = in1 + in2;
reg loadEnableD0;
reg loadEnableD1;
reg loadEnableD2;
always @* begin
    // default: disable all loads
    loadEnableD0 = 1'b0;
    loadEnableD1 = 1'b0;
    loadEnableD2 = 1'b0;
    // default mux selects, so they don't infer latches in other states
    in1Select = 2'b00;
    in2Select = 2'b00;
    case (state)
        s0:
            begin
                in1Select = 2'b00;
                in2Select = 2'b01;
                loadEnableD0 = 1'b1;
            end
        s1:
            begin
                in1Select = 2'b01;
                in2Select = 2'b10;
                loadEnableD1 = 1'b1;
            end
        s2:
            begin
                in1Select = 2'b00;
                in2Select = 2'b10;
                loadEnableD2 = 1'b1;
            end
    endcase
end
always @(posedge clk) begin
    if (loadEnableD0)
        d0 <= mout;
    if (loadEnableD1)
        d1 <= mout;
    if (loadEnableD2)
        d2 <= mout;
end
This approach makes it very clear what's happening in terms of the hardware, and you could build an equivalent physical circuit from 7400 parts. Is this better or worse than the other two approaches? I find it better in terms of understanding what will be synthesized, but it's worse in terms of length. I also suspect that by specifying all the details in this way, it may be over-constraining the synthesis software, preventing it from using some clever optimizations to pack the same amount of logic into less space.
I find myself going around in circles with variations of these three approaches, unable to really get started with the actual CPU design work.
5 Comments so far
Hey Steve! Lots of interesting work and good leads, I like the site. Anyway, question answering thing kind of:
I work in a lab where we build robotic arms, and speaking to a friend in the lab who does a lot of FPGA design, he says it’s better to specify everything that you want the synthesizer to do.
In general, the synthesizer/optimizer aren’t as good at cleaning up extra logic as they are at routing it in new and clever ways to fit your circuit on the FPGA. I personally have no idea, but I’m on my way to learning!
As a side note, one huge thing that people always forget about Verilog is that it *isn’t* a programming language! It shouldn’t be represented “elegantly” or in an easy to read manner, it’s meant to be a way of making sure that a circuit exists as you need it to.
On forums debating whether to use Verilog or VHDL, the main criticism I see laid against Verilog all the time is that you shouldn’t be using high-level constructs, even though the language supports them. This is the hardware bit, it doesn’t get much lower level than that!
Interesting. I’m clearly new and inexperienced with this, but I mostly disagree. I had already more-or-less decided to take something like the middle of the three approaches I suggested above, and let the synthesis software do most of the hard work.
The best analogy I have is with software development, which is something I’ve done for decades and know extremely well. In 99% of cases, if you think you’re smarter than the compiler you’re wrong. If you write extra-explicit C++ code that sacrifices readability and understandability in an attempt to force the compiler to generate code a certain way, it’s not good practice.
One thing I could do if I weren’t so lazy would be to synthesize my three Verilog examples, and look at the logic that’s generated. Does the most explicit example result in a faster or smaller hardware implementation than the least explicit? If I had to guess, I’d guess all three examples will produce the same or near-same synthesis result.
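Before comparing synthesis results, a cheaper first check is that the styles at least behave the same in simulation. The sketch below wraps the first (functional) and second (shared adder) examples in complete modules and drives both with the same stimulus. The module names, the 8-bit data width, and the 2-bit state encoding are all assumptions invented for this test, and it says nothing about area or speed, only about functional equivalence.

```verilog
// Style 1: high-level functional description.
module add_functional (
    input  wire       clk,
    input  wire [1:0] state,
    input  wire [7:0] a, b, c,
    output reg  [7:0] d0, d1, d2
);
    localparam s0 = 2'd0, s1 = 2'd1, s2 = 2'd2;  // assumed encodings
    always @(posedge clk) begin
        case (state)
            s0: d0 <= a + b;
            s1: d1 <= b + c;
            s2: d2 <= a + c;
        endcase
    end
endmodule

// Style 2: one explicit shared adder with next-value logic.
module add_shared (
    input  wire       clk,
    input  wire [1:0] state,
    input  wire [7:0] a, b, c,
    output reg  [7:0] d0, d1, d2
);
    localparam s0 = 2'd0, s1 = 2'd1, s2 = 2'd2;  // assumed encodings
    reg  [7:0] in1, in2, d0_next, d1_next, d2_next;
    wire [7:0] mout = in1 + in2;

    always @* begin
        d0_next = d0; d1_next = d1; d2_next = d2;  // default: hold values
        in1 = a; in2 = b;                          // default adder inputs
        case (state)
            s0: begin in1 = a; in2 = b; d0_next = mout; end
            s1: begin in1 = b; in2 = c; d1_next = mout; end
            s2: begin in1 = a; in2 = c; d2_next = mout; end
            default: ;                             // no load in other states
        endcase
    end

    always @(posedge clk) begin
        d0 <= d0_next; d1 <= d1_next; d2 <= d2_next;
    end
endmodule

// Simulation-only testbench: same stimulus to both, count disagreements.
module add_compare_tb;
    reg        clk = 1'b0;
    reg  [1:0] state;
    reg  [7:0] a, b, c;
    wire [7:0] f0, f1, f2, h0, h1, h2;
    integer    i, errors;

    add_functional u_f (.clk(clk), .state(state), .a(a), .b(b), .c(c),
                        .d0(f0), .d1(f1), .d2(f2));
    add_shared     u_s (.clk(clk), .state(state), .a(a), .b(b), .c(c),
                        .d0(h0), .d1(h1), .d2(h2));

    always #5 clk = ~clk;

    initial begin
        errors = 0;
        a = 8'd1; b = 8'd2; c = 8'd3;
        // Visit s0, s1, s2 once so every register holds a known value.
        for (i = 0; i < 3; i = i + 1) begin
            state = i[1:0];
            @(posedge clk); #1;
        end
        // Random stimulus; state 3 exercises the "no load" case.
        for (i = 0; i < 100; i = i + 1) begin
            state = $random; a = $random; b = $random; c = $random;
            @(posedge clk); #1;
            if ({f0, f1, f2} !== {h0, h1, h2}) errors = errors + 1;
        end
        $display("mismatches: %0d", errors);
        $finish;
    end
endmodule
```

If the two styles really describe the same behavior, the mismatch count should stay at zero. The interesting comparison (how many adders, how many macrocells) still has to come from the synthesis reports.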
Anyway, I’m going to try going forward using a coding style like the first or second examples (most high-level, least explicit). Periodically I’ll look at the RTL model that’s being generated, as well as the macrocell counts, and make sure they look ballpark reasonable. If it looks like part of the design is being synthesized in a stupid way, I’ll try rewriting that part of it more explicitly, and see if the synthesis improves. But where two different styles give the same synthesis result, I’ll favor the higher-level one.
The real trick is to keep the architecture of the programmable logic device (CPLD, FPGA, etc.) in mind. For example, 4-input LUTs are common in most FPGAs, so designs which map to functions of no more than 4 inputs will use resources more efficiently. This is sort of like thinking in assembly when coding in C, i.e. keeping the processor architecture in mind.
I think the approach you described above is great – be explicit to the synthesizer where needed, but be higher-level otherwise.
It’s not a bad idea when you are learning Verilog, but basically people should not spend their time on such a topic: just go the simplest way in most cases.
To do your design at a much lower level, you must have good knowledge of every cost: if the design will go to an FPGA, ASIC, CPLD, etc., which type is it? Does it have special units inside? What is your priority (power, area, timing, etc.)? Also, you cannot handle complex designs this way, and you lose the design’s portability: for each new target, you have to redo all your work. Your design also becomes hard to maintain and debug.
I suggest you use your time to understand the synthesizer’s optimization options, then set them once for your whole design and make the design as simple as possible. Only when you cannot meet your target should you do this kind of investigation. Generally speaking, today’s designers do not need to know too much about implementation details.