A better coding style in Verilog

Status
Not open for further replies.

pklink

Newbie level 4
Joined
Jan 10, 2014
Messages
7
Helped
0
Reputation
0
Reaction score
0
Trophy points
1
Visit site
Activity points
90
Hi everyone,

I'm currently working on a project for my studies in which I have to implement an elevator control. The modules and it's connections between each other are given and we only have to implement the functionality of the modules.

In the lectures, we were told that we should use non-blocking assignments (for example: a <= 3'b0 within always blocks if we want to write code for a FSM. While I was writing the first two modules (they are used for the motor steering), I recognized that I would not have any clue how to implement the modules without using blocking assignments in the always block while the Module is still a FSM. So I wanted to ask whether this rule to use non-blocking assignments for FSMs is just not completely true or if I should alter my module. Beneath is the code for one module (just to clarify: I'm not asking to verify the code or anything else The code should just give you a little overview about the model):

Code:
[syntax=verilog]
`timescale 1ns / 1ns

module motor_if

         #(parameter INITIAL_POSITION  =       0,      // inital position is at floor 0
                     MOTOR_STEP        =      50,      // [m] (a single motor step is 50m = 0,05mm)
                     CONTROL_STEP      =      10,      // [mm] (a control step is 10mm = 1cm)
                     DISTANCE_BITS     =      14,      // max distance 12m = 12.000 mm
                     CONTROL_STEP_BITS =      11,      // 12m <> 1.200 control steps
                     MOTOR_STEP_BITS   =      18,      // max distance 12m = 240.000 * 50m
                     MOTOR_STEP_BITS2  =      18,      // max distance 12m = 240.000 * 50m
                     MAX_POSITION_R    =    1200,      // [control steps] (1.200 control steps = 240.000 motor steps = 12 m)
                     MAX_POSITION_L    =       0,      // [control steps] (0 control steps = 0 motor steps = 0 m)
                     DELAY_COUNT_BITS  =      17,      // 17 bits required to count up to C_0 = 100.000
                     C_0               =  100000,      // initial counter value <> 100 Hz @ 10 MHz clock
                     C_MIN             =     250)      // minimum counter value <> 40 kHz @ 10 MHz clock

         (input  wire                      CLK,
          input  wire                      RESET,

          input  wire [DISTANCE_BITS-1 :0] DISTANCE,        // levels [mm] / door_width [mm]
          input  wire                      ROTATE_LEFT,     // open or down
          input  wire                      ROTATE_RIGHT,    // close or up
                                            
          output reg                       CTRL_STEP_DONE,  // 10mm distance is a step
          output reg                       DONE,            // motor request executed
          output wire                      A,               // motor coil a output
          output wire                      B,               // motor coil a output
          output wire                      C,               // motor coil a output
          output wire                      D);              // motor coil a output

/* =============================INSERT CODE HERE======================================*/ 

	//Test related regs and wires
	reg [DELAY_COUNT_BITS -1:0]	actual_delay_rounded;

	//The number of motor steps which make one control step
	parameter steps_per_ctrl = CONTROL_STEP * 1000 / MOTOR_STEP;
	//Register to save motor steps needed for the current command
	reg [DISTANCE_BITS + 9:0]				motor_steps_command;
	//Register to save motor steps needed for the acceleration and deceleration
	reg [DISTANCE_BITS + 8:0]				motor_steps_accel, motor_steps_decel;
	/*
	* Register for storing the current number of clocks to wait before making a motor step.
	* The value is stored in a DELAY_COUNT_BITS.16 fixed point format
	*/
	reg [(DELAY_COUNT_BITS + 16) - 1:0]	actual_delay; 
	/*
	* Register for storing the calculation result of the next delay. The calculation result
	* is 2 * (DELAY_COUNT_BITS + 16) bit wide, that's why the result can't be directly
	* assigned to actual_delay
	*/
	reg [(DELAY_COUNT_BITS + 16) + (16 + 16) - 1:0] actual_delay_tmp;
	/*
	* Register for storing the number of clock waited since the last calculation of 
	* actual_delay
	*/
	reg [DELAY_COUNT_BITS - 1:0] 	delay_count;	
	/*
	* Registers storing the direction (LEFT, RIGHT) of a command, the emergency halt signal.
	* The remaining registers are helpers for converting wires to registers
	*/
	reg 									direction, halt, next_root, previous_root, a_tmp, b_tmp, c_tmp, d_tmp;
	//Wires for controlling the sqrt module (instantiated below)
	wire									next_root_tmp, previous_root_tmp;
	//Wire for the done signal of the sqrt module
	wire									root_finished;
	//Wire for the result of the sqrt module
	wire [31:0]							root_res;
	/*
	* Register for storing the current root before calculating the next root, since both roots
	* are needed for the calculation of the next delay
	*/
	reg [31:0]							interval_root;
	/*
	* Registers for storing the motor steps done in one command and the total
	* motor steps done since the reset (the total motor steps consider the direction of a
	* command). The total motor steps are used to actually steer the motor (using outputs
	* A, B, C and D)
	*/
	reg [MOTOR_STEP_BITS - 1:0]	total_motor_steps, current_motor_steps;
	/*
	* Register for storing the position of the elevator or the door (is used to check whether
	* a command exceeds the maximum or minimum position of the elevator or door)
	*/
	reg [CONTROL_STEP_BITS - 1:0]	position;
	//Register for storing the intern state of the module 
	reg [1:0]							state;
	
	//Conversion between wires and regs
	assign next_root_tmp = next_root;
	assign previous_root_tmp = previous_root;
	assign A = a_tmp;
	assign B = b_tmp;
	assign C = c_tmp;
	assign D = d_tmp;
	
	//Instantiation of the sqrt module used to calculates the square roots
	sqrt root(CLK, RESET, next_root_tmp, previous_root_tmp, root_finished, root_res);
		
	//State and combinational logic
	always @ (posedge CLK) begin
		if(CTRL_STEP_DONE) CTRL_STEP_DONE = 0;
		if(next_root) next_root = 0;
		if(previous_root) previous_root = 0;
		//Reset the whole module
		if(RESET) begin
			CTRL_STEP_DONE = 0;
			DONE = 1;
			
			motor_steps_command = 0;
			motor_steps_accel = 0;
			motor_steps_decel = 0;
			
			delay_count = 0;
			actual_delay = {C_0, 16'b0};
			
			direction = 0;
			halt = 0;
			next_root = 0;
			previous_root = 0;
			
			interval_root = root_res;
			
			total_motor_steps = INITIAL_POSITION * steps_per_ctrl;
			current_motor_steps = 0;
			
			position = INITIAL_POSITION;
			
			state = 0;
		end
		//Emergency halt the motor
		else if(ROTATE_LEFT & ROTATE_RIGHT) begin
		halt = 1;
		end
		else begin
			//Check whether the module is currently within a command (!DONE) or not (DONE)
			if(DONE) begin
				//If new commando is given, take the given values and start
				if(ROTATE_LEFT | ROTATE_RIGHT) begin
					if(ROTATE_LEFT ? position - DISTANCE / CONTROL_STEP >= MAX_POSITION_L : position + DISTANCE / CONTROL_STEP <= MAX_POSITION_R) begin
						delay_count = 0;
						actual_delay = {C_0, 16'b0};
						
						direction = ROTATE_LEFT ? 0 : 1;
						
						current_motor_steps = 0;
						
						motor_steps_command = (DISTANCE * 1000) / MOTOR_STEP;
						if(motor_steps_command[0]) begin
							motor_steps_accel = (motor_steps_command - 1) / 2;
						end
						else begin
							motor_steps_accel = motor_steps_command / 2;
						end
						
						state = 0;
						
						DONE = 0;
					end
				end
				//If no new commando is given, do nothing
			end
			else begin
				/*
				* Check if distance has to be updated (can only be updated along the current 
				* direction). It's not a problem that this block is not only run through during
				* updates, since the distance cannot change is a single clock? Maybe Issue
				*/
				if((ROTATE_LEFT & !direction) | (ROTATE_RIGHT & direction)) begin
					motor_steps_command = (DISTANCE * 1000) / MOTOR_STEP;
					if(motor_steps_command[0]) begin
						motor_steps_accel = (motor_steps_command - 1) / 2;
					end
					else begin
						motor_steps_accel = motor_steps_command / 2;
					end
				end
				//Do calculations (only if motor is not halted)
				if(!halt) begin
					
					//Check in which state the module is currently
					case (state)
						/*
						* State 0: Check if target distance if greater than 0, if yes, change to
						* 			  state 1 and pre-calculate next root
						*/
						0: begin 
								if(motor_steps_accel > 0) begin
									state = 1;
									/*interval_root = root_res;
									next_root = 1;*/
								end
								else begin
									state = 3;
								end
							end
						/*
						* State 1: Accelerate motor -> check if enough clocks have been waited,
						*			  if yes, calculate new interval, store current root, calculate
						* 			  next root, do a motor step and update the total motor steps, if
						*			  not, the increase delay count
						*			  If the half of target distance is travelled, change to state 3
						*			  and calculate the previous root
						*			  If the actual delay is smaller the C_MIN change to state 2 and
						*			  calculate the previous root
						*/
						1: begin 
								if(delay_count == actual_delay[DELAY_COUNT_BITS + 15:16]) begin
									delay_count = 1;
									//Do motor step (current and total)
									current_motor_steps = current_motor_steps + 1;
									total_motor_steps = direction ? total_motor_steps + 1 : total_motor_steps - 1;
									if(actual_delay <= C_MIN) begin
										motor_steps_decel = current_motor_steps;
										state = 2;
									end
									else if(current_motor_steps == motor_steps_accel) begin
										if(!motor_steps_command[0]) begin
											interval_root = root_res;
											previous_root = 1;
											state = 3;
										end
									end
									else if(current_motor_steps == motor_steps_accel + 1) begin
										interval_root = root_res;
										previous_root = 1;
										state = 3;	
									end
									else begin
										interval_root = root_res;
										next_root = 1;
									end
								end
								else begin
									delay_count = delay_count + 1;
								end
								
							end
						/*
						* State 2: Travel with full speed -> check if enough clocks have been
						*			  waited, if yes, do a motor step and update the total motor
						*			  steps
						*			  If the acceleration distance is left of target distance,
						*			  calculate the new delay and the previous root
						*/
						2: begin						
								if(delay_count == C_MIN) begin
									delay_count = 1;
									//Do motor step (current and total)
									current_motor_steps = current_motor_steps + 1;
									total_motor_steps = direction ? total_motor_steps + 1 : total_motor_steps - 1;
								end
															
								if(motor_steps_command - current_motor_steps == motor_steps_decel) begin
									interval_root = root_res;
									previous_root = 1;
									state = 3;
								end
							end
						/*
						* State 3: Decelerate motor -> like if state 1, but calculating the
						*			  previous root instead of the next root
						*			  If the total distance was travelled, change to state 0 and set
						*			  DONE to 1
						*/
						3: begin 								
								if(delay_count == actual_delay[DELAY_COUNT_BITS + 15:16]) begin
									delay_count = 1;
									interval_root = root_res;
									//Do motor step (current and total)
									current_motor_steps = current_motor_steps + 1;
									total_motor_steps = direction ? total_motor_steps + 1 : total_motor_steps - 1;
									if(current_motor_steps == motor_steps_command) begin
										$stop;
										state = 0;
										if(root_res == 0) begin
											next_root = 1;
										end
										DONE = 1;
									end
									else begin
										previous_root = 1;
									end
								end
								else begin
									delay_count = delay_count + 1;
								end								
							end
					endcase
				end
			end
		end
	end
	
	//Increase current_distance and position (total_distance) according to motor_steps
	always @ (current_motor_steps) begin
		if(state != 0 & current_motor_steps > 0 & current_motor_steps % steps_per_ctrl == 0) begin
			position = direction ? position + 1 : position - 1;
			CTRL_STEP_DONE = 1;
		end	
	end
	
	/*
	* Take the relevant bits (DELAY_COUNT_BITS integer bits, 16 fractional bits) from the
	* calculation of the next delay stored in actual_delay_tmp
	*/
	always @ (*) begin
		actual_delay = actual_delay_tmp[(DELAY_COUNT_BITS + 15) + 16:16];
	end
	
	/*
	* Calculate actual_delay
	*/
	always @ (*) begin
		if(current_motor_steps > 0 & root_finished) begin
			if(state == 1) begin
				actual_delay_tmp = {C_0[DELAY_COUNT_BITS - 1:0], 16'b0} * (root_res - interval_root);
			end
			else if(state == 3) begin
				actual_delay_tmp = {C_0[DELAY_COUNT_BITS - 1:0], 16'b0} * (interval_root - root_res);
			end
		end
	end
	
	//Test related mappings
	always @ (*) begin
		actual_delay_rounded = actual_delay[(DELAY_COUNT_BITS +15):16];
	end
		
	//Mapping between motor_steps and the motor signals
	always @ (*) begin
		case (total_motor_steps % 4)
			0: {a_tmp, b_tmp, c_tmp, d_tmp} <= 4'b1001; 
			1:	{a_tmp, b_tmp, c_tmp, d_tmp} <= 4'b1100;
			2: {a_tmp, b_tmp, c_tmp, d_tmp} <= 4'b0110;
			3: {a_tmp, b_tmp, c_tmp, d_tmp} <= 4'b0011;
		endcase
	end
	
/* ====================================================================================*/
   
endmodule 
[/syntax]

Thanks,
Pascal
 

Im guessing that you're a software guy, given you want to use blocking assignments? blocking assignments work variable assignments in C. There is no equivolent of a non-blocking assignment.

The problem with using blocking assignments is they are liable to create all sorts of async logic. non-blocking assingments wont get updated until the always block finishs - like a register. So blocking will give you async logic or registers (depending on where you place them in your code) while non-blocking will always give you registers in a clocked always block, no matter where you put them.

You really need to start thinking in terms of logic, not the code. I suggest trying to draw the circuit out on paper before you write the code, then you have an idea what you're actually trying to achieve.
 
Reactions: pklink

    pklink

    Points: 2
    Helpful Answer Positive Rating
Hi TrickyDicky,

thanks for the reply. You're right I've mostly worked with Java so far. So would it be basically better to put the state logic (which is quite simple in this module) in one posedge triggered always block with non-blocking assignments and all the calculations in always (*) or always(variable_name) blocks with blocking assignments?

About the problem with creating async logic with blocking assignments: Can I be sure that if the simulation (with in my case ISim from ISE) runs correctly (with the target FPGA in the project settings of course), the synthesized code on the FPGA will also behave correctly.

Thanks,
Pascal
 

In a synchronous always block, blocking should behave the same.

For asynchronous, you need to ensure you have the correct signals in the sensitivity list (or just use (*) ).

You shouldnt separate your state machine into multiple always blocks. You should be able to contain it in a single clocked always block with 0 blocking assignments. You just need to think more about how the logic works. If you're having to use multiple blocked assignments, you're probably not doing it quite right.
 

So when you say "In a synchronous always block, blocking should behave the same." you mean that I don't have a "performance" advantage by using blocking assignments?
Because I thought that by using blocking assignments the synthesized hardware will do the assignments and checks (if-clauses in the code) in the order I wrote them. A short example might clarify my thoughts on this:

I thought that this code (not a perfect example I know because this is basically combinatory logic, but I wanted to keep it short)

Code:
[syntax=verilog]

reg a;

always @ (posedge CLK) begin
    a = input;
    if(input) output = 1;
end
[/syntax]

would set the output directly to high when input is high on a posedge of CLK while

Code:
[syntax=verilog]

reg a;

always @ (posedge CLK) begin
    a <= input;
    if(input) output <= 1;
end
[/syntax]

would need one clock to set output to high after input is high at a posedge of CLK, because in one CLK the value of input is assigned but the if clause in this clock still uses the old value of register a (which would be 0) and then in the next clock would recognize that register a is high.

Am I wrong with my thoughts?

THanks,
Pascal
 

pklink,

Tricky is mistaken when he states the following:
In a synchronous always block, blocking should behave the same.

In a always block the behavior of a block assignment and a non-blocking assignment are subtly different.

Blocking assignments are done immediately
non-block assignments are done after all of the right-hand sides have been evaluated.

Here is a link with some examples of what this means (the first 5 slides have a good explanation).


In the case of your example the result is the same as none of the R.H.S. evaluations depends on a previously evaluated variable.

A simple rule for always blocks is to use non-blocking for synchronous circuits (registers) and blocking for combinational circuits.


Regards
 
Reactions: pklink

    pklink

    Points: 2
    Helpful Answer Positive Rating
pklink,

Tricky is mistaken when he states the following:


In a always block the behavior of a block assignment and a non-blocking assignment are subtly different.

Sorry, I must have not had my hands and brain connected. I meant to say:

In a synchronous always block, blocking should behave the same in simulation as the final circuit, but you could end up building a heavily complex design that finishes in a single clock cycle (which would not be very useful)
 
Reactions: pklink

    pklink

    Points: 2
    Helpful Answer Positive Rating
Thanks for the replys,

the link was very helpful allthough one little question is still open to me. Again a part of a module of the elevator simulation : In this case a module which calculates (starting from 1) the next or previous root. So for example after a reset you have sqrt(1) on the result output and by setting the next signal to one the module calculates root(2) within 16 cycles with bisection and gives the result (in 16.16 fixed point format):

Code:
[syntax=verilog]
/*
* Here does the actual calculation take place, consequently this is only done if
* DONE is set to 0
*/
if(!DONE) begin
	/*
	* Check if the maximum precision of the 16.16 format is reached - if this is the
	* stop the calculation and set DONE to 1
	*/			
	if (delta > 0) begin
		//Adjust the estimation of the current square root
		SQRT = (high + low) >> 1;
		//Check if the estimation is the new high or low boundary
		sqr = SQRT * SQRT;
		if(sqr[48:16] > {1'b0, n[15:0], 16'b0000000000000000})
			high = SQRT;
		else
			low = SQRT;
		//Adjust the precision of the estimation
		delta = delta >> 1;
	end
	else begin
		DONE = 1;
	end
end	
[/syntax]

The interesting part to me is the calculation in the if block. The Problem here is that in this case I think I really need blocking assignments because I want to calculate the next digit of the result in one cycle. Consequently I first have to calculate SQRT, then calculate sqr and then assign SQRT to high or low according to the value of sqr (I am sure that a good verilog programmer would probably not need sqr because it's just a helper). "delta" on the other hand is kind of the real state of the machine and it would not matter to me if I use a non-blocking assignment to assign the new value to delta. However, I am afraid that mixing of blocking and non-blocking assignments is even worse than using blocking assignments within sequential logic. Is that correct or can I use a combination of blocking and non-blocking assignments?

Thanks,
Pascal
 

HI

I think you are getting wrapped up in the non-blocking/blocking business and not seeing the bigger issues you have in the code you have presented.

Fortunately, the blocking/non-blocking thing is easy to clear up, and has already been presented to you.

If you use a sequential block (i.e. always @(posedge clk)) you must use non-blocking (<=) assignments for synthesizeable logic. Period. End of Story.

If you use a combinatorial block (i.e. always @(*)) you must use blocking (=) assignments for synthesizeable logic. Period. End of Story.

Your bigger problem, unless I misread your question, is that you are still trying to think like a software person rather than a hardware designer. Remember, you are describing hardware here. I know you have confessed to mostly dealing with software, so i will point out some of the things you must consider in your example.


This is a section of code where, I believe, you would like everything to occur in one clock cycle so that you can use sqr immediately. Often software designers fail to realize that if this were converted to C as is, it still might not execute in one clock cycle. Probably not. Having just spent months working on and code profiling cellular radio antenna algorithms that will run on a hard processor (Cortex A7) on an ASIC, a floating point multiply will take at least 4 CPU cycles, and that is with an optimized hard floating point engine and an 8-stage pipeline and a very smart compiler. A real floating point square root takes 28 CPU cycles. An integer multiply will still take a long time because even though you lose the floating point complexity, you usually also lose the hard math processor.

But, as a part-time software guy myself, it is easy to fool yourself into thinking that when you write:

Code:
           a = b *c; 
           d= a*e;

you think, ah, yes, a is immediately available to me. No it's not. It's available after several clocks.

Now, you are a hardware designer with a blank FPGA. You have no compiler that is going to manage a pipeline for you. You have no pipeline to manage. You may have no multiplier! Now is when you pull your big-boy hardware pants on and look at what hardware you are describing.

First, in your example, if you have made SQRT a flip-flop, then the result will not be available until the next clock period. Such is sequential logic.

Secondly, you must understand what the synthesizer will do to your multiply sign. Does your FPGA of choice have a hardware multiplier? If it does, what do the datasheets say about its latency. Can it produce a result in the same clock cycle? The next? 4 cycles later? 20? If there is no multiplier, what multiplier IP is available to you? Does the FPGA vendor provide IP or will you write a multiplier block?

The point is, you need to understand the pieces you have to use, and how fast they can do their jobs. If you can't get the multiply done in enough time, you have to start thinking about pipelining your design, where you break up your calculations into little chunks that will fit in a clock period and then passing the result to the next stage of the pipeline. Just like a car assembly line. A car plant may have a completed car roll off of the line every 10 minutes, but it took much longer than 10 minutes to build each car.

So, look at this design at higher level first. Decide what features you need and what hardware will be needed to create that feature. Then describe that hardware with Verilog. I know this is hard to do at first but ultimately you design the hardware and then describe it. Describing it without having an idea what you need it is a time waster when the designs get bigger.

In your example above, lets say you decide that you definitely need that multiplier, and after investigating all the ways you could implement the multiplication, you find you simply can't get it to give you a result in less than 4 clock cycles. Now you can write a state machine that takes into account the length of time that it takes to get the SQRT result. It will be a very different FSM than the one you just wrote. Or you may sit down and think of ways to take advantage of the parallelism that FPGA's offer, or create a pipeline, etc. This is an example of understanding the hardware requirements and then describing them accordingly to get the features and performance you want. Or finding you can't do what you want, in the time you want, at all.

This has been a bit long-winded but I wanted to point out some things you should be considering.

r.b.

EDIT: spelling
 
Reactions: pklink

    pklink

    Points: 2
    Helpful Answer Positive Rating
Hi rberek,

your post was extremely helpful! I just was still thinking like I would programm a software. Ok so that barrier is taken .

Allthough I think (or should I say hope ) multiplication should work within one cycle (I reread some slides of the lecture and googled a bit) allthough it limits the maximum frequency (to what I've read round about 50 Mhz with a somewhat actual FPGA, which is more than enough for me (target speed is 10 MHz for the module). But the hard part is the division - allthough I found some interesting methods to use bitshifts instead of dividing with a constant divisor, I need to divide a value represented by n-bits (n is a parameter) through another parameter. I know this does not correspond to the initial question but do you have any ideas how to avoid this division or is this just not possible?

The reason I'm asking is that the code has to be synthesizable, which is not the case if I try to divide with a value different from 2.

Thanks,
Pascal
 
Last edited:

If I were you, I would look at implementing a divider core provided by the FPGA vendor. I know Xilinx provides one. You can assume Altera will as well.


r.b.
 

There are some ways to implement a divider. I will show you two:

The most intuitive is the normal division we learn on school. For example, if you want to divide 255 by 5, you have (0x7D / 0x5). Then you divide 0x7/0x5 = 0x1, rest 0x2. So you apply this to the second digit: 0x2D /0x05 = 0x9, so you have that 0x7D /0x05 = 0x19. This way, you must have a table that covers all single-digits division (in the example, 7/5 and 2D/5)

Other way is using aproximation. On the same example, 0x7D/0x05, you know the the divider 0x05 is a number between 4 and 8. And you have the way to divide this numbers (unsing rotate). So, basically, you do:

1) 0x7D << 2 = 0x1F
2) 0x7D << 3 = 0x0F

So, you know the result is a number between 1F and 0F. Now, you get the middle point, make a multiplication by 0x05 and check if the number is bigger or lesser than 0x7D;

3) Initial value ((0x1F + 0x0F)/2) = 0x17. 0x17 * 0x05 = 0x73 ( < 0x7D). Once the number is lower, our result is between 0x17 and 0x1F.
4) (0x17 + 0x1F)/2 = 0x1B. 0x1B * 0x05 = 0x87 (> 0x7D). Number between 0x17 and 0x1B.
5) (0x1B + 0x17)/2 = 0x19. 0x19 * 0x05 = 0x7D (=0x7D), we found it.

There are also lots of specific methods of avoiding division, but they refers to specifics algorithms, and not to an arbitrary m divided by n operation.
 

Best way to avoid division is multiplication.

If you are trying to divide by a constant, N, you can just multiply by 1/N instead.
 
Reactions: zel

    zel

    Points: 2
    Helpful Answer Positive Rating
Status
Not open for further replies.
Cookies are required to use this site. You must accept them to continue using the site. Learn more…