[SOLVED] Verilog Error : Too Few Parameters Passed To Task

Status
Not open for further replies.
Last time I checked a FIR filter works just fine in the time domain. ;) If your argument is going to be "but but, something or other related to frequency domain *handwave*", well then bad news. Exactly the same "but but, something or other related to frequency domain *handwave*" argument will apply to your design. And every other signal processing algo for that matter. Your current design is not any more or less "time-domain" than a FIR filter is.

I'd assume FIR and IIR and such to be part of an EE/telecom curriculum, but apparently not. o_O

Re: BRAM: yes you have to instantiate it. Step 1: read the documentation I mentioned (button in core generator). Step 2: in ISE you select the core, and then do "Show instantiation template" or something to that effect. You can copy/paste the template and fill in your signals.
 

x[n]+x[n-1] >>> Time-Domain
X[Z]+X[Z]Z^(-1) >>> Frequency Domain

These two approaches give the same results, but I am sure they are different. A FIR filter works in the frequency domain. It's all about the Z^(-1) terms! How can you say FIR is in the time domain? Unless there's a point I'm missing, my dear friend.
 

Time and frequency domain are linked by bijective transformations. In that respect they can be considered different descriptions of the same thing.

E.g. an echo device is also a FIR filter (or an IIR filter for echo with feedback), and you can calculate its z-domain description.

Due to the duality of descriptions, it's your choice which you prefer. Just notice the hint that FIR hardware looks very similar to your "time-domain" signal manipulation. Presently you should focus on learning Verilog for hardware design anyway, I think.
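
To make the "same thing, two descriptions" point concrete, here is a minimal sketch (not from the thread) of the x[n]+x[n-1] example as hardware. The module name, port names and the 16-bit width are assumptions. The register that holds the previous sample is exactly what the Z^(-1) term describes.

```verilog
// Sketch: y[n] = x[n] + x[n-1], i.e. Y(z) = X(z) * (1 + z^-1).
// Module name, port names and the default width are assumptions.
module two_tap_sum #(
    parameter W = 16
) (
    input  wire                clk,
    input  wire                sample_en,  // one pulse per audio sample
    input  wire signed [W-1:0] x,          // x[n]
    output reg  signed [W:0]   y           // one extra bit so the sum cannot overflow
);
    reg signed [W-1:0] x_d1 = 0;           // x[n-1]: this register is the z^-1

    always @(posedge clk) begin
        if (sample_en) begin
            y    <= x + x_d1;              // uses the sample stored on the previous pulse
            x_d1 <= x;                     // becomes x[n-1] for the next sample
        end
    end
endmodule
```

The same structure with more registers and a multiplier per tap is a general FIR filter.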

B.t.w.: A great book related to your work is Zoelzer, Digital Audio Effects.
 
How can you say FIR is in the time domain?
I can say so quite easily, because ...
Time and frequency domain are linked by bijective transformations. In that respect they can be considered different descriptions of the same thing.
What he said. ;) You know, Laplace & Fourier transform and all that.
 

I've been reading Verilog all day... Guys, I'm feeling helpless, alone at the end of the world :-( I'm only 22, and not being able to proceed with my code makes me feel dumb, as if I have wasted my life :-(


Anyway, a man must fight for what he believes in, and I believe I can do this. Alright. Now I have built a block RAM (not sure whether a FIFO memory would be better). I also tried some FIR 5.2 version and, oops, more than 6000 lines of alien code :shock:

So I tried to put that block RAM into the circular buffer, get rid of those nasty creepy for loops, and do it the hardware way. Now I have to declare write and read pointers for my memory, right? To mention it again, I chose a true dual-port memory because I needed 2 read pointers...

Now I have to get rid of that lot of modules and reduce their number; two might be appropriate: one for the block RAM and one for the effect. Then I connect them so that a top module works in the testbench with meaningful outputs... Right?
 

I've been reading Verilog all day... Guys, I'm feeling helpless, alone at the end of the world :-( I'm only 22, and not being able to proceed with my code makes me feel dumb, as if I have wasted my life :-(
It's more an issue of having a software mindset and not thinking of parallel hardware. You've been looking at Verilog as if it executes in a sequential fashion, i.e. as if loops iterate N times in N cycles... so in your code you thought the loop would execute one index each clock cycle. Nope, it's translated to hardware, so all indexes are implemented in parallel. For loops are usually reserved for making N copies of something, e.g. if you need 10 SPI interfaces, you can either a) instantiate each one separately or b) use a for loop to generate the 10 instances.
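
As an illustration of option b), here is a minimal sketch of a generate loop. The `spi_stub` module is a hypothetical placeholder standing in for a real SPI block, and all names and ports are assumptions; the point is only that the loop is unrolled at elaboration into N parallel instances.

```verilog
// Placeholder for a real SPI block; name and ports are hypothetical.
module spi_stub (
    input  wire clk,
    input  wire mosi,
    output reg  miso
);
    always @(posedge clk)
        miso <= mosi;   // stands in for real SPI logic
endmodule

// N copies of the block, created with a generate for loop.
module spi_bank #(
    parameter N = 10
) (
    input  wire         clk,
    input  wire [N-1:0] mosi,
    output wire [N-1:0] miso
);
    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : g_spi
            // Unrolled once at elaboration into N parallel instances;
            // nothing here iterates over clock cycles.
            spi_stub u_spi (
                .clk  (clk),
                .mosi (mosi[i]),
                .miso (miso[i])
            );
        end
    endgenerate
endmodule
```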

Anyway, a man must fight for what he believes in, and I believe I can do this. Alright. Now I have built a block RAM (not sure whether a FIFO memory would be better). I also tried some FIR 5.2 version and, oops, more than 6000 lines of alien code :shock:

So I tried to put that block RAM into the circular buffer, get rid of those nasty creepy for loops, and do it the hardware way. Now I have to declare write and read pointers for my memory, right? To mention it again, I chose a true dual-port memory because I needed 2 read pointers...
I've never suggested using a FIR to do this. That's been FvM's and mrfibble's suggestion. I was wary of suggesting anything like that, as you may not know exactly how to use a FIR to implement this, and even less how to translate all those software FIR examples into a Verilog hardware FIR filter design. If you do know how to do this, then it's a better implementation. Otherwise your original algorithm will work; you just need to get out of the software loop approach. You need some control logic (FSM or similar) to time the reads and writes to the RAM, and an address generator to produce the correct addresses. The calculation itself stays the same, but pipelined with the RAM interface.

Now I have to get rid of that lot of modules and reduce their number; two might be appropriate: one for the block RAM and one for the effect. Then I connect them so that a top module works in the testbench with meaningful outputs... Right?
You don't "have to" reduce the number of modules, but if you do, it will make it easier to "see the big picture" (can't see the forest for the trees). I find it's much harder to see the flow of logic in a design that is broken down into an excessively fine granularity. This is similar to the problem of having a design that isn't broken down enough (so you get lost in the "forest" of code).

Regards
 

Alright. Now I have built a block RAM (not sure whether a FIFO memory would be better). I also tried some FIR 5.2 version and, oops, more than 6000 lines of alien code :shock:
A FIFO might be okay as well. This will depend on your memory access pattern, which you will know more about than me. If you do your reads/writes first in, first out, then yes, a FIFO will do the trick. Such a choice is usually made at the design stage, so that's an idea for your next design. ;) As for 6000 lines of alien code... the idea is usually to read the PDF documentation of that code, so you don't have to go over those large amounts of code. ;) Typically you only need to dive into core-generated code if something went horribly wrong.

So I tried to put that block RAM into the circular buffer, get rid of those nasty creepy for loops, and do it the hardware way. Now I have to declare write and read pointers for my memory, right? To mention it again, I chose a true dual-port memory because I needed 2 read pointers...
Well, if you truly need 2 read pointers for whatever reason, then you will not be able to do it with a plain FIFO. So IF 2 read pointers are truly required, THEN forget about FIFOs. This of course depends rather heavily on whether you really, really require 2 read pointers. Do you do 2 separate reads at distinctly different locations every clock cycle?

Now I have to get rid of that lot of modules and reduce their number; two might be appropriate: one for the block RAM and one for the effect. Then I connect them so that a top module works in the testbench with meaningful outputs... Right?
For testbench purposes that sounds about right.
 

Fortunately the audio sample rate is only a small fraction of the achievable system clock frequency, so you can easily time-multiplex many read operations on the circular buffer. No dual-port memory is actually needed.
 

Well, now you just messed up my perfectly loaded question. ;) The idea was to find out the motivation for the multiple reads.
 

Well, if you truly need 2 read pointers for whatever reason, then you will not be able to do it with a plain FIFO. So IF 2 read pointers are truly required, THEN forget about FIFOs. This of course depends rather heavily on whether you really, really require 2 read pointers. Do you do 2 separate reads at distinctly different locations every clock cycle?
The reads from the RAM are based on the G tables that were in the rar file. I'm looking at the addressing, and the first table produces addresses that increment/decrement by 4. I don't know if there is any point where the table increments differently. The next two G tables of indices change by +/-2. So all three tables look to be simple sawtooth waveforms, which are simple enough to implement in an always block as an increment/decrement-by-N counter that generates the addressing to the RAM.
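
A rough sketch of such an increment/decrement-by-N counter follows. It is not taken from the rar file: the module name, the ports, the offset width and the MAX turning point are all assumptions, and only the step sizes of 4 and 2 come from the description above.

```verilog
// Sketch: offset ramps up by STEP to MAX, then back down to 0, and repeats.
// MAX, AW and all names are assumptions; MAX must be < 2**AW and >= 2*STEP.
module updown_offset #(
    parameter AW   = 12,     // offset width (assumed)
    parameter STEP = 4,      // 4 for the first table, 2 for the other two
    parameter MAX  = 1024    // upper turning point (assumed)
) (
    input  wire          clk,
    input  wire          rst,        // synchronous reset
    input  wire          sample_en,  // advance once per audio sample
    output reg  [AW-1:0] offset
);
    reg up;                          // 1: counting up, 0: counting down

    always @(posedge clk) begin
        if (rst) begin
            offset <= 0;
            up     <= 1'b1;
        end else if (sample_en) begin
            if (up) begin
                if (offset + STEP > MAX) begin   // next step would overshoot
                    up     <= 1'b0;
                    offset <= offset - STEP;
                end else begin
                    offset <= offset + STEP;
                end
            end else begin
                if (offset < STEP) begin         // next step would go below zero
                    up     <= 1'b1;
                    offset <= offset + STEP;
                end else begin
                    offset <= offset - STEP;
                end
            end
        end
    end
endmodule
```

The `offset` output would then be subtracted from the write pointer to form the RAM read address.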

It's rather hard to determine what the actual intention of your algorithm is based on the original code as it wasn't representative of software or hardware. It might be helpful if you posted the software program (that was tested as working) that you based your Verilog code on as that would allow us to understand the intent.

Things like this:
Code:
Buff_Array1=Buff_Array>>1;
Audio_out=Buff_Array[0]+ divide(Buff_Array[t],Buff_Array1[t]);
You appear to be performing a divide by 2 on each element of the Buff_Array and adding it to the Buff_Array[0] value? I'm not sure if your Buff_Array values are all unsigned; I would assume so, as the values range up to 2400, but at a different point in your code you perform a subtraction:
Code:
Audio_out=Buff_Array[0]- divide(Buff_Array[G],Buff_Array1[G]);
Is Buff_Array[0] always bigger than the divide by 2 values for every other Buff_Array entry?
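
If the samples can be treated as signed, the question above goes away, because a negative result is then representable. A minimal sketch under that assumption (the module, its ports and the 16-bit width are hypothetical and not the original poster's code):

```verilog
// Sketch: mix the current sample with half of a delayed sample using signed
// arithmetic, so the subtraction cannot wrap the way an unsigned one would.
module mix_half #(
    parameter W = 16                       // sample width (assumed)
) (
    input  wire signed [W-1:0] dry,        // current sample
    input  wire signed [W-1:0] delayed,    // delayed sample
    input  wire                subtract,   // 1: dry - delayed/2, 0: dry + delayed/2
    output wire signed [W:0]   mixed       // one extra bit of headroom
);
    // Arithmetic right shift keeps the sign bit, unlike >> on a signed value.
    wire signed [W-1:0] half = delayed >>> 1;

    assign mixed = subtract ? (dry - half) : (dry + half);
endmodule
```

With unsigned operands the same subtraction would wrap around to a large positive value whenever the halved entry exceeds Buff_Array[0].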

Regards
 

Well, if you truly need 2 read pointers for whatever reason, then you will not be able to do it with a plain FIFO. So IF 2 read pointers are truly required, THEN forget about FIFOs. This of course depends rather heavily on whether you really, really require 2 read pointers. Do you do 2 separate reads at distinctly different locations every clock cycle?


The base functionality of all these processes is adding the current sample to a delayed sample. So the circular buffer, and now the block RAM with an address generator, is for this: to have two samples ready at a time to add or subtract.

Fortunately the audio sample rate is only a small fraction of the achievable system clock frequency, so you can easily time-multiplex many read operations on the circular buffer. No dual-port memory is actually needed.

Well, given a sample rate of at least 40 kHz and needing, let's say, a 1 second delay, I will need 40000 blocks in my RAM. That's impossible to do with time-multiplexing alone.

- - - Updated - - -

Those sawtooths are meant to be sinusoids... but I didn't bother about that; a sawtooth doesn't make much difference here, and the more important issue was implementing the whole design.

Each of those sets of Gs, and finally the mixer (Audio_out=Buf....), has a meaning in audio processing.
Buff_Array[0] is a specific block (a constant read point), but its content changes on every clock, as do the other blocks. For a standard delay, echo, chorus... the delayed sample must be multiplied by a constant less than 1, so that the human ear can perceive the effect and the probability of samples cancelling each other becomes low.
 

Well, given a sample rate of at least 40 kHz and needing, let's say, a 1 second delay, I will need 40000 blocks in my RAM. That's impossible to do with time-multiplexing alone.
The problem is that you still don't understand the concept of a circular buffer. You have one write cycle and several read cycles (one for every tapped signal with a different delay) per audio sample clock. Surely not 40000.
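
A minimal sketch of the addressing this implies (names and the 16-bit address width are assumptions): the memory holds many samples, but each audio sample needs only one write address and one read address per tap.

```verilog
// Sketch: circular buffer addressing. The buffer depth is 2**AW, so the
// AW-bit pointer arithmetic wraps around by itself. All names are assumed.
module cbuf_addr #(
    parameter AW = 16                 // 2**16 entries: more than 1 s at 40 kHz
) (
    input  wire          clk,
    input  wire          rst,         // synchronous reset
    input  wire          sample_en,   // one pulse per audio sample
    input  wire [AW-1:0] delay,       // tap delay in samples (>= 1)
    output reg  [AW-1:0] wr_addr,     // where the newest sample is written
    output wire [AW-1:0] rd_addr      // where the sample from 'delay' ago lives
);
    // One subtraction per tap; more taps only need more (delay, rd_addr) pairs.
    assign rd_addr = wr_addr - delay;

    always @(posedge clk) begin
        if (rst)
            wr_addr <= 0;
        else if (sample_en)
            wr_addr <= wr_addr + 1'b1;   // advance once per sample, wraps at 2**AW
    end
endmodule
```

So a 1 second delay at 40 kHz costs 40000 memory locations, but still only one write and one read per sample.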
 

The current sample is current. So why would you need to read it out of block ram? Since it is current you should keep it in a register as well. Why? Because that way you don't need a memory read for it.

As for this:
Well, given a sample rate of at least 40 kHz and needing, let's say, a 1 second delay, I will need 40000 blocks in my RAM. That's impossible to do with time-multiplexing alone.
Noooooope. What FvM says is quite possible.

Example A: memory clock of 40 kHz, dual read ports
Example B: memory clock of 80 kHz, single read port

Both A and B can get those two required reads done within a 25 microsecond period.
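
Here is a rough sketch along the lines of Example B, with the current sample kept in a register as suggested above. Everything in it is an assumption for illustration (names, widths, the two-state sequencer), and whether a given tool maps this style onto a block RAM would need checking.

```verilog
// Sketch: one write and one read per audio sample, using one memory access
// per clock. The clock only has to be at least 2x the sample rate.
module delay_tap_2cycle #(
    parameter W  = 16,    // sample width (assumed)
    parameter AW = 10     // 2**AW samples of delay memory (assumed)
) (
    input  wire                clk,        // system clock, >= 2x the sample rate
    input  wire                sample_en,  // one-clock pulse per new sample
    input  wire signed [W-1:0] x,          // current sample from the A/D
    input  wire [AW-1:0]       delay,      // delay in samples, 1 .. 2**AW-1
    output reg  signed [W-1:0] x_now,      // current sample, held in a register
    output reg  signed [W-1:0] x_delayed,  // sample from 'delay' samples ago
    output reg                 valid       // high for one clock when both are ready
);
    localparam S_WRITE = 1'b0, S_READ = 1'b1;

    reg signed [W-1:0] mem [0:(1<<AW)-1];
    reg [AW-1:0]       wr_ptr = 0;
    reg                state  = S_WRITE;

    always @(posedge clk) begin
        valid <= 1'b0;
        case (state)
            S_WRITE: if (sample_en) begin
                mem[wr_ptr] <= x;                  // clock 1: store the newest sample
                x_now       <= x;                  // no memory read needed for it
                state       <= S_READ;
            end
            S_READ: begin
                x_delayed <= mem[wr_ptr - delay];  // clock 2: fetch the delayed sample
                wr_ptr    <= wr_ptr + 1'b1;        // wraps at 2**AW
                valid     <= 1'b1;
                state     <= S_WRITE;
            end
        endcase
    end
endmodule
```

In simulation `x_delayed` reads as unknown until the buffer has been filled for `delay` samples, since the memory is not initialised here.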
 

The problem is that you still don't understand the concept of a circular buffer. You have one write cycle and several read cycles (one for every tapped signal with a different delay) per audio sample clock. Surely not 40000.


Read my code for the circular buffer to see if I understand it or not. I'm not saying I need 40000 read points, I'm saying I need a memory anyway. Maybe it's just a misunderstanding. You said no dual-port memory is needed, but you actually meant that it doesn't have to be dual port and a single port suffices, right?




Well, you mean I take out the current sample and, after a very fast clock pulse, take out the delayed sample too and add them to each other, right?
 

Could you list the formula your design has for the output in terms of delayed inputs? I imagine it's the weighted sum of a bunch of taps.
The number (and position) of taps should dictate how your reads are done.
 

I guess you are right. Read the attached pdf. It's very short and concise.
 

Attachments

  • FPGA-effects.pdf
    28.6 KB · Views: 120
So just one read port is required. I don't quite see why this is an FPGA application; you could almost do this on a battery-powered MSP430. Maintain a circular buffer and do all of 2 multiply-adds per sample. Well okay, maybe an STM32 with some more memory for longer delays.

Anyways, that document seems to have enough hints I'd say.
 

There's something here I don't get about the BRAM.

A single-port BRAM has these signals as its pinout:

CLKA: Port A operations are synchronous to this clock.
ADDRA: Addresses the memory space for port A read and write operations.
DINA: Data input to be written into the memory via port A.
DOUTA: Data output from read operations via port A.
WEA: Enables write operations via port A.

1. I am sure the write process must be done exactly at the sample rate of the A/D, which is 40 kHz or higher.
But as said before, the clock for reading must be higher. How is that possible?
2. I can't understand the functionality of this RAM clearly. Suppose I want it to write hex FFFF into the block at address FAFB, and at the same time I want it to output the content of the block at address A80F. How can I do that? Is that done with WEA? When it is 1 the RAM writes DINA into the address, and when it is 0 the RAM reads DOUTA from the address? Can it be done simultaneously? I think the mismatch of clocks will result in errors.
 

Either use a clock that is 2x the sample frequency and perform one write-read cycle every two clocks, or use a simple dual-port BRAM, where one port is write-only and the second port is read-only. This style of BRAM has separate addresses for writes and reads.
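
For reference, a simple dual-port memory is often described behaviourally in roughly this form. This is a generic sketch, not the core generator output; the module name, port names and sizes are assumptions.

```verilog
// Sketch: simple dual-port RAM, one write-only port and one read-only port,
// each with its own address. Names and sizes are assumptions.
module sdp_ram #(
    parameter W  = 16,    // data width (assumed)
    parameter AW = 10     // address width (assumed)
) (
    input  wire          clk,
    input  wire          we,      // write enable for the write port
    input  wire [AW-1:0] waddr,   // write address
    input  wire [W-1:0]  din,     // write data
    input  wire [AW-1:0] raddr,   // read address
    output reg  [W-1:0]  dout     // read data
);
    reg [W-1:0] mem [0:(1<<AW)-1];

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= din;    // write port
        dout <= mem[raddr];       // read port: data appears one clock later
    end
endmodule
```

Writing one address while reading a different one in the same clock is the normal case for this structure. What happens when both addresses are equal in the same clock depends on the memory's configured mode, so it is worth checking the core documentation.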

1. I am sure the write process must be done exactly at the sample rate of the A/D, which is 40 kHz or higher.
But as said before, the clock for reading must be higher. How is that possible?
I suspect you don't even have a planned method of getting data into the part, do you? The A/D should be controlled by the FPGA in the best-case scenario. The FPGA logic would be running at a minimum of 2x the sample frequency. I would strongly recommend using a clock at an even higher frequency. Besides reducing the latency through the design, you will also find that the oscillators used to run a clock at frequencies in the low MHz range tend to be very inexpensive.

2. I can't understand the functionality of this RAM clearly. Suppose I want it to write hex FFFF into the block at address FAFB, and at the same time I want it to output the content of the block at address A80F. How can I do that? Is that done with WEA? When it is 1 the RAM writes DINA into the address, and when it is 0 the RAM reads DOUTA from the address? Can it be done simultaneously? I think the mismatch of clocks will result in errors.
The documentation should show timing diagrams for writing and reading from the RAM.

Is this by chance the first digital design class you've taken? A lot of what you seem to be having trouble with is pretty basic digital design technique.

Regards
 
