Slow performance of Bare metal code on Cortex-A9 ARM hard processor, Cyclone V-SoC

Status
Not open for further replies.

Mechatronics_eng

I am writing bare-metal code for the dual-core ARM Cortex-A9 hard processor of the Cyclone V SoC on a DE1-SoC kit.


The HPS output clock configuration in Qsys is:
MPU clock = 800 MHz, L3 MP clock = 200 MHz, L3 SP clock = 100 MHz, L4 (MP and SP) = 100 MHz.


For now I am only trying to toggle an HPS pin, and then an FPGA pin through the h2f bridge, as follows:



Code:
// Program 1: toggle HPS pin hps_GPIO1[23]
int main() {
    setup_peripherals();
    while (true) {
        alt_xorbits_word(hps_addr, 0x800000);  // 0x800000 = bit 23
    }
}

// Program 2 (built separately): toggle FPGA pin AJ22 through the h2f bridge
int main() {
    setup_peripherals();
    // The integer address needs an explicit cast to a pointer type.
    volatile uint32_t *PIO_addr = (volatile uint32_t *)(0xC0000000 + PIO_0_BASE);
    while (true) {
        alt_xorbits_word(PIO_addr, 1);
    }
}



The code works, but the toggling frequency is only around 500 kHz, which is very slow compared to what the ARM core is capable of.


Then I turned on the MMU and enabled the cache system as follows:


mmu_init();
alt_cache_system_enable();

With this, the toggling frequency increased to 1.3 MHz.


My questions are:


How can I increase the toggling speed?
Is this the maximum frequency I can get?
Is there a problem in the HPS initialization or configuration?
I checked the ARM clocks using alt_clk_freq_get(any_clk, &clk_freq) and got the expected frequencies, but I am still not sure about this function.
 

The macro alt_xorbits_word does a read-modify-write sequence.
Reading is slow.
Writing is faster since the CPU doesn't have to wait for a result.

Try this:

Code:
while (true) {
  alt_write_word(PIO_addr, 0);
  alt_write_word(PIO_addr, 1);
}
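
To make the difference between the two patterns concrete, here is a minimal host-side sketch. FakeReg, toggle_rmw and toggle_shadow are hypothetical names introduced for illustration only; FakeReg stands in for the memory-mapped register and simply counts accesses. It is not HWLIB code. The shadow variant keeps the pin state in a CPU-side variable, so the other bits of the register (for example the remaining hps_GPIO1 bits) are preserved without ever reading the register back.

```cpp
#include <cstdint>

// Hypothetical stand-in for a memory-mapped register; it counts bus accesses.
struct FakeReg {
    uint32_t value = 0;
    unsigned reads = 0;
    unsigned writes = 0;

    uint32_t read() { ++reads; return value; }
    void write(uint32_t v) { ++writes; value = v; }
};

// Read-modify-write toggle: one read plus one write per toggle.
// This is the access pattern the reply attributes to alt_xorbits_word.
inline void toggle_rmw(FakeReg &reg, uint32_t mask) {
    reg.write(reg.read() ^ mask);
}

// Write-only toggle: the current state lives in a CPU-side shadow copy,
// so each toggle costs a single write and no read.
inline void toggle_shadow(FakeReg &reg, uint32_t &shadow, uint32_t mask) {
    shadow ^= mask;
    reg.write(shadow);
}
```

On the real hardware the shadow variable would be initialized once from the register (a single slow read at start-up), assuming nothing else modifies that register afterwards.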
 
Thank you for the reply, std_match. It increased the frequency from 1.3 MHz to 5.56 MHz.
I wonder if there are any other ideas that could increase the frequency further, because my MPU clock is 800 MHz.
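
For a sense of scale, the measured frequencies can be turned into time per register operation. The sketch below assumes that one output period consists of two operations (a set and a clear, or two XOR toggles) and that loop overhead is negligible; the function names are hypothetical. Under those assumptions, 5.56 MHz corresponds to roughly 90 ns per write, or about 72 CPU clock cycles at 800 MHz.

```cpp
// Nanoseconds per register operation, assuming two operations
// (set + clear, or two XOR toggles) make up one output period.
constexpr double ns_per_operation(double output_hz) {
    return 1e9 / (output_hz * 2.0);
}

// Number of CPU clock cycles that elapse in the given time.
constexpr double cpu_cycles(double ns, double cpu_hz) {
    return ns * cpu_hz / 1e9;
}
```

The same calculation gives about 800 cycles per read-modify-write at 500 kHz and about 308 cycles at 1.3 MHz.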
 

If you don't like the performance, write this toggle code in assembly.

If you want to know what is taking the function calls so long then look at the assembly code produced. I'd wager it's doing a lot of extra "housekeeping" stuff.
 

The equivalent assembly code turns out to be only seven lines, which makes me expect performance in the range of 100 MHz. The measured speed is still far from that.
I suspect something might be missing, such as initialization of the clock manager.
 

I don't think the CPU speed is the problem. My guess is that the bus cycles for accessing the I/O pins are slow for some reason.

You can test the CPU speed by running a loop without I/O access a fixed number of times. Toggle a pin before and after the loop to measure the time. The time spent in the loop should be much longer than the I/O cycle time, so execute it a million times or so.
Make sure that the dummy loop isn't optimized away by the compiler. One way is to have the loop counter as "volatile".
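
A minimal sketch of such a test, under stated assumptions: PinWrite, timed_dummy_loop and stub_pin are hypothetical names, and the stub only counts edges so the code compiles off-target. On the board, the pin hook would write the GPIO or PIO register, and the pulse width would be measured with a scope.

```cpp
#include <cstdint>

// Hypothetical pin hook: on the target this would write the GPIO/PIO register.
using PinWrite = void (*)(bool level);

// Host-side stub so the sketch compiles off-target; it only counts edges.
static int g_edge_count = 0;
static void stub_pin(bool) { ++g_edge_count; }

// Runs a dummy loop bracketed by a pin pulse and returns the iteration count.
inline uint32_t timed_dummy_loop(PinWrite set_pin, uint32_t iterations) {
    set_pin(true);             // rising edge marks the start of the measurement
    volatile uint32_t i = 0;   // volatile keeps the compiler from removing the loop
    while (i < iterations) {
        i = i + 1;             // each iteration reads and writes the counter in memory
    }
    set_pin(false);            // falling edge marks the end of the measurement
    return i;
}
```

Dividing the measured pulse width by the iteration count gives the time per iteration. Because the volatile counter is loaded and stored on every pass, one iteration takes several clock cycles even with caches enabled, so the result is best compared against the instruction count seen in the disassembly.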
 
After testing the CPU speed with a loop without I/O access, I found that the CPU does execute at the expected speed (800 MHz) and that the bottleneck is the I/O access. Is this the limit of the I/O speed?

I wonder if there is a way to troubleshoot and solve the problem.

Thank you very much.
 