Hi everyone,
I am trying to run a convolution function on an embedded system.
On my PC I tried the following:
Code:
int i,j,m,n,m1,n1;
int halfH = filter_height >> 1;
int halfW = filter_width >> 1;
for(i = 0; i < height; i++)
{
    for(j = 0; j < width; j++)
    {
        Output[i][j] = 0;
        for(m = -halfH,m1 = filter_height-1; m1 >= 0; m++,m1--)
        {
            if(i + m < 0 || i + m >= height)
                continue; // row outside the image
            for(n = -halfW,n1 = filter_width-1; n1 >= 0; n++,n1--)
            {
                if(j + n < 0 || j + n >= width) // column test must use width, not height
                    continue;
                Output[i][j] += input[i+m][j+n] * filter[m1][n1];
            }
        }
    }
}
Here my input image is 240x272 (width x height) and my filter size is 3x3.
I ran the same code on the embedded system (ARM9), where it takes around 350 ms.
I want to reduce the execution time.
There are a lot of possible optimizations, but I have some questions first.
Does your filter matrix have a fixed 3x3 size?
What is the numerical range of the input array and the filter coefficients?
Do you need the output scaled to the same range as the input array?
What is your target time?
My filter is a Sobel filter (1 2 1, 0 0 0, -1 -2 -1) and the input is a grayscale image (0 to 255).
I want to calculate the gradient of the image.
I do not have a target time. I want to reduce the time as much as possible.
So your previous code computes a single step of the Sobel filter, applying one 3x3 kernel?
Are your coefficients for Sobel edge detection along the x-direction?
In any case, you can perform some simple optimizations (for a generic 3x3 kernel):
1) Put your image starting at (1,1) in an array that is two pixels wider and two pixels taller than the original image, and zero-fill the first and last rows and the first and last columns.
This way you can eliminate all the image border tests, which saves time (and amply compensates for the additional processing of the border).
2) Unroll loops 3 and 4 and use local variables instead of the filter matrix array,
so the code can look similar to:
Code:
// copy the coefficients into local variables once
short k00=filter[0][0];
short k01=filter[0][1];
short k02=filter[0][2];
short k10=filter[1][0];
short k11=filter[1][1];
short k12=filter[1][2];
short k20=filter[2][0];
short k21=filter[2][1];
short k22=filter[2][2];
short acc=0;
// input is the zero-padded array of (height+2) x (width+2) pixels
for(i = 0; i < height; i++)
{
    for(j = 0; j < width; j++)
    {
        acc =input[i][j]*k00;
        acc+=input[i][j+1]*k01;
        acc+=input[i][j+2]*k02;
        acc+=input[i+1][j]*k10;
        acc+=input[i+1][j+1]*k11;
        acc+=input[i+1][j+2]*k12;
        acc+=input[i+2][j]*k20;
        acc+=input[i+2][j+1]*k21;
        acc+=input[i+2][j+2]*k22;
        output[i][j]=acc; // output origin is back at (0,0)
    }
}
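Step 1 above (the zero-padded copy) could be sketched as follows. This is only a minimal sketch, assuming an 8-bit grayscale image stored row by row in a flat buffer; the function name `pad_image` and its parameters are hypothetical and not part of the code in this thread.

```c
#include <assert.h>
#include <string.h>

/* Copy a width x height image into the interior of a (width+2) x (height+2)
   buffer, starting at (1,1), and zero-fill the one-pixel border.
   Hypothetical helper: src and dst are flat row-major buffers. */
void pad_image(const unsigned char *src, unsigned char *dst,
               int width, int height)
{
    size_t stride = (size_t)width + 2;  /* row length of the padded buffer */
    int i;

    assert(src != NULL && dst != NULL);

    /* Zero everything first; this fills the border rows and columns. */
    memset(dst, 0, stride * ((size_t)height + 2));

    /* Copy each source row one row down and one column to the right. */
    for (i = 0; i < height; i++)
        memcpy(dst + ((size_t)i + 1) * stride + 1,
               src + (size_t)i * (size_t)width,
               (size_t)width);
}
```

For the 240x272 image mentioned earlier the padded buffer would need 242 * 274 bytes. The padding only has to be done once per image, so its cost is paid once while the border tests are removed from every pixel.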
For the filter matrix you mentioned, you can hard-code the filter coefficients in a similar way to gain more time:
Code:
short acc=0;
// input is the zero-padded array; the middle kernel row is all zeros and is skipped
for(i = 0; i < height; i++)
{
    for(j = 0; j < width; j++)
    {
        acc =input[i][j];
        acc+=(input[i][j+1]<<1);
        acc+=input[i][j+2];
        acc-=input[i+2][j];
        acc-=(input[i+2][j+1]<<1);
        acc-=input[i+2][j+2];
        output[i][j]=acc; // output origin is back at (0,0)
    }
}
(The code is only an example. Check the coefficient order and sign.)
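One way to check the order and sign on the PC is to compare the hard-coded sum against a generic 3x3 loop on a small test image. The sketch below assumes a zero-padded 8-bit image in a flat row-major buffer, and all function names in it are hypothetical. It also compares against the flipped kernel, because the first loop in this thread indexes `filter[m1][n1]` in reverse order while the unrolled code does not; for this particular kernel the flip only reverses the sign of the result.

```c
#include <assert.h>

/* Sobel coefficients quoted in the thread. */
static const short sobel[3][3] = { {1, 2, 1}, {0, 0, 0}, {-1, -2, -1} };

/* Generic 3x3 weighted sum with its top-left corner at (i, j) of a padded
   image. If flip is nonzero the kernel is rotated by 180 degrees, as the
   first loop in the thread does through filter[m1][n1]. */
static int generic3x3(const unsigned char *in, int stride,
                      int i, int j, int flip)
{
    int acc = 0, m, n;
    for (m = 0; m < 3; m++)
        for (n = 0; n < 3; n++) {
            int k = flip ? sobel[2 - m][2 - n] : sobel[m][n];
            acc += in[(i + m) * stride + (j + n)] * k;
        }
    return acc;
}

/* Hard-coded version: shifts instead of multiplies, zero row skipped. */
static int hardcoded3x3(const unsigned char *in, int stride, int i, int j)
{
    const unsigned char *top = in + i * stride + j;
    const unsigned char *bottom = top + 2 * stride;
    return top[0] + (top[1] << 1) + top[2]
         - bottom[0] - (bottom[1] << 1) - bottom[2];
}

/* Returns 1 if, at every output pixel, the hard-coded sum equals the
   unflipped generic sum and the negated flipped sum; 0 otherwise. */
int sobel_hardcoded_matches(const unsigned char *padded,
                            int width, int height)
{
    int stride = width + 2, i, j;
    assert(padded != 0);
    for (i = 0; i < height; i++)
        for (j = 0; j < width; j++) {
            int h = hardcoded3x3(padded, stride, i, j);
            if (h != generic3x3(padded, stride, i, j, 0)) return 0;
            if (h != -generic3x3(padded, stride, i, j, 1)) return 0;
        }
    return 1;
}
```

If the result has to match the sign of the original loop exactly, the additions and subtractions in the hard-coded version can simply be swapped.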
Let me know what speed gain you obtain, and whether you need more speed.