Advice for large dataset classification

ssulun

Hi everyone,

I am an undergrad student taking a grad-level course, Pattern Recognition, and I need some advice on my project. I have been given a database of leaves: 3032 samples in total, belonging to 47 different classes, with 2003 features per sample. The features are: rectangularity, aspect ratio, mean hue, eccentricity, convexity, 999 FFT magnitudes along the x-axis, and 999 FFT magnitudes along the y-axis. I am using Matlab.

First of all, I know that I need dimensionality reduction. I applied Fisher's Linear Discriminant Analysis to maximize the between-class variance and minimize the within-class variance, but I got a warning that the matrix may be ill-conditioned. In hindsight that makes sense: LDA has to invert the 2003x2003 within-class scatter matrix, which is estimated from only about 3000 samples (roughly 64 per class), so it is nearly singular. So I decided to apply PCA first and then try LDA again.
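
Concretely, the preconditioning step I have in mind looks like this; a minimal sketch where X and y are my assumed names for the feature matrix and the class labels, using the built-in pca (princomp on older releases):

Code:
    % Minimal sketch of the PCA preconditioning step (X and y are
    % assumed names for the 3032x2003 feature matrix and the labels).
    Xs = zscore(X);                    % standardize each feature
    [~, score, latent] = pca(Xs);      % PC scores and variances
    k = find(cumsum(latent) / sum(latent) >= 0.99, 1);  % keep ~99% of variance
    Xr = score(:, 1:k);                % reduced data, to be fed into LDA next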

Some experienced PhD students said that the dataset can be represented with at most 10 dimensions, but when I tried different numbers of principal components and ran Matlab's built-in linear classifier on each, I got these results:

[Attachment "pca all.jpg": linear classifier results for different numbers of principal components]

It appears that 400 PCs give the best result; 100 is also acceptable, but 10 is certainly bad. Do I really need 100 features, or should I be doing something extra?
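
In case I am making a mistake somewhere, the sweep looked roughly like this (a minimal sketch with assumed variable names; pca, zscore, cvpartition and classify are Statistics Toolbox functions):

Code:
    % Minimal sketch of the PC sweep (variable names assumed; y holds
    % numeric class labels). Strictly, PCA should be fit on the training
    % split only, but this mirrors what I actually did.
    [~, score] = pca(zscore(X));
    cv = cvpartition(y, 'HoldOut', 0.3);      % stratified hold-out split
    for k = [10 50 100 200 400]
        Xtr = score(cv.training, 1:k);
        Xte = score(cv.test, 1:k);
        pred = classify(Xte, Xtr, y(cv.training), 'linear');
        fprintf('k = %3d   error = %.3f\n', k, mean(pred ~= y(cv.test)));
    end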

Later I ran LDA using code downloaded from https://www.mathworks.com/matlabcentral/fileexchange/29673-lda--linear-discriminant-analysis (explained at https://matlabdatamining.blogspot.com.tr/2010/12/linear-discriminant-analysis-lda.html). I am not sure whether this code performs dimensionality reduction, though. It gives me linear scores, and when I classify based on them I get exactly the same result as Matlab's built-in linear classifier. Does that mean Matlab's linear classifier performs some optimal transformation as well? Would it be better if I could somehow reduce the dimensionality while transforming the data, and how can I do that?
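
What I had in mind is the textbook multi-class LDA projection, sketched below (my own code, not the FileExchange version; variable names are assumed): project onto the leading generalized eigenvectors of the between-class and within-class scatter matrices, which gives at most C - 1 = 46 useful dimensions. Is that the right way to do it?

Code:
    % Textbook multi-class LDA projection (sketch; Xr is the
    % PCA-reduced data, y the class labels).
    classes = unique(y);  C = numel(classes);  d = size(Xr, 2);
    mu = mean(Xr, 1);
    Sw = zeros(d);  Sb = zeros(d);
    for i = 1:C
        Xi = Xr(y == classes(i), :);
        mi = mean(Xi, 1);
        Xc = bsxfun(@minus, Xi, mi);                     % center the class
        Sw = Sw + Xc' * Xc;                              % within-class scatter
        Sb = Sb + size(Xi, 1) * (mi - mu)' * (mi - mu);  % between-class scatter
    end
    [V, D] = eig(Sb, Sw);                  % generalized eigenproblem
    [~, idx] = sort(diag(D), 'descend');   % strongest discriminants first
    W = V(:, idx(1:C-1));                  % at most C-1 = 46 directions
    Z = Xr * W;                            % transformed, reduced data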

Finally, I wanted to try Support Vector Machines, since I have heard that "most of the time" they perform better than any other classifier. I used a function named multisvm, which simply loops through all the classes, labels the current class as 1 and all the others as 0, and calls Matlab's built-in svmtrain function (https://www.mathworks.com/matlabcentral/fileexchange/39352-multi-class-svm/content/multisvm.m).
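
The one-vs-all loop boils down to something like this (a simplified sketch, not the exact FileExchange code; svmtrain/svmclassify are the old Statistics Toolbox functions, and Xtr/ytr/Xte are assumed names for training data, training labels and test data):

Code:
    % Simplified sketch of the multisvm one-vs-all loop (names assumed).
    classes = unique(ytr);
    pred = zeros(size(Xte, 1), 1);
    for i = 1:numel(classes)
        model = svmtrain(Xtr, double(ytr == classes(i)), ...
                         'kernel_function', 'rbf', 'rbf_sigma', 5);
        out = svmclassify(model, Xte);
        pred(out == 1 & pred == 0) = classes(i);  % first positive match wins
    end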

The results were much worse than I expected. I also tried an RBF kernel with different sigmas, but the error rates were still far above the 10% I obtained with the linear classifier.
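
One thing I am now wondering about: could feature scaling be the issue? The FFT magnitudes are on a very different scale than the shape features, and the RBF kernel is sensitive to that (although svmtrain has an 'autoscale' option that is on by default, as far as I can tell). Would explicitly z-scoring with the training statistics, like below, be worth trying?

Code:
    % Z-score every feature using training-set statistics only (names assumed).
    mu = mean(Xtr, 1);
    sd = std(Xtr, 0, 1);
    sd(sd == 0) = 1;                                  % guard constant features
    Xtr_s = bsxfun(@rdivide, bsxfun(@minus, Xtr, mu), sd);
    Xte_s = bsxfun(@rdivide, bsxfun(@minus, Xte, mu), sd);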

[Attachment "svm sigma 1-20.jpg": SVM error rates for RBF sigma values 1 to 20]

Again, do you think SVM is simply much worse in this particular case, or am I doing something wrong?

I am open to new ideas as well.

Thanks in advance.
Serkan
 

Can you use self-organizing maps + something?
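
For example, a SOM to quantize the data first and then a simple classifier on top. A rough sketch with selforgmap from the Neural Network Toolbox (X is your feature matrix; names assumed):

Code:
    % Rough SOM sketch: selforgmap expects data as features x samples.
    net = selforgmap([10 10]);          % 10x10 map = 100 prototype nodes
    net = train(net, X');
    node = vec2ind(net(X'));            % winning node index per sample

Each leaf then gets a node index, and a lightweight classifier (or a per-node majority vote over the training labels) can do the rest.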
 

