How to find the covariance of a random vector in MATLAB


David83

Hello all,

How can I find the covariance matrix of a random vector by simulation in MATLAB?

Thanks in advance
 

You just use the cov() function.

But if you're trying to find the covariance of one vector, that is, one vector containing samples of one variable, MATLAB is going to give you the variance. Covariance is a statistic used for bivariate relationships, which is fancy wording for "comparing two variables."
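A quick way to see this for yourself, as a minimal sketch: the vector x and its length are made up for illustration, and randn stands in for whatever data you actually have.

```matlab
rng(0);                 % fix the seed so the run is repeatable
x = randn(100,1);       % 100 samples of ONE variable (illustrative data)
c = cov(x);             % with a single vector, cov returns a scalar
v = var(x);             % sample variance, normalized by N-1
fprintf('cov(x) = %.4f, var(x) = %.4f\n', c, v);
```

The two printed numbers should agree, since both are normalized by N-1.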
 

So, I need to create a matrix and then use the cov() function?
 

Well, you don't have to. Can I ask what kind of data you're trying to interpret?

Here's an example that may explain things better. Let's say I have some time samples of a signal, y, which is a sine wave and a signal, g, which is a sine wave that is shifted by pi/8. If I want to check the covariance of these signals, that is, if I want to see if they "vary together," then I could take a vector containing time-samples of y and a vector containing time samples of g, and use the function cov(y,g).

Let's try this:

Code:
t = 0:0.5:8; %create a vector called t that starts at zero, and increments in steps of .5 until it reaches 8.
y = sin(t);              %create a vector whose values are the sine of each value in t. 
g = sin(t+pi/8);      %create a vector whose values are the sine of each value in t plus pi/8

plot(g);
hold on
plot(y);

Now we should have a graph that looks like the attached figure.



To check the covariance of y and g, type:

Code:
Cv = cov(y,g)

This gives us the following covariance matrix:

Code:
Cv =

    0.5113    0.4829
    0.4829    0.5297


Where the top-left cell is the variance of y, the bottom-right cell is the variance of g, and the bottom-left and top-right cells are the covariance of y and g. Alternatively, you could create a matrix out of y and g where each column is a variable and each row is an observation, like this:


Code:
m(:,1) = y; %create a matrix and set its first column to equal our y values
m(:,2) = g; %set its second column to equal our g values
cov(m) %compute the covariance of m

ans =

    0.5113    0.4829
    0.4829    0.5297

You'll notice these yield the same results.
 

OK, that is good. I need to find the covariance matrix of a noise vector. Now I have the covariance matrix mathematically, but I want to make sure my derivation is correct. The noise vector is Gaussian with zero mean. So, based on what you said I can do the following:

Code:
noise=zeros(N,2);%The noise vector size is N-by-1
for ii=1:2
n=%generate the noise
noise(:,ii)=n;
NoiseCov=cov(noise);
end

Is that right?

Then how can I compare the covariance matrices in an easy way? Because N=1024 in my case, and it is hard to compare them element by element.
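One possible way to compare an estimated covariance matrix with a derived one without going element by element is to collapse the difference into a single number, for example a relative Frobenius-norm error or the largest absolute entry of the difference. The sketch below is only an illustration: the sizes N and K, the variable names, and the choice of the identity matrix as the "theoretical" covariance are all assumptions, not part of the original problem.

```matlab
rng(1);                         % repeatable run
N = 8;                          % hypothetical vector length (kept small here)
K = 20000;                      % hypothetical number of realizations
Ctheory = eye(N);               % assumed analytical result: unit-variance white noise
X = randn(K,N);                 % each ROW is one realization of the N-by-1 vector
Cest = cov(X);                  % N-by-N sample covariance (columns are the variables)

% Single-number summaries of how far the estimate is from the derivation
relErr = norm(Cest - Ctheory,'fro') / norm(Ctheory,'fro');
maxErr = max(abs(Cest(:) - Ctheory(:)));
fprintf('relative Frobenius error = %.4f, max abs error = %.4f\n', relErr, maxErr);
```

Both numbers should shrink as K grows if the derivation is right; a value that stays large points to a mismatch worth inspecting entry by entry.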
 

Sure, you could do that.

And because you're taking the covariance of an array with 2 columns in your example, your covariance matrix will be a 2x2 matrix, so it wouldn't be hard to analyze.

For example:

Code:
N = 1024;
noise = zeros(N,2);
for ii = 1:2
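    n = wgn(N,1,1); %creates an Nx1 vector of zero-mean white Gaussian noise; the third argument is the power in dBW, so 1 dBW is a variance of about 1.26 (wgn needs the Communications Toolbox)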
    noise(:,ii) = n;
end
NoiseCov = cov(noise)

This would yield something similar to:

Code:
NoiseCov =

    1.1904   -0.0152
   -0.0152    1.2064

Where the top-left is the variance of the first noise column, the bottom-right is the variance of the second noise column, and the bottom-left and top-right are their covariance.

In this case, their covariance is nearly zero. That means that there isn't any sort of linear relationship between the two. This is logically satisfying, because they're both just independent white Gaussian noise.

If you plotted one of the noise columns against the other, you would get a scatter plot that verifies by inspection that no linear relationship exists, i.e. a covariance near zero. For example, after executing the above code, type:

Code:
plot(noise(:,1),noise(:,2))

Which should look similar to this:



If the data were strongly correlated (for example, a correlation coefficient close to +1), that plot would look like a straight line rather than a rat's nest of data.
 

I am sorry, but the covariance matrix of a noise vector that is N-by-1 is N-by-N. So why is the covariance matrix here 2-by-2? And is there a better way of comparing the matrices? Someone suggested using a 3-D plot, but I am not sure how to do it.

Thanks for your responses.
 

I am sorry, but the covariance matrix of a noise vector that is N-by-1 is N-by-N.

David, can you provide a source that supports that statement? As I understand it, the covariance matrix has MxM dimensions, where M = number of variables, not the number of samples in each variable.

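So, the covariance matrix of variables x and y is then:

\[\mathbf{C} = \begin{pmatrix}\mathrm{Var}(x) & \mathrm{Cov}(x,y)\\ \mathrm{Cov}(x,y) & \mathrm{Var}(y)\end{pmatrix}\]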

Where Cov(x,y) =

\[\hat{\sigma}_{xy} = \frac{1}{N-1} \sum_{i = 1}^N (x_i - m_x)(y_i - m_y)\]
(from: "Introduction to Applied Statistical Signal Analysis" by Richard Shiavi.)
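To connect that formula to cov(), here is a small sketch that evaluates the sum by hand for the y and g sine waves from earlier in the thread and compares it with the off-diagonal entry that cov() returns. The names my, mg and sigma_yg are just illustrative.

```matlab
t = 0:0.5:8;                    % same time base as the earlier example
y = sin(t);
g = sin(t + pi/8);

N  = numel(y);                  % number of samples per variable
my = mean(y);                   % sample mean of y
mg = mean(g);                   % sample mean of g

% Sample covariance written out term by term, normalized by N-1
sigma_yg = sum((y - my).*(g - mg)) / (N - 1);

C = cov(y,g);                   % 2x2 covariance matrix from MATLAB
fprintf('by hand: %.4f, cov(): %.4f\n', sigma_yg, C(1,2));
```

The two values should match, which shows that N in the formula counts samples, while the size of the matrix is set by the number of variables.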


Here is a video that explains the covariance matrix as I understand it: https://youtu.be/locZabK4Als?t=10m10s
 

What is the difference between number of variables and the number of samples? If you have an N-by-1 random vector, i.e., a vector of N random variables, then the covariance matrix is defined as:

\[\mathbf{C}=\mathbb{E}\left[\begin{pmatrix}X_1\\X_2\\\vdots\\X_N\end{pmatrix}\begin{pmatrix}X_1^*&X_2^*&\cdots& X_N^*\end{pmatrix}\right]\]

which is an N-by-N matrix. See the link here.
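For a zero-mean vector, that expectation can be approximated in simulation by averaging the outer product over many independent realizations. The following is only a sketch under assumptions that are not in the thread: N and K are made-up sizes, and the vector is taken to be unit-variance complex white Gaussian noise, so the result should be close to the identity matrix.

```matlab
rng(2);                         % repeatable run
N = 4;                          % hypothetical vector length
K = 50000;                      % hypothetical number of realizations
C = zeros(N);                   % accumulator for the N-by-N average

for k = 1:K
    % one realization of a zero-mean complex Gaussian vector, unit variance per entry
    x = (randn(N,1) + 1i*randn(N,1)) / sqrt(2);
    C = C + x*x';               % x' is the conjugate transpose, so x*x' is N-by-N
end
C = C / K;                      % sample average approximates E[x x^H]

disp(abs(C));                   % magnitudes: near 1 on the diagonal, near 0 elsewhere
```

Each pass through the loop is one observation of all N random variables at once, which is the distinction between "variables" and "samples" being discussed here.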
 

Ah, I believe I see where you're confused now. A random vector isn't a vector filled with random samples. A random vector is a vector filled with random variables! [ref]


A variable is the process or phenomenon from which you're recording samples. A sample is like a single observation of one variable. Let's say you want to check the covariance of ice-cream sales and shark attack reports on the west coast. You could measure the reported shark attacks each month of the year and record each measurement in a vector. You could also record the number of ice creams sold each month and form another vector. So that's two variables, with twelve samples each. For example:


SA = [1, 0, 0, 0, 1, 2, 5, 5, 2, 0, 0, 0]; % shark attacks reported, Jan through Dec

and

ICS = [100, 500, 300, 20, 30, 40, 900, 1000, 50, 20, 0, 0]; % ice-cream sales per month, Jan through Dec


So, in the equation that you defined here:


N = 2, X1 = SA, and X2 =ICS.


In that wikipedia article you posted, they describe this by saying:

If the entries in the column vector

\[\mathbf{X} = (X_1, X_2, \ldots, X_n)^{\mathrm{T}}\]

are random variables, each with finite variance, then the covariance matrix...


So, to finish our example,

Code:
CvMatrix = cov(SA,ICS)

% for the SA and ICS values listed above, rounded:
%       3.5152        533.03
%       533.03     130515.15

The covariance between them is a positive number (about 533.03). This means there is a positive linear relationship between the variables. That is, when one goes up, the other tends to go up, and when one goes down, the other tends to go down. To check the "strength" of this relationship, our next step would be to calculate their correlation.
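As a sketch of that next step, the correlation coefficient can be obtained either by normalizing the covariance with the two standard deviations or with MATLAB's corrcoef function. The variable names r and R are just illustrative.

```matlab
SA  = [1, 0, 0, 0, 1, 2, 5, 5, 2, 0, 0, 0];                     % shark attacks per month
ICS = [100, 500, 300, 20, 30, 40, 900, 1000, 50, 20, 0, 0];     % ice-cream sales per month

C = cov(SA, ICS);                       % 2x2 covariance matrix
r = C(1,2) / sqrt(C(1,1) * C(2,2));     % covariance scaled by both standard deviations

R = corrcoef(SA, ICS);                  % 2x2 correlation matrix from MATLAB
fprintf('r = %.4f (by hand), %.4f (corrcoef)\n', r, R(1,2));
```

Unlike the covariance, r always lies between -1 and +1, so it does not depend on the units of the two variables.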

Hope this helped.
 

When we take different samples from a random process these samples are called random variables. Mathematically, the covariance matrix which I have is N-by-N. So, to find the covariance matrix by simulations now, is it still the same procedure as you mentioned?
 

If you have N random variables that belong to a family (ensemble) of random variables, which describe a random process, then yes, your covariance matrix will be NxN and you will use the process that we discussed above.


It's just that, based on your description of the noise vector, it sounds like you're wanting to sample the noise 1024 times by recording the value of the signal at 1024 fixed time instants, which would yield 1024 constant values. If they're constant values, they will each have a variance of zero, so treating each sample as a random variable won't yield a meaningful covariance.


I suppose you could take N frames of a noise vector, each frame containing 1024 samples, and compute an NxN covariance matrix from those frames like this:

Code:
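nSamples = 1024;                     %samples per frame
nFrames = 10;                        %number of frames
noise = zeros(nSamples,nFrames);     %preallocate one column per frame

for ii = 1:nFrames
    x = wgn(nSamples,1,1);           %one frame of zero-mean white Gaussian noise
    noise(:,ii) = x;
end

NoiseCov = cov(noise);               %nFrames-by-nFrames, since each column is a variable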

surf(NoiseCov);



Is that what you're looking for? If I'm misunderstanding let me know. I haven't been doing this stuff too long, but I find it pretty interesting. The "ridge" that goes diagonally in that surface plot represents the variance of each frame (they're all pretty similar), and the valleys that surround it represent their covariances, which are all near zero.
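Note that in the snippet above each column of noise is one frame, so cov() treats the frames as the variables and returns an nFrames-by-nFrames matrix. If instead each time sample is meant to be a random variable, the frames have to be the rows (observations). The sketch below shows that orientation; the sizes are made up and deliberately small, and randn is used in place of wgn as an assumption so that no toolbox is needed.

```matlab
rng(3);                              % repeatable run
nSamples = 32;                       % hypothetical frame length (random variables)
nFrames  = 5000;                     % hypothetical number of frames (observations)

noise = randn(nSamples, nFrames);    % each COLUMN is one frame, as in the snippet above

% Transpose so that rows are frames and columns are time samples;
% cov then returns an nSamples-by-nSamples matrix.
Csamples = cov(noise.');

disp(size(Csamples));                % size of the result
surf(Csamples);                      % diagonal ridge = variances, rest near zero
```

For the estimate to be meaningful, the number of frames should be much larger than the frame length, which is why nFrames is chosen well above nSamples here.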
 

OK, now that makes more sense. Let me check this on my case, and I will let you know what I'll get. Thanks
 

I have been distracted by other things lately, but when I tried this in my system, where the noise is complex, there was an error saying that the matrices passed to the surf function couldn't be complex. So I plotted the abs() of the covariance matrix. Is this the correct way to handle the error?
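Since surf() only accepts real data, plotting a real-valued view of the complex covariance matrix is one reasonable workaround. The sketch below is an illustration with made-up sizes and randn-generated complex noise (both assumptions). Keep in mind that abs() discards the phase, so real(C) and imag(C) can be plotted separately when the phase matters.

```matlab
rng(4);                              % repeatable run
nSamples = 16;                       % hypothetical number of random variables
nFrames  = 4000;                     % hypothetical number of observations

% Rows are observations, columns are variables; unit-variance complex Gaussian noise
noise = (randn(nFrames,nSamples) + 1i*randn(nFrames,nSamples)) / sqrt(2);

C = cov(noise);                      % complex Hermitian, nSamples-by-nSamples

surf(abs(C));                        % magnitude of each entry (always real)
% Alternative views that keep the sign/phase information:
% surf(real(C));  surf(imag(C));
```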
 
