
Entropy of information source


iVenky

In Shannon's paper "A Mathematical Theory of Communication" I couldn't understand a theorem (Theorem 4) under the topic "The Entropy of an Information Source":

"Consider again the sequences of length N and let them be arranged in order of decreasing probability. We define n(q) to be the number we must take from this set, starting with the most probable one, in order to accumulate a total probability q for those taken."

Theorem 4:
\[
\lim_{N \to \infty} \frac{\log n(q)}{N} = H
\]


when q does not equal 0 or 1.
We may interpret log n(q) as the number of bits required to specify the sequence when we consider only the most probable sequences with a total probability q.
Here 'H' is the entropy.


Can you explain to me how we get this theorem?
What is this n(q)? I couldn't understand the sentence I have quoted above (the interpretation of log n(q)). How is log n(q) the number of bits required to specify the sequence?

Here's his paper-
**broken link removed**

It's under the topic "The Entropy of an Information Source" on pages 13 and 14. He has written the proof in the appendix, but I couldn't understand that.


thanks a lot
 

Dude,

this is proved using the AEP (asymptotic equipartition property); in fact, it is the weak AEP.

What it means is that as your sequence becomes infinitely long, you need at least nH bits to encode it.
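If it helps to see Theorem 4 numerically, here is a rough brute-force sketch (my own example values, not anything from Shannon's paper): for a biased binary source it sorts all length-N sequences by probability, counts how many are needed to accumulate a fixed mass q, and compares log2(n(q))/N with H. The ratio creeps up toward H as N grows; the convergence is slow because the theorem is only a statement about the N → ∞ limit.

```python
from itertools import product
from math import log2

p1 = 0.8                                            # P(x = 1); example value, not from the thread
H = -(p1 * log2(p1) + (1 - p1) * log2(1 - p1))      # entropy, about 0.722 bits/symbol
q = 0.5                                             # fixed total probability, 0 < q < 1

for N in (5, 10, 15, 20):
    # probability of every length-N binary sequence, sorted most probable first
    probs = sorted(
        (p1 ** sum(s) * (1 - p1) ** (N - sum(s)) for s in product((0, 1), repeat=N)),
        reverse=True,
    )
    # n(q): number of most-probable sequences needed to accumulate total probability q
    total, n_q = 0.0, 0
    for pr in probs:
        total += pr
        n_q += 1
        if total >= q:
            break
    print(f"N={N:2d}  n(q)={n_q:6d}  log2(n(q))/N = {log2(n_q) / N:.3f}  (H = {H:.3f})")
```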
 

Thanks. What is that 'n' in 'nH' bits? Is it the number of sequences? I also don't understand the "most probable sequences" he talks about in the paper. Does the theorem apply only to those? What exactly is n(q)?

Thanks once again.
 

Say the sequence you got is made of n random variables, x1, x2, ..., xn.
All these variables have a probability distribution, and here we make them independent and identically distributed, which means they all have the same distribution p(x).
Now you should notice that you need at least n*log(x) bits to represent a sequence,

but as Shannon said and proved, you only need n*H(x) bits to encode it.

In that way, you get the data compressed. You should notice that H(x) is smaller than or equal to log(x).

n(q) counts the sequences you really keep after compression; that is to say, they are a subset of all your original permutation sequences.

In fact, these sequences are called typical sequences. Typical sequences have the property of being the "most probable" ones.

Hope this helps somehow.
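As a quick illustration of the two bit counts being compared here (a sketch with a made-up distribution, not anything from the thread): for a skewed 4-letter source, fixed-length coding costs n*log2|X| bits, while Shannon's asymptotic bound is only n*H(x) bits.

```python
from math import log2

# Hypothetical i.i.d. source over a 4-letter alphabet (example values only)
p = {'a': 0.7, 'b': 0.15, 'c': 0.1, 'd': 0.05}

H = -sum(px * log2(px) for px in p.values())   # entropy H(x), about 1.319 bits/symbol
n = 1000                                       # sequence length

print(f"fixed-length coding:        n*log2|X| = {n * log2(len(p)):.0f} bits")
print(f"Shannon's asymptotic bound: n*H(x)    = {n * H:.0f} bits")
```

With these example numbers that is roughly 1319 bits instead of 2000, which is the compression the typical-sequence argument buys you.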

Now I understand something. Thanks a lot for your reply. I have a few doubts.

You said n*log(x). Here 'x' actually indicates the total number of possibilities that x can take, right? I mean the size of the sample space of x, right?

Also, you are saying that n(q) indicates the number of most probable sequences, and according to the paper they are the ones whose total probability is 'q', right?

From the result it looks like log n(q) = NH, which makes me wonder: where are the rest of the less probable sequences that don't come under n(q)?

Thanks a lot
 

x belongs to a set, e.g. x : {1, 2, 3}.

For lack of a better notation for the alphabet of x when typing here, I just used x, but what I mean is log(alphabet of x) = log(3), following the example above.

Here p(x) = 1/3 if x = 1,
1/3 if x = 2,
1/3 if x = 3.
Note that you can assign any other probabilities, but the above gives you the maximum H(x) = log(3).

H(x) is smaller than or equal to this log(3); that is what I mean for your first question.

Then the number of typical sequences is n(q); log(n(q)) approaches n*H(x), and the n(q) sequences are the most probable ones among all the permutations.

Hope this is better.
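A quick check of that claim (my own snippet, using base-2 logs for bits): over the alphabet {1, 2, 3} the uniform distribution gives H(x) = log2(3), and any other assignment gives something smaller.

```python
from math import log2

def entropy(p):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(px * log2(px) for px in p if px > 0)

print(entropy([1/3, 1/3, 1/3]))   # 1.584962... = log2(3), the maximum
print(entropy([0.7, 0.2, 0.1]))   # 1.156..., strictly less than log2(3)
print(log2(3))                    # 1.584962...
```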
 
Ya, that's my doubt. If you consider the n(q) most probable sequences out of all the permutations, what you are saying is that NH(x) bits are enough to represent those most probable sequences alone. But then you haven't represented the sequences (the least probable ones) that don't come under the most probable set, right? If you include them too, the number of bits would increase, right?

Thanks a lot for your help.

Update: I figured that out!

Actually, he had already proved in his paper (and I can follow this derivation) that

\[
\lim_{N \to \infty} \frac{\log (p^{-1})}{N} = H
\]

where p is the probability of a sequence of length N. It looks like, as N → infinity, all the sequences that matter have the same probability 'p'. This can be easily understood if you look at that derivation.

Now if you define n(q) as the number of sequences you take until the sum of their probabilities equals 'q', then we have

n(q) * p = q (we have assumed that, as N → infinity, they all have the same probability 'p'; if you look at the derivation I mentioned above you will see this).

Taking logs, log n(q) = log q + log(p^(-1)). Since q is a fixed constant between 0 and 1, the log q term vanishes once you divide by N, so n(q) behaves just like p^(-1) as far as the limit is concerned.

When you substitute this in the expression above you get the result!
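Written out in one line, on the same equal-probability assumption, the chain is

\[
n(q)\,p \approx q \;\Rightarrow\; \frac{\log n(q)}{N} = \frac{\log q}{N} + \frac{\log (p^{-1})}{N} \to 0 + H = H \qquad (N \to \infty),
\]

which is also why the limit comes out the same for every q strictly between 0 and 1.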

What I understand from this is that when you have an ergodic process (a Markov chain kind of thing), then as N → infinity some sequences effectively won't occur at all, so you need not worry about them. This also means that the number of bits required to represent the sequences is not log(total sample space) but rather just NH.

Thanks a lot for your help.
 