# HW3 (Due 10/27)

1. (4 points) Consider two distributions $$p=[0.199, 0.599, 0.001, 0.199, 0.001, 0.001]$$ and $$q=[0.1, 0.3, 0.1, 0.1, 0.3, 0.1]$$. Compute $$KL(p||q)$$ and $$KL(q||p)$$. Note that the two values are significantly different from one another; KL divergence is not symmetric. You may also want to check out this tweet to get some more insight.
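A minimal numerical sketch for double-checking the hand computation (assuming base-2 logarithms, so both divergences come out in bits; a sanity check, not the graded solution):

```python
import numpy as np

p = np.array([0.199, 0.599, 0.001, 0.199, 0.001, 0.001])
q = np.array([0.1, 0.3, 0.1, 0.1, 0.3, 0.1])

def kl(p, q):
    # KL(p||q) = sum_i p_i * log2(p_i / q_i); all entries here are positive,
    # so no special handling of zero-probability terms is needed.
    return np.sum(p * np.log2(p / q))

print(f"KL(p||q) = {kl(p, q):.4f} bits")
print(f"KL(q||p) = {kl(q, p):.4f} bits")
```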

2. (6 points) For a group of students, weight $$W$$ and height $$H$$ are jointly normally distributed with means of 50 kg and 160 cm, respectively, and covariance matrix $$E\left[\begin{matrix} (W-\bar{W})^2 & (W-\bar{W})(H-\bar{H}) \\ (H-\bar{H})(W-\bar{W}) & (H-\bar{H})^2 \end{matrix} \right]=\begin{pmatrix} 80 & 30 \\ 30 & 140 \end{pmatrix}.$$

1. Compute $$h(W)$$

2. Compute $$h(W|H)$$

3. How many bits per sample, on average, are needed to store a student's weight with a precision of 0.1 kg if their height is known?
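A minimal sketch for sanity-checking parts 1–3 numerically. It assumes the standard Gaussian differential-entropy formula $$h(X)=\frac{1}{2}\log_2(2\pi e\sigma^2)$$ and the usual approximation that quantizing to precision $$\Delta$$ costs about $$h-\log_2\Delta$$ bits per sample:

```python
import numpy as np

var_w, var_h, cov_wh = 80.0, 140.0, 30.0  # covariance matrix entries from the problem

def gaussian_h(var):
    # Differential entropy of a Gaussian in bits: 0.5 * log2(2*pi*e*sigma^2)
    return 0.5 * np.log2(2 * np.pi * np.e * var)

h_w = gaussian_h(var_w)                       # part 1: h(W)
var_w_given_h = var_w - cov_wh**2 / var_h     # Var(W|H) for jointly Gaussian (W, H)
h_w_given_h = gaussian_h(var_w_given_h)       # part 2: h(W|H)
bits_per_sample = h_w_given_h - np.log2(0.1)  # part 3: bits at precision 0.1

print(h_w, h_w_given_h, bits_per_sample)
```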

3. (10 points) Consider Problem 1 from the last HW again, and compute

1. $$H(D|X_1)$$

2. $$H(D|X_1,X_2)$$

3. $$I(X_1;D)$$

4. $$I(X_1;D|X_2)$$

5. $$I(X_1;X_2)$$
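Since the joint distribution from the last HW is not reproduced here, the sketch below is generic: it assumes you have tabulated the joint pmf as a 3-D numpy array `P` with axes ordered $$(D, X_1, X_2)$$ (a layout chosen here for illustration) and computes the five quantities from the standard entropy identities:

```python
import numpy as np

def H(p):
    # Entropy in bits of a pmf given as an array of any shape.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def quantities(P):
    p_d = P.sum(axis=(1, 2))    # p(d)
    p_x1 = P.sum(axis=(0, 2))   # p(x1)
    p_x2 = P.sum(axis=(0, 1))   # p(x2)
    p_d_x1 = P.sum(axis=2)      # p(d, x1)
    p_d_x2 = P.sum(axis=1)      # p(d, x2)
    p_x1_x2 = P.sum(axis=0)     # p(x1, x2)

    H_d_given_x1 = H(p_d_x1) - H(p_x1)        # 1. H(D|X1) = H(D,X1) - H(X1)
    H_d_given_x1x2 = H(P) - H(p_x1_x2)        # 2. H(D|X1,X2) = H(D,X1,X2) - H(X1,X2)
    I_x1_d = H(p_d) - H_d_given_x1            # 3. I(X1;D) = H(D) - H(D|X1)
    I_x1_d_given_x2 = (H(p_d_x2) - H(p_x2)) - H_d_given_x1x2  # 4. H(D|X2) - H(D|X1,X2)
    I_x1_x2 = H(p_x1) + H(p_x2) - H(p_x1_x2)  # 5. I(X1;X2)
    return H_d_given_x1, H_d_given_x1x2, I_x1_d, I_x1_d_given_x2, I_x1_x2
```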

4. Extra credit. We will study the Shannon-Fano-Elias (SFE) code briefly described in class in a bit more detail. Recall that any SFE codeword can be viewed as an interval within $$[0,1]$$. For example, $$1011$$ can be viewed as $$[0.1011b,0.101\dot1b]=[0.1011b,0.11b]=[0.6875,0.75]$$.

1. (5 points) Show that two codewords cannot be prefixes of one another if their corresponding intervals do not overlap.

2. (5 points) Consider the usual setup in which we split the $$[0,1]$$-interval according to the probabilities of the alphabet, and then generate the codeword of a symbol from the midpoint of that symbol's interval. For example, if we only have two symbols $$a$$ and $$b$$ with probabilities 0.4 and 0.6, then $$a$$ should take $$0.2=0.00110011\cdots b$$, where, say, we can pick the codeword $$00$$, which corresponds to the interval $$[0.00b, 0.00\dot 1b]=[0.00b, 0.01b]$$. Or we can pick $$001$$, which corresponds to the interval $$[0.001b, 0.001\dot1 b]=[0.001b,0.01b]$$. Note that the interval shrinks by half whenever the length of the codeword increases by one. Question: what is the minimum length of codeword needed for a symbol $$a$$ with probability $$p(a)$$? Try to express this minimum length in terms of $$p(a)$$. (A brute-force sketch of this setup appears after this problem.)

3. (5 points) From the previous part, find an upper bound on the expected codeword length in terms of the entropy of the source, and show rigorously, with a proof, that your upper bound holds in any situation.
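For part 2, here is a brute-force sketch of the setup described above: it truncates the binary expansion of a symbol's midpoint to $$l$$ bits and searches for the smallest $$l$$ whose codeword interval fits inside the symbol's interval. Note that it returns the exact minimum for one concrete interval placement, while the question asks for a length, in terms of $$p(a)$$ alone, that works for any placement; the function name and the 64-bit search cap are choices made here for illustration:

```python
from math import floor

def min_codeword_length(low, p):
    # Symbol occupies [low, low + p); its codeword is the midpoint's binary
    # expansion truncated to l bits, covering [trunc, trunc + 2^-l].
    mid = low + p / 2
    for l in range(1, 64):
        trunc = floor(mid * 2**l) / 2**l
        if low <= trunc and trunc + 2**-l <= low + p:
            return l
    return None

# The two-symbol source from the problem: a with p = 0.4, b with p = 0.6.
print(min_codeword_length(0.0, 0.4))  # symbol a, interval [0, 0.4)
print(min_codeword_length(0.4, 0.6))  # symbol b, interval [0.4, 1.0)
```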