Introduction to Probability & Statistics

Assignment 5, 2025/26

Instructions

Submit your answers to the four questions marked Hand-in. You should upload your solutions to the VLE as a single PDF file. Marks will be awarded for clear, logical explanations, as well as for correctness of solutions.
Solutions to the other questions have been released at the same time as this assignment, in case you want to check your answers or need a hint.
You should also look at the other questions in preparation for your Week 11 seminar.

Starters

These questions should help you to gain confidence with the basics.

S1. You perform 28 independent experiments measuring a random variable \(X\) which you know has mean 457 and variance 676. Use Chebychev’s inequality to give a lower bound on the probability that the mean of your measurements is between 433 and 481.

Answer

First we note that \[ \mathbb{E}\left[\bar{X}_{28}\right]= \mathbb{E}\left[X\right] = 457 \] and \[ \textup{Var}\left(\bar{X}_{28}\right)= \frac{\textup{Var}\left(X\right)}{28}= \frac{676}{28}. \] Also we rewrite \[ \mathbb{P}\left(433 < \bar{X}_{ 28 } < 481 \right)= \mathbb{P}\left(|\bar{X}_{ 28 }- 457 |< 24 \right)=1-\mathbb{P}\left(|\bar{X}_{ 28 }-\mathbb{E}\left[\bar{X}_{ 28 }\right]|\geq 24 \right). \] Chebychev’s inequality tells us that \[ \mathbb{P}\left(|\bar{X}_{ 28 }-\mathbb{E}\left[\bar{X}_{ 28 }\right]|\geq 24 \right)\leq \frac{\textup{Var}\left(\bar{X}_{ 28 }\right)}{( 24 )^2}=\frac{169}{4032}. \] Combining these two equations gives \[ \mathbb{P}\left( 433 < \bar{X}_{ 28 } < 481 \right) \ge 1-\frac{169}{4032}=\frac{3863}{4032}\approx 0.96. \]
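If you want to sanity-check the bound numerically, here is a minimal simulation sketch (assuming numpy is available; Chebychev's inequality applies to any distribution with the given mean and variance, so a normal distribution is used purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_reps, n = 100_000, 28

# Illustrative choice only: any distribution with mean 457 and variance 676 works.
samples = rng.normal(loc=457, scale=np.sqrt(676), size=(n_reps, n))
means = samples.mean(axis=1)

print("empirical P(433 < mean < 481):", np.mean((means > 433) & (means < 481)))
print("Chebychev lower bound        :", 1 - 169 / 4032)   # ~0.958
```

As expected, the empirical probability is much larger than the bound: Chebychev's inequality makes no distributional assumptions, so it is often far from tight.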

S2. Lengths of small snakes are assumed to follow a \(\mbox{\textup{N}}(\mu,\sigma^2)\) distribution. Six snakes were measured, giving the following lengths in cm: 66, 69, 62, 62, 64, 67.

  1. Calculate unbiased estimates for \(\mu\) and \(\sigma^2\).

  2. Calculate a 90% confidence interval for \(\mu\).

Answer
  1. An unbiased estimate for \(\mu\) is \(\hat{\mu} = \bar{x} = 65\) cm. An unbiased estimate for \(\sigma^2\) is \(\hat{\sigma}^2 = s_n^2 = 8\) cm\(^2\).

  2. A 90% confidence interval is given by \[ \begin{split} \left(\bar{x}_n - t_{n-1, \alpha/2}\frac{s_n}{\sqrt{n}}, \, \bar{x}_n + t_{n-1, \alpha/2}\frac{s_n}{\sqrt{n}}\right) &= \left(65 - t_{5, 0.05}\sqrt{8/6}, \, 65 + t_{5, 0.05}\sqrt{8/6}\right) \\ &= \left(65 - 2.02\sqrt{8/6}, \, 65 + 2.02\sqrt{8/6}\right) \\ &= (62.67,67.33)\,. \end{split} \]
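These values are easy to reproduce in software; a minimal sketch, assuming scipy is available:

```python
import numpy as np
from scipy import stats

x = np.array([66, 69, 62, 62, 64, 67])
n = len(x)

mu_hat = x.mean()         # unbiased estimate of mu: 65.0
var_hat = x.var(ddof=1)   # unbiased estimate of sigma^2: 8.0 (ddof=1 divides by n-1)

t = stats.t.ppf(1 - 0.05, df=n - 1)     # t_{5, 0.05} ≈ 2.015
half_width = t * np.sqrt(var_hat / n)
print(mu_hat - half_width, mu_hat + half_width)   # ≈ (62.67, 67.33)
```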

S3. Let \(X_1,\dots,X_n\) be an i.i.d. sample from a distribution with mean \(\mu\) and variance \(\sigma^2\). Let \(\bar{X}_n\) denote the sample mean, and define \[ Z_n = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \,. \] Check that \(\mathbb{E}\left[Z_n\right] = 0\) and \(\textup{Var}\left(Z_n\right) = 1\).

Answer

We know that \(\mathbb{E}\left[\bar{X}_n\right] = \mu\) and \(\textup{Var}\left(\bar{X}_n\right) =\sigma^2/n\). By linearity of expectation we have \[ \mathbb{E}\left[Z_n\right] = \frac{\sqrt{n}\mathbb{E}\left[\bar{X}_n - \mu\right]}{\sigma} = 0 \,. \] For the variance, we know that for any random variable \(Y\), and constants \(a,b\in\mathbb{R}\), \(\textup{Var}\left(a Y + b\right) = a^2 \textup{Var}\left(Y\right)\). Thus \[ \textup{Var}\left(Z_n\right) = \textup{Var}\left(\sqrt{n}\bar{X}_n/\sigma - \sqrt{n}\mu/\sigma\right) = (\sqrt{n}/\sigma)^2 \textup{Var}\left(\bar{X}_n\right) = 1 \,. \]

S4. Hand-in
Let \(X\) and \(Y\) be discrete random variables with joint mass function as follows: \[p_{X,Y}(0,-1) = p_{X,Y}(0,1) = 1/4\,, \qquad \text{and} \qquad p_{X,Y}(1,0) = 1/2\] (with \(p_{X,Y}(x,y) = 0\) for all other \((x,y)\in\mathbb{R}^2\)).

  1. Calculate \(\textup{Cov}\left[X,Y\right]\).
  2. Are \(X\) and \(Y\) independent?
Answer

The marginal distributions are as follows: \(p_X(0) = p_X(1) = 1/2\); \(p_Y(-1) = p_Y(1) = 1/4\) and \(p_Y(0) = 1/2\). [1 mark]

For the covariance, we calculate \(\mathbb{E}\left[Y\right] = 0\) and \(\mathbb{E}\left[XY\right] = 0\), so \(\textup{Cov}\left[X,Y\right] = 0\). [2 marks] But the two random variables are clearly not independent: e.g. \(p_{X,Y}(1,0) =1/2 \neq (1/2)(1/2) = p_X(1)p_Y(0)\). [2 marks]

(We’ve seen that if two random variables are independent then they are uncorrelated; this example shows that the reverse implication does not hold!)
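For a direct check, both the covariance and the failure of independence can be computed straight from the joint mass function; a minimal sketch in plain Python:

```python
# Joint pmf stored as {(x, y): probability}.
pmf = {(0, -1): 0.25, (0, 1): 0.25, (1, 0): 0.5}

E_X  = sum(x * p for (x, y), p in pmf.items())
E_Y  = sum(y * p for (x, y), p in pmf.items())
E_XY = sum(x * y * p for (x, y), p in pmf.items())
print("Cov(X,Y) =", E_XY - E_X * E_Y)   # 0.0

# Independence would need p(x,y) = p_X(x) p_Y(y) for all (x,y); it fails at (1,0):
p_X1 = sum(p for (x, y), p in pmf.items() if x == 1)   # 1/2
p_Y0 = sum(p for (x, y), p in pmf.items() if y == 0)   # 1/2
print(pmf[(1, 0)], "vs", p_X1 * p_Y0)                  # 0.5 vs 0.25
```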

S5. Let \(X_1,\dots, X_n\) be an i.i.d. sample from the \(\mbox{\textup{N}}(\mu,\sigma^2)\) distribution.

  1. Write down a \(100(1-\alpha)\%\) confidence interval estimator for \(\mu\) in the case where \(\sigma^2\) is known. Hence or otherwise, find a 95% confidence interval for \(\mu\) in the case where \(\sigma^2 = 9\) and a random sample of size 16 has been taken with values \(x_1,\dots, x_{16}\) and it has been found that \[ \sum_{i=1}^{16} x_i = 50, \quad \sum_{i=1}^{16} (x_i - \bar{x}_{16})^2 = 115. \]

  2. It is proposed that, from a second independent random sample of size 16 a 99% confidence interval for \(\mu\) be constructed and that, from a third independent random sample of size 32, a 98% confidence interval for \(\mu\) be constructed. State the probability that neither of these two confidence intervals will contain \(\mu\).

Answer
  1. The confidence interval estimator is \[ \left(\bar{X}_n - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \bar{X}_n + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right). \] With \(\sigma = 3\), \(n = 16\), \(\bar{x}_n = 50/16\) and \(z_{0.025} = 1.96\) substituted into the above expression we get the confidence interval \((1.655, 4.595)\).

  2. The event that the second sample leads to a confidence interval that does not contain \(\mu\) and the event that the third sample leads to a confidence interval that does not contain \(\mu\) are independent because the samples are independent. Thus the probability that both samples lead to a confidence interval that does not contain \(\mu\) is equal to the product of the individual probabilities, \(0.01 \cdot 0.02 = 0.0002\).
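A quick numerical check of both parts (a sketch, assuming scipy is available):

```python
import numpy as np
from scipy import stats

n, sigma = 16, 3.0
xbar = 50 / n                         # 3.125
z = stats.norm.ppf(1 - 0.025)         # z_{0.025} ≈ 1.96
half_width = z * sigma / np.sqrt(n)   # ≈ 1.47
print(xbar - half_width, xbar + half_width)   # ≈ (1.655, 4.595)

# Part 2: the two intervals miss mu independently.
print(0.01 * 0.02)   # 0.0002
```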

Mains

These are important, and cover some of the most substantial parts of the course.

M1. Hand-in
Let \(X_1,\dots,X_{25}\) be an i.i.d. sample from a \(\mbox{\textup{N}}(2,4)\) distribution, and let \(S=X_1+ X_2 + \dots + X_{25}\). Express \(\mathbb{P}\left(S<55\right)\) in terms of the distribution function \(\Phi\) of the standard normal distribution.

Answer

We know that \(\mathbb{E}\left[S\right] = 25\mathbb{E}\left[X\right] = 50\), and \(\textup{Var}\left(S\right) = 25\textup{Var}\left(X\right) = 100\) (because the \(X_i\) are independent, and hence uncorrelated).
Furthermore, we know that the sum of independent normal distributions has a normal distribution. Thus \[ S \sim \mbox{\textup{N}}(50,100) \,. \] [3 marks]

We can normalize to obtain a standard normal random variable \(Z\) by subtracting the mean and dividing by the standard deviation. Thus \[ \mathbb{P}\left(S<55\right) = \mathbb{P}\left(\frac{S-50}{\sqrt{100}}<\frac{55-50}{\sqrt{100}}\right) = \mathbb{P}\left(Z<\frac{5}{10}\right) = \Phi(1/2) \,. \] [2 marks]
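Numerically \(\Phi(1/2)\approx 0.69\); a one-line check, assuming scipy is available:

```python
from scipy import stats

# S ~ N(50, 100), so P(S < 55) = Phi((55 - 50)/10) = Phi(1/2).
print(stats.norm.cdf(55, loc=50, scale=10))   # ≈ 0.6915
print(stats.norm.cdf(0.5))                    # same value
```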

M2. Hand-in
Let \(Y_1,Y_2,\dots\) be an i.i.d. sequence of random variables, each with a \(\mbox{\textup{Uniform}}(1,3)\) distribution. Define a new sequence of random variables \(X_1,X_2,\dots\) by \[ X_n = \frac{1}{n}\sum_{i=1}^n Y_i^2 \,. \] Using the Law of Large Numbers, determine the value of \(a\in \mathbb{R}\) for which \(\mathbb{P}\left(\lim_{n\to\infty}X_n = a\right) = 1\).

Answer

The random variable \(X_n\) is just the sample mean of the random variables \(Y_1^2, \dots, Y_n^2\). These are i.i.d. and clearly have finite mean and variance (since \(Y_i^2\) can only take values in the bounded set \([1,9]\)).
The (strong) law of large numbers says that, with probability one, \(X_n\) will converge to \(\mathbb{E}\left[Y^2\right]\). [3 marks]

Finally, we calculate \[ \mathbb{E}\left[Y^2\right] = \int_{1}^3 y^2 \frac{1}{2} dy = 13/3 \,, \] and so the required answer is \(a=13/3\). [2 marks]
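You can watch the law of large numbers at work in a simulation; a sketch assuming numpy (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

for n in (100, 10_000, 1_000_000):
    y = rng.uniform(1, 3, size=n)
    print(n, np.mean(y**2))   # approaches 13/3 ≈ 4.3333 as n grows
```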

M3. Hand-in
Suppose the random variables \(X_1,X_2, X_3\) and \(X_4\) all have the same expectation \(\mu\). For what value(s) of \(b\in\mathbb{R}\) is \[ M=b(X_1+bX_2)+2X_3-3X_4 \] an unbiased estimator for \(\mu\)?

Answer

\(M\) is an unbiased estimator for \(\mu\) if \(\mathbb{E}\left[M\right]=\mu\) for any value of \(\mu\). [1 mark] We find \[ \begin{split} \mathbb{E}\left[M\right]&=\mathbb{E}\left[b(X_1+bX_2)+2X_3-3X_4\right]\\ &=b\mathbb{E}\left[X_1\right]+b^2\mathbb{E}\left[X_2\right]+2\mathbb{E}\left[X_3\right]-3\mathbb{E}\left[X_4\right]\\ &=\mu(b(1+b)+2-3) \end{split} \] [2 marks]
For this to equal \(\mu\) for every value of \(\mu\) we require \(b(1+b)-1 = 1\), i.e. \(b^2+b-2 = (b+2)(b-1) = 0\), so \(b\in\{-2,1\}\). [2 marks]
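Unbiasedness is easy to check by simulation; a sketch assuming numpy, with an arbitrary illustrative choice of distribution and mean (only the common mean matters here):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 5.0   # any common mean will do

for b in (-2.0, 1.0):
    X = rng.normal(mu, 1.0, size=(400_000, 4))   # columns play the roles of X_1, ..., X_4
    M = b * (X[:, 0] + b * X[:, 1]) + 2 * X[:, 2] - 3 * X[:, 3]
    print(b, M.mean())   # ≈ 5.0 for both values of b
```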

M4. Recall that we say that \(T_m\) has a \(t\)-distribution with \(m>1\) degrees of freedom, and write \(T_m\sim t(m)\), if it has density function given by \[ f(x) = k_m\left(1+\frac{x^2}{m}\right)^{-\frac{m+1}{2}}\,, \qquad x\in\mathbb{R}\,, \] where \(k_m\) is a constant that ensures that the density integrates to 1. If \(T_m\sim t(m)\), show that \(\mathbb{E}\left[T_m\right] = 0\).
Hint: you shouldn’t need to explicitly calculate any integrals here!

Answer

We note that \(f(x)\) is an even function: \(f(-x) = f(x)\). Thus \[ \begin{split} \mathbb{E}\left[T_m\right] &= \int_{-\infty}^\infty x f(x) dx \\ &= \int_{-\infty}^0 x f(x) dx + \int_0^\infty x f(x) dx \\ &= \int_0^\infty (-x) f(-x) dx + \int_0^\infty x f(x) dx \\ &= \int_0^\infty (-x) f(x) dx + \int_0^\infty x f(x) dx \\ &= \int_0^\infty ((-x)+x) f(x) dx \\ &= 0\,. \end{split} \] (Splitting the integral into two halves, and recombining them at the end, is legitimate because each half is finite: for large \(|x|\) the integrand \(x f(x)\) decays like \(|x|^{-m}\), which is integrable at infinity precisely when \(m>1\).)
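A numerical counterpart of this argument, assuming scipy is available (scipy's t distribution has the density quoted above):

```python
import numpy as np
from scipy import stats, integrate

# Integrate x * f(x) over the whole line for a few degrees of freedom.
for m in (3, 5, 30):
    val, _ = integrate.quad(lambda x: x * stats.t.pdf(x, df=m), -np.inf, np.inf)
    print(m, val)   # ≈ 0 each time, up to numerical error
```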

M5. From a dataset \(x_1,\dots, x_{10}\) it has been calculated that \[ \sum_{i=1}^{10} x_i = 491, \quad \sum_{i=1}^{10} (x_i - \bar{x}_{10})^2 = 41. \] You model the dataset as a random sample from the normal distribution with mean \(\mu\) and variance \(\sigma^2\).

  1. Assume that both \(\mu\) and \(\sigma^2\) are unknown. Determine a 95% confidence interval for the mean \(\mu\). You can use that \(t_{9, 0.025}\approx 2.26\).

  2. Now assume that it is known that the variance is \(\sigma^2 = 5\). Give a 95% confidence interval for the mean \(\mu\) in this case. You can use that \(z_{0.025} \approx 1.96\).

Answer
  1. Since the variance is unknown, we use the interval \[ \left(\bar{x}_n - t_{n-1, \alpha/2}\frac{s_n}{\sqrt{n}}, \bar{x}_n + t_{n-1, \alpha/2}\frac{s_n}{\sqrt{n}}\right) \] We first calculate \(\bar{x}_n = 491/10 = 49.1\) and \[ s_n^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x}_n)^2 = \frac{1}{10-1}41 = \frac{41}{9}. \] We’re told that \(t_{n-1, \alpha/2} = t_{9, 0.025} = 2.26\) and \[ \begin{split} \left(\bar{x}_n - t_{n-1, \alpha/2}\frac{s_n}{\sqrt{n}}, \bar{x}_n + t_{n-1, \alpha/2}\frac{s_n}{\sqrt{n}}\right) &\approx \left(49.1- 2.26\frac{\sqrt{41}}{3\sqrt{10}}, 49.1 + 2.26\frac{\sqrt{41}}{3\sqrt{10}}\right)\\ & \approx (47.57, 50.63). \end{split} \]

  2. When \(\sigma^2\) is known, we use the confidence interval \[ \begin{split} \left(\bar{x}_n - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \bar{x}_n + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &\approx \left(49.1- 1.96\frac{\sqrt{5}}{\sqrt{10}}, 49.1 + 1.96\frac{\sqrt{5}}{\sqrt{10}}\right)\\ & \approx (47.71, 50.49). \end{split} \]
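Both intervals can be reproduced with a few lines of code (a sketch, assuming scipy is available):

```python
import numpy as np
from scipy import stats

n = 10
xbar = 491 / n        # 49.1
s2 = 41 / (n - 1)     # unbiased variance estimate

# Part 1: sigma^2 unknown, use the t interval.
t = stats.t.ppf(1 - 0.025, df=n - 1)   # ≈ 2.26
h = t * np.sqrt(s2 / n)
print(xbar - h, xbar + h)              # ≈ (47.57, 50.63)

# Part 2: sigma^2 = 5 known, use the z interval.
z = stats.norm.ppf(1 - 0.025)          # ≈ 1.96
h = z * np.sqrt(5 / n)
print(xbar - h, xbar + h)              # ≈ (47.71, 50.49)
```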

M6. If \(X\) has expectation \(\mu\) and standard deviation \(\sigma\), the ratio \(r=|\mu|/\sigma\) is called the measurement signal-to-noise ratio of \(X\). If we define \(D=|(X-\mu)/\mu|\) as the relative deviation of \(X\) from its mean \(\mu\), show that, for \(\alpha>0\), \[ \mathbb{P}\left(D< \alpha\right)\geq 1-\frac{1}{r^2\alpha^2}. \]

Answer

Since \(|\mu| = r\sigma\), we have \[ \mathbb{P}\left(D<\alpha\right)=\mathbb{P}\left(|(X-\mu)/\mu|<\alpha\right)=\mathbb{P}\left(|X-\mu|<\alpha |\mu|\right)=\mathbb{P}\left(|X-\mu|<\alpha r \sigma\right)\,. \]

We now use Chebychev’s inequality: \[ \mathbb{P}\left(|X-\mu|<k\sigma\right) \geq1-\frac{1}{k^2} \] with \(k=\alpha r\), giving \[ \mathbb{P}\left(|X-\mu|<\alpha r \sigma\right)\geq 1-\frac{1}{r^2\alpha^2}. \]
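To see the bound in action, here is a simulation sketch assuming numpy, with illustrative values of \(\mu\), \(\sigma\) and \(\alpha\) (any distribution with the chosen mean and standard deviation would do):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, alpha = 10.0, 2.0, 0.5   # illustrative values, giving r = 5
r = abs(mu) / sigma

X = rng.normal(mu, sigma, size=1_000_000)
D = np.abs((X - mu) / mu)
print("empirical P(D < alpha):", np.mean(D < alpha))
print("guaranteed lower bound:", 1 - 1 / (r**2 * alpha**2))   # 0.84
```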

M7. Let \(X\) be the number of 1s and \(Y\) be the number of 2s that occur in \(n\) rolls of a fair die. Use indicator random variables to compute \(\textup{Cov}\left[X,Y\right]\) and \(\rho(X,Y)\).
Hint: this is just like the smarties example covered in lectures.

Answer

We introduce the indicator random variables \[ X_i=\begin{cases}1&\text{ roll $i$ lands on 1}\\0&\text{ otherwise}\end{cases}~~\text{ and }~~ Y_i=\begin{cases}1&\text{ roll $i$ lands on 2}\\0&\text{ otherwise}.\end{cases} \] Then \[ X=\sum_{i=1}^nX_i~~\text{ and }~~Y=\sum_{i=1}^nY_i. \] We observe that \(\mathbb{P}\left(X_i=1\right)=\mathbb{P}\left(Y_i=1\right)=1/6\) and \(\mathbb{P}\left(X_iY_j=1\right)=1/36\) if \(i\neq j\) and \(\mathbb{P}\left(X_iY_i=1\right)=0\). Thus \(\mathbb{E}\left[X_i\right]=\mathbb{E}\left[Y_i\right]=1/6\) and \(\mathbb{E}\left[X_iY_j\right]=1/36\) if \(i\neq j\) and \(\mathbb{E}\left[X_iY_i\right]=0\). We calculate the covariances: \[ \textup{Cov}\left[X_i,Y_j\right]=\mathbb{E}\left[X_iY_j\right]-\mathbb{E}\left[X_i\right]\mathbb{E}\left[Y_j\right]=\begin{cases}\frac{1}{36}-\frac16\cdot\frac16=0&\text{ if } i\neq j\\ 0-\frac16\cdot\frac16=-\frac{1}{36}&\text{ if }i=j.\end{cases} \] Now, as we’ve seen with the smarties example in lectures, \[ \begin{split} \textup{Cov}\left[X,Y\right]&=\textup{Cov}\left[\sum_{i=1}^nX_i,\sum_{j=1}^nY_j\right] =\sum_{i=1}^n\sum_{j=1}^n\textup{Cov}\left[X_i,Y_j\right] \\ &=\sum_{i=1}^n\textup{Cov}\left[X_i,Y_i\right]=\sum_{i=1}^n\frac{-1}{36}=-\frac{n}{36}. \end{split} \] To calculate the variance of \(X\) we recall that the variance of a sum of independent random variables equals the sum of their variances. Each \(X_i\) is a Bernoulli variable with success probability \(1/6\), so \(\textup{Var}\left(X_i\right)=\frac16\cdot\frac56=\frac{5}{36}\) and \[ \textup{Var}\left(X\right)=\textup{Var}\left(\sum_{i=1}^n X_i\right)=\sum_{i=1}^n\textup{Var}\left(X_i\right)=\sum_{i=1}^n\frac{5}{36}=\frac{5n}{36}. \] By the same calculation \(\textup{Var}\left(Y\right)\) has the same value. (We could also have used the fact that \(X\) and \(Y\) each have a \(\mbox{\textup{Bin}}(n,1/6)\) distribution.) Thus \[ \rho(X,Y)=\frac{\textup{Cov}\left[X,Y\right]}{\sqrt{\textup{Var}\left(X\right)\textup{Var}\left(Y\right)}}=\frac{-n/36}{5n/36}=-\frac15. \] It makes sense that the correlation coefficient should be negative, because if in \(n\) dice rolls I get an unusually high number of 1s, then the chance of getting many 2s is smaller (because all those 1s are definitely not 2s).
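A Monte Carlo check of both answers (a sketch assuming numpy; the choices of \(n\) and the number of repetitions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_reps = 12, 200_000

rolls = rng.integers(1, 7, size=(n_reps, n))   # fair die: values 1..6
X = (rolls == 1).sum(axis=1)   # number of 1s in each batch of n rolls
Y = (rolls == 2).sum(axis=1)   # number of 2s

print("Cov(X,Y):", np.cov(X, Y)[0, 1], " theory:", -n / 36)
print("rho     :", np.corrcoef(X, Y)[0, 1], " theory:", -1 / 5)
```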

M8. In the textbook you should have read the proof of Chebychev’s inequality for the case of a continuous random variable. Provide a similar proof for the case of a discrete random variable.

Answer

First we find a lower bound for the variance of \(X\). Let’s write \(\mathbb{E}\left[X\right]=\mu\).

\[ \begin{split} \textup{Var}\left(X\right)&=\mathbb{E}\left[(X-\mu)^2\right] =\sum_{x\in X(\Omega)}(x-\mu)^2p_X(x) \\ &\geq \sum_{\stackrel{x\in X(\Omega)}{|x-\mu|\geq a}}(x-\mu)^2 p_X(x) \\ &\geq \sum_{\stackrel{x\in X(\Omega)}{|x-\mu|\geq a}}a^2 p_X(x) =a^2\,\mathbb{P}\left(|X-\mu|\geq a\right). \end{split} \] Then we divide both sides of the inequality by \(a^2\) to get \[ \mathbb{P}\left(|X-\mathbb{E}\left[X\right]|\geq a\right)\leq \frac{1}{a^2}\textup{Var}\left(X\right). \]
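For a concrete discrete distribution the inequality can be checked exactly, not just by simulation; a sketch with a binomial example, assuming scipy is available (the parameter choices are illustrative):

```python
import numpy as np
from scipy import stats

n, p, a = 20, 0.3, 2.5           # illustrative choices
X = stats.binom(n, p)
mu, var = X.mean(), X.var()      # 6.0 and 4.2

k = np.arange(n + 1)
tail = X.pmf(k)[np.abs(k - mu) >= a].sum()   # exact P(|X - mu| >= a)
print(tail, "<=", var / a**2)                # the bound holds
```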

M9. Assume that in Example 17.1 from the lectures (measuring a ball rolling down an inclined plane), we choose to stop the ball always after one time unit, so that \[ X_i=\frac12 a\,(1+U_i)^2+V_i, \] where the independent errors are normally distributed with \(U_i\sim\mbox{\textup{N}}(0,\sigma_U^2),\,V_i\sim\mbox{\textup{N}}(0,\sigma_V^2)\). Assume the variances of the errors are known. Calculate the bias of the estimator \(A=2\bar{X}_n\) for the acceleration parameter \(a\). Propose an unbiased estimator for \(a\).

Answer

For the bias we calculate \[ \mathbb{E}\left[A\right]=\mathbb{E}\left[2\bar{X}_n\right] = 2\mathbb{E}\left[\bar{X}_n\right] = 2\mathbb{E}\left[X_i\right]. \] So we need the expectation of \(X_i\): \[ \begin{split} \mathbb{E}\left[X_i\right]&=\mathbb{E}\left[\frac 12 a(1+U_i)^2+V_i\right]=\frac12 a \mathbb{E}\left[(1+U_i)^2\right]+\mathbb{E}\left[V_i\right]\\ &=\frac12 a \left(\textup{Var}\left(1+U_i\right)+\mathbb{E}\left[1+U_i\right]^2\right) =\frac12 a \left(\textup{Var}\left(U_i\right)+1^2\right)\\ &=\frac12 a \left(\sigma_U^2+1\right). \end{split} \] This gives \[ \mathbb{E}\left[A\right]= 2\mathbb{E}\left[X_i\right] = a (\sigma_U^2+1)\neq a. \] So this estimator is not unbiased. Taking the average is going to consistently overestimate the value of \(a\).

Luckily we can fix this by rescaling the estimator. The estimator \[ \tilde{A}=\frac{1}{\sigma_U^2+1}A=\frac{2}{\sigma_U^2+1}\bar{X}_n \] is unbiased.
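A simulation confirms both the bias and the fix (a sketch assuming numpy; the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma_U, sigma_V = 9.8, 0.1, 0.05   # illustrative values
n, n_reps = 50, 100_000

U = rng.normal(0, sigma_U, size=(n_reps, n))
V = rng.normal(0, sigma_V, size=(n_reps, n))
X = 0.5 * a * (1 + U)**2 + V

A = 2 * X.mean(axis=1)
print("E[A]  empirical:", A.mean(), " theory:", a * (sigma_U**2 + 1))
print("E[A~] empirical:", (A / (sigma_U**2 + 1)).mean(), " theory:", a)
```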

Desserts

Still hungry for more? Try these if you want to push yourself further. (These are mostly harder than I’d expect you to answer in an exam, or involve non-examinable material.)

D1. Consider the following dataset of lifetimes of ball bearings in hours: \[ \begin{split} &6278~~ 3113~~ 5236~~ 11584~~ 12628~~ 7725~~ 8604~~ 14266~~ 6125~~ 9350\\ &3212~~ 9003~~ 3523~~ 12888~~ 9460~~ 13431~~ 17809~~ 2812 ~~ 11825 ~~ 2398. \end{split} \] Suppose that we are interested in estimating the minimum lifetime of this type of ball bearing. The dataset is modelled as a realization of a random sample \(X_1 , \dots , X_n\). Each random variable \(X_i\) is represented as \(X_i=\delta +Y_i\), where \(Y_i\) has an \(\mbox{\textup{Exp}}(\lambda)\) distribution and \(\delta > 0\) is an unknown parameter that is supposed to model the minimum lifetime. The objective is to construct an unbiased estimator for \(\delta\). It is known that \[ \mathbb{E}\left[M_n\right]=\delta+\frac{1}{n\lambda}~\text{ and }~\mathbb{E}\left[\bar{X}_n\right]=\delta+\frac{1}{\lambda}, \] where \(M_n = \min(X_1 , \dots , X_n)\) and \(\bar{X}_n=(X_1 + \cdots + X_n)/n\).

  1. Check whether \[ T=\frac{n}{n-1}\left(\bar{X}_n-M_n\right) \] is an unbiased estimator for \(1/\lambda\).

  2. Construct an unbiased estimator \(D\) for \(\delta\).

  3. Use the dataset to compute an estimate for the minimum lifetime \(\delta\).

Answer
  1. By linearity of expectation \[ \begin{split} \mathbb{E}\left[T\right]&=\frac{n}{n-1}\left(\mathbb{E}\left[\bar{X}_n\right]-\mathbb{E}\left[M_n\right]\right)\\ &=\frac{n}{n-1}\left(\left(\delta+\frac{1}{\lambda}\right)-\left(\delta+\frac{1}{n\lambda}\right)\right)\\ &=\frac{n}{n-1}\frac{n-1}{n\lambda}=\frac{1}{\lambda}. \end{split} \] This shows that \(T\) is an unbiased estimator for \(1/\lambda\).

  2. We will look for a linear combination of \(\bar{X}_n\) and \(M_n\) of which the expectation is \(\delta\). From the expressions for \(\mathbb{E}\left[\bar{X}_n\right]\) and \(\mathbb{E}\left[M_n\right]\) we see that we can eliminate \(\lambda\) by subtracting \(\mathbb{E}\left[\bar{X}_n\right]\) from \(n\mathbb{E}\left[M_n\right]\). Therefore, first consider \(nM_n-\bar{X}_n\), which has expectation \[ \mathbb{E}\left[nM_n-\bar{X}_n\right]=n\mathbb{E}\left[M_n\right]-\mathbb{E}\left[\bar{X}_n\right]=n\left(\delta+\frac{1}{n\lambda}\right)-\left(\delta+\frac{1}{\lambda}\right)=(n-1)\delta. \] This means that \[ D=\frac{nM_n-\bar{X}_n}{n-1} \] has expectation \(\delta\): \(\mathbb{E}\left[D\right]=\mathbb{E}\left[nM_n-\bar{X}_n\right]/(n-1)=\delta,\) so that \(D\) is an unbiased estimator for \(\delta\).

  3. When we evaluate \(\bar{X}_n\) and \(M_n\) on the dataset with \(n=20\) values, we find \[ \bar{x}_n= 8563.5,~~~~m_n=2398. \] Substituting these values into the estimator determined in part 2 gives the following estimate for \(\delta\): \[ \frac{nm_n-\bar{x}_n}{n-1}=\frac{20\cdot 2398- 8563.5}{19}=2073.5. \]
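The estimate is quick to reproduce (a sketch assuming numpy):

```python
import numpy as np

lifetimes = np.array([6278, 3113, 5236, 11584, 12628, 7725, 8604, 14266, 6125, 9350,
                      3212, 9003, 3523, 12888, 9460, 13431, 17809, 2812, 11825, 2398])
n = len(lifetimes)

xbar, mn = lifetimes.mean(), lifetimes.min()   # 8563.5 and 2398
print((n * mn - xbar) / (n - 1))               # 2073.5
```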

D2. Let \(M_n\) be the maximum of \(n\) independent \(\mbox{\textup{Uniform}}(0,1)\) random variables. Show that for any fixed \(\varepsilon>0\), \[ \lim_{n\to\infty} \mathbb{P}\left(|M_n-1|> \varepsilon\right) = 0 \,. \]

Answer

Let \(X_1,\dots,X_n\) denote the \(n\) independent \(\mbox{\textup{Uniform}}(0,1)\) random variables, and assume \(\varepsilon<1\) (for \(\varepsilon\geq 1\) the probability is 0 for every \(n\), so there is nothing to prove). \(M_n\) can only take values in the set \((0,1)\), so the event \(\{|M_n-1|>\varepsilon\} = \{M_n < 1-\varepsilon\}\). Now note that for any \(x\in(0,1)\), \(\{M_n<x\}\) occurs if and only if \(\{X_i<x\}\) for all of the \(n\) uniform random variables. Since the \(X_i\) are independent, we get \[ \mathbb{P}\left(M_n< 1-\varepsilon\right) = \mathbb{P}\left(X_1<1-\varepsilon\right)^n = (1-\varepsilon)^n\,, \] where we have used the distribution function of a \(\mbox{\textup{Uniform}}(0,1)\) distribution to calculate \(\mathbb{P}\left(X_1<1-\varepsilon\right)\). This tends to 0 as \(n\to\infty\), as required.
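The exact formula \((1-\varepsilon)^n\) makes this easy to see in a simulation (a sketch assuming numpy; \(\varepsilon\) is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.05

for n in (10, 100, 1000):
    M = rng.uniform(0, 1, size=(10_000, n)).max(axis=1)
    print(n, np.mean(np.abs(M - 1) > eps), (1 - eps)**n)   # empirical vs exact
```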

D3. (A more general law of large numbers, see Exercise 13.12 in the textbook).
Let \(X_1,X_2,\dots\) be a sequence of independent random variables with \(\mathbb{E}\left[X_i\right]=\mu_i\) and \(\textup{Var}\left(X_i\right)=\sigma_i^2\) for \(i=1,2,\dots\). Let \(\bar{X}_n=(X_1+\cdots +X_n)/n\). Suppose that there exists an \(M\in\mathbb{R}\) such that \(0<\sigma_i^2\leq M\) for all \(i\), and let \(a\) be an arbitrary positive number.

  1. Apply Chebychev’s inequality to show that \[ \mathbb{P}\left(\left|\bar{X}_n-\frac{1}{n}\sum_{i=1}^n\mu_i\right|>a\right)\leq\frac{\textup{Var}\left(X_1\right)+\cdots +\textup{Var}\left(X_n\right)}{n^2a^2}. \]

  2. Conclude from a) that \[ \lim_{n\to\infty}\mathbb{P}\left(\left|\bar{X}_n-\frac{1}{n}\sum_{i=1}^n\mu_i\right|>a\right)=0. \]

  3. Check that the weak law of large numbers is a special case of this result.

Answer
  1. First we calculate the expectation and variance of \(\bar{X}_n\): \[ \begin{split} \mathbb{E}\left[\bar{X}_n\right]&=\mathbb{E}\left[(X_1+\cdots +X_n)/n\right]\\ &=\left(\mathbb{E}\left[X_1\right]+\cdots+ \mathbb{E}\left[X_n\right]\right)/n \text{ by linearity of expectation}\\ &=(\mu_1+\cdots+ \mu_n)/n=\frac{1}{n}\sum_{i=1}^n\mu_i,\\ \textup{Var}\left(\bar{X}_n\right)&=\textup{Var}\left((X_1+\cdots +X_n)/n\right)\\ &=\textup{Var}\left(X_1/n\right)+\cdots +\textup{Var}\left(X_n/n\right) \text{ using independence of the $X_i$}\\ &=\frac{\textup{Var}\left(X_1\right)+\cdots +\textup{Var}\left(X_n\right)}{n^2}. \end{split} \]

Then \[ \begin{split} \mathbb{P}\left(\left|\bar{X}_n-\frac{1}{n}\sum_{i=1}^n\mu_i\right|>a\right)&=\mathbb{P}\left(\left|\bar{X}_n-\mathbb{E}\left[\bar{X}_n\right]\right|>a\right)\\ &\leq \mathbb{P}\left(\left|\bar{X}_n-\mathbb{E}\left[\bar{X}_n\right]\right|\geq a\right)\\ &\leq \frac{1}{a^2}\textup{Var}\left(\bar{X}_n\right) \text{ by Chebychev}\\ &=\frac{\textup{Var}\left(X_1\right)+\cdots +\textup{Var}\left(X_n\right)}{n^2a^2}. \end{split} \]

  2. On the right-hand side of the inequality from part 1 we can use that \(\textup{Var}\left(X_i\right)=\sigma_i^2\leq M\) for all \(i\) and hence \[ \frac{\textup{Var}\left(X_1\right)+\cdots +\textup{Var}\left(X_n\right)}{n^2a^2}\leq\frac{nM}{n^2a^2}=\frac{M}{na^2}. \] We now take the limit \(n\to\infty\) on both sides of the inequality \[ \mathbb{P}\left(\left|\bar{X}_n-\frac{1}{n}\sum_{i=1}^n\mu_i\right|>a\right)\leq\frac{M}{na^2} \] and use that \(\lim_{n\to\infty}M/(na^2)=0\) to obtain \[ \lim_{n\to\infty}\mathbb{P}\left(\left|\bar{X}_n-\frac{1}{n}\sum_{i=1}^n\mu_i\right|>a\right)\leq 0. \] But we also know that probabilities are always non-negative, and the limit of a sequence of non-negative numbers is also non-negative, so the limit must be zero.

  3. The law of large numbers has the same assumptions as the statement in this question, with the additional requirement that all the \(X_i\) are identically distributed so that in particular they all have the same expectation \(\mu_i=\mu\). Thus the average of all the \(\mu_i\) is also \(\mu\) and the statement from part 2 becomes \[ \lim_{n\to\infty}\mathbb{P}\left(\left|\bar{X}_n-\mu\right|>a\right)= 0 \] for any \(a>0\). This is the statement of the weak law of large numbers.
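Finally, a simulation sketch of this more general law of large numbers, assuming numpy; the means and variances below are arbitrary illustrative choices (with variances bounded by \(M=9\)):

```python
import numpy as np

rng = np.random.default_rng(0)
a, n_reps = 0.1, 500

for n in (100, 10_000):
    i = np.arange(1, n + 1)
    mus = np.sin(i)                # arbitrary, non-identical means
    sigmas = 1.0 + (i % 3)         # standard deviations in {1, 2, 3}, so M = 9
    X = rng.normal(mus, sigmas, size=(n_reps, n))
    dev = np.abs(X.mean(axis=1) - mus.mean())
    print(n, np.mean(dev > a), "  Chebychev bound:", 9 / (n * a**2))
```

For small \(n\) the bound \(M/(na^2)\) may exceed 1 and carries no information, but as \(n\) grows both the bound and the empirical probability tend to zero.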