Introduction to Probability & Statistics

Assignment 5, 2025/26

Instructions

Submit your answers to the four questions marked Hand-in. You should upload your solutions to the VLE as a single PDF file. Marks will be awarded for clear, logical explanations, as well as for correctness of solutions.
Solutions to marked questions have been released at the same time as this assignment, in case you want to check your answers or need a hint.
You should also look at the other questions in preparation for your Week 11 seminar.

Starters

These questions should help you to gain confidence with the basics.

S1. You perform 28 independent experiments measuring a random variable \(X\) which you know has mean 457 and variance 676. Use Chebychev’s inequality to give a lower bound on the probability that the mean of your measurements is between 433 and 481.

S2. Lengths of small snakes are assumed to follow a \(\mbox{\textup{N}}(\mu,\sigma^2)\) distribution. Six snakes were measured, giving the following lengths in cm: 66, 69, 62, 62, 64, 67.

  1. Calculate unbiased estimates for \(\mu\) and \(\sigma^2\).

  2. Calculate a 90% confidence interval for \(\mu\).

  1. An unbiased estimate for \(\mu\) is \(\hat{\mu} = \bar{x} = 65\,\text{cm}\). An unbiased estimate for \(\sigma^2\) is \(\hat{\sigma}^2 = s_n^2 = 8\,\text{cm}^2\).

  2. A 90% confidence interval is given by \[ \begin{split} \left(\bar{x}_n - t_{n-1, \alpha/2}\frac{s_n}{\sqrt{n}}, \, \bar{x}_n + t_{n-1, \alpha/2}\frac{s_n}{\sqrt{n}}\right) &= \left(65 - t_{5, 0.05}\sqrt{8/6}, \, 65 + t_{5, 0.05}\sqrt{8/6}\right) \\ &= \left(65 - 2.02\sqrt{8/6}, \, 65 + 2.02\sqrt{8/6}\right) \\ &= (62.67,67.33)\,. \end{split} \]
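As a quick numerical check of this interval, here is a sketch using only the Python standard library, with the table value \(t_{5,0.05}\approx 2.015\) hard-coded (the solution above rounds it to 2.02):

```python
import math
import statistics

# Snake lengths in cm, from question S2
lengths = [66, 69, 62, 62, 64, 67]
n = len(lengths)

xbar = statistics.mean(lengths)      # unbiased estimate of mu
s2 = statistics.variance(lengths)    # unbiased estimate of sigma^2 (divides by n - 1)

# t critical value t_{5, 0.05} for a 90% interval, taken from tables
t_crit = 2.015
half_width = t_crit * math.sqrt(s2 / n)
ci = (xbar - half_width, xbar + half_width)

print(xbar, s2)   # 65 8
print(ci)         # close to (62.67, 67.33)
```

The small discrepancy in the last decimal place comes only from rounding the critical value.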

S3. Let \(X_1,\dots,X_n\) be an i.i.d. sample from a distribution with mean \(\mu\) and variance \(\sigma^2\). Let \(\bar{X}_n\) denote the sample mean, and define \[ Z_n = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \,. \] Check that \(\mathbb{E}\left[Z_n\right] = 0\) and \(\textup{Var}\left(Z_n\right) = 1\).

We know that \(\mathbb{E}\left[\bar{X}_n\right] = \mu\) and \(\textup{Var}\left(\bar{X}_n\right) =\sigma^2/n\). By linearity of expectation we have \[ \mathbb{E}\left[Z_n\right] = \frac{\sqrt{n}\mathbb{E}\left[\bar{X}_n - \mu\right]}{\sigma} = 0 \,. \] For the variance, we know that for any random variable \(Y\), and constants \(a,b\in\mathbb{R}\), \(\textup{Var}\left(a Y + b\right) = a^2 \textup{Var}\left(Y\right)\). Thus \[ \textup{Var}\left(Z_n\right) = \textup{Var}\left(\sqrt{n}\bar{X}_n/\sigma - \sqrt{n}\mu/\sigma\right) = (\sqrt{n}/\sigma)^2 \textup{Var}\left(\bar{X}_n\right) = 1 \,. \]
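A Monte Carlo sanity check of this calculation is easy to run. The choice of distribution below is illustrative and not from the question: \(X_i\sim\mbox{\textup{Exp}}(1)\), so that \(\mu=\sigma=1\).

```python
import random
import statistics

# Sanity check for S3: Z_n = sqrt(n)(Xbar_n - mu)/sigma should have
# mean 0 and variance 1, whatever the underlying distribution.
# Illustrative choice: X_i ~ Exponential(1), so mu = 1 and sigma = 1.
random.seed(0)
n, reps = 50, 20000
mu, sigma = 1.0, 1.0

z_values = []
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    z_values.append(n**0.5 * (xbar - mu) / sigma)

print(statistics.mean(z_values))      # close to 0
print(statistics.variance(z_values))  # close to 1
```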

S4. Hand-in
Let \(X\) and \(Y\) be discrete random variables with joint mass function as follows: \[p_{X,Y}(0,-1) = p_{X,Y}(0,1) = 1/4\,, \qquad \text{and} \qquad p_{X,Y}(1,0) = 1/2\] (with \(p_{X,Y}(x,y) = 0\) for all other \((x,y)\in\mathbb{R}^2\)).

  1. Calculate \(\textup{Cov}\left[X,Y\right]\).
  2. Are \(X\) and \(Y\) independent?

S5. Let \(X_1,\dots, X_n\) be an i.i.d. sample from the \(\mbox{\textup{N}}(\mu,\sigma^2)\) distribution.

  1. Write down a \(100(1-\alpha)\%\) confidence interval estimator for \(\mu\) in the case where \(\sigma^2\) is known. Hence or otherwise, find a 95% confidence interval for \(\mu\) in the case where \(\sigma^2 = 9\) and a random sample of size 16 has been taken with values \(x_1,\dots, x_{16}\) and it has been found that \[ \sum_{i=1}^{16} x_i = 50, \quad \sum_{i=1}^{16} (x_i - \bar{x}_{16})^2 = 115. \]

  2. It is proposed that, from a second independent random sample of size 16 a 99% confidence interval for \(\mu\) be constructed and that, from a third independent random sample of size 32, a 98% confidence interval for \(\mu\) be constructed. State the probability that neither of these two confidence intervals will contain \(\mu\).

Mains

These are important, and cover some of the most substantial parts of the course.

M1. Hand-in
Let \(X_1,\dots,X_{25}\) be an i.i.d. sample from a \(\mbox{\textup{N}}(2,4)\) distribution, and let \(S=X_1+ X_2 + \dots + X_{25}\). Express \(\mathbb{P}\left(S<55\right)\) in terms of the distribution function \(\Phi\) of the standard normal distribution.

M2. Hand-in
Let \(Y_1,Y_2,\dots\) be an i.i.d. sequence of random variables, each with a \(\mbox{\textup{Uniform}}(1,3)\) distribution. Define a new sequence of random variables \(X_1,X_2,\dots\) by \[ X_n = \frac{1}{n}\sum_{i=1}^n Y_i^2 \,. \] Using the Law of Large Numbers, determine the value of \(a\in \mathbb{R}\) for which \(\mathbb{P}\left(\lim_{n\to\infty}X_n = a\right) = 1\).

M3. Hand-in
Suppose the random variables \(X_1,X_2, X_3\) and \(X_4\) all have the same expectation \(\mu\). For what value(s) of \(b\in\mathbb{R}\) is \[ M=b(X_1+bX_2)+2X_3-3X_4 \] an unbiased estimator for \(\mu\)?

M4. Recall that we say that \(T_m\) has a \(t\)-distribution with \(m>1\) degrees of freedom, and write \(T_m\sim t(m)\), if it has density function given by \[ f(x) = k_m\left(1+\frac{x^2}{m}\right)^{-\frac{m+1}{2}}\,, \qquad x\in\mathbb{R}\,, \] where \(k_m\) is a constant that ensures that the density integrates to 1. If \(T_m\sim t(m)\), show that \(\mathbb{E}\left[T_m\right] = 0\).
Hint: you shouldn’t need to explicitly calculate any integrals here!

M5. From a dataset \(x_1,\dots, x_{10}\) it has been calculated that \[ \sum_{i=1}^{10} x_i = 491, \quad \sum_{i=1}^{10} (x_i - \bar{x}_{10})^2 = 41. \] You model the dataset as a random sample from the normal distribution with mean \(\mu\) and variance \(\sigma^2\).

  1. Assume that both \(\mu\) and \(\sigma^2\) are unknown. Determine a 95% confidence interval for the mean \(\mu\). You can use that \(t_{9, 0.025}\approx 2.26\).

  2. Now assume that it is known that the variance is \(\sigma^2 = 5\). Give a 95% confidence interval for the mean \(\mu\) in this case. You can use that \(z_{0.025} \approx 1.96\).

M6. If \(X\) has expectation \(\mu\) and standard deviation \(\sigma\), the ratio \(r=|\mu|/\sigma\) is called the measurement signal-to-noise ratio of \(X\). If we define \(D=|(X-\mu)/\mu|\) as the relative deviation of \(X\) from its mean \(\mu\), show that, for \(\alpha>0\), \[ \mathbb{P}\left(D< \alpha\right)\geq 1-\frac{1}{r^2\alpha^2}. \]

Since \(r=|\mu|/\sigma\), we have \(\alpha|\mu|=\alpha r\sigma\), and so \[ \mathbb{P}\left(D<\alpha\right)=\mathbb{P}\left(|(X-\mu)/\mu|<\alpha\right)=\mathbb{P}\left(|X-\mu|<\alpha|\mu|\right)=\mathbb{P}\left(|X-\mu|<\alpha r \sigma\right)\,. \]

We now use Chebychev’s inequality: \[ \mathbb{P}\left(|X-\mu|<k\sigma\right) \geq1-\frac{1}{k^2} \] with \(k=\alpha r\), giving \[ \mathbb{P}\left(|X-\mu|<\alpha r \sigma\right)\geq 1-\frac{1}{r^2\alpha^2}. \]
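The bound can also be seen numerically. The parameters below are an illustrative choice of mine, not from the question: \(X\sim\mbox{\textup{N}}(10,4)\), so \(r=|\mu|/\sigma=5\).

```python
import random

# Numeric check of M6 with illustrative parameters:
# X ~ N(mu, sigma^2) with mu = 10, sigma = 2, so r = |mu|/sigma = 5.
random.seed(1)
mu, sigma = 10.0, 2.0
r = abs(mu) / sigma
alpha = 0.5

samples = [random.gauss(mu, sigma) for _ in range(100000)]
p = sum(abs((x - mu) / mu) < alpha for x in samples) / len(samples)
bound = 1 - 1 / (r**2 * alpha**2)

print(p, bound)   # p ≈ 0.988 >= bound = 0.84, as claimed
```

As usual with Chebychev's inequality, the bound is valid for any distribution but far from tight for a normal one.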

M7. Let \(X\) be the number of 1s and \(Y\) be the number of 2s that occur in \(n\) rolls of a fair die. Use indicator random variables to compute \(\textup{Cov}\left[X,Y\right]\) and \(\rho(X,Y)\).
Hint: this is just like the smarties example covered in lectures.

M8. In the textbook you should have read the proof of Chebychev’s inequality for the case of a continuous random variable. Provide a similar proof for the case of a discrete random variable.

First we find a lower bound for the variance of \(X\). Let’s write \(\mathbb{E}\left[X\right]=\mu\).

\[ \begin{split} \textup{Var}\left(X\right)&=\mathbb{E}\left[(X-\mu)^2\right] =\sum_{x\in X(\Omega)}(x-\mu)^2p_X(x) \\ &\geq \sum_{\stackrel{x\in X(\Omega)}{|x-\mu|\geq a}}(x-\mu)^2 p_X(x) \\ &\geq \sum_{\stackrel{x\in X(\Omega)}{|x-\mu|\geq a}}a^2 p_X(x) =a^2\,\mathbb{P}\left(|X-\mu|\geq a\right). \end{split} \] Then we divide both sides of the inequality by \(a^2\) to get \[ \mathbb{P}\left(|X-\mathbb{E}\left[X\right]|\geq a\right)\leq \frac{1}{a^2}\textup{Var}\left(X\right). \]
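To see the discrete bound in action for a concrete distribution (a Binomial choice of mine, not from the textbook), one can compare the exact tail probability with \(\textup{Var}\left(X\right)/a^2\):

```python
from math import comb

# Exact check of the discrete Chebychev bound for X ~ Binomial(n, p),
# an illustrative discrete distribution.
n, p = 20, 0.3
mu = n * p             # 6.0
var = n * p * (1 - p)  # 4.2
a = 4.0

pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
tail = sum(prob for k, prob in pmf.items() if abs(k - mu) >= a)

print(tail, var / a**2)   # tail <= var / a^2
```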

M9. Assume that in Example 17.1 from the lectures (measuring a ball rolling down an inclined plane), we choose to stop the ball always after one time unit, so that \[ X_i=\frac12 a\,(1+U_i)^2+V_i, \] where the independent errors are normally distributed with \(U_i\sim\mbox{\textup{N}}(0,\sigma_U^2),\,V_i\sim\mbox{\textup{N}}(0,\sigma_V^2)\). Assume the variances of the errors are known. Calculate the bias of the estimator \(A=2\bar{X}_n\) for the acceleration parameter \(a\). Propose an unbiased estimator for \(a\).

Desserts

Still hungry for more? Try these if you want to push yourself further. (These are mostly harder than I’d expect you to answer in an exam, or involve non-examinable material.)

D1. Consider the following dataset of lifetimes of ball bearings in hours: \[ \begin{split} &6278~~ 3113~~ 5236~~ 11584~~ 12628~~ 7725~~ 8604~~ 14266~~ 6125~~ 9350\\ &3212~~ 9003~~ 3523~~ 12888~~ 9460~~ 13431~~ 17809~~ 2812 ~~ 11825 ~~ 2398. \end{split} \] Suppose that we are interested in estimating the minimum lifetime of this type of ball bearing. The dataset is modelled as a realization of a random sample \(X_1 , \dots , X_n\). Each random variable \(X_i\) is represented as \(X_i=\delta +Y_i\), where \(Y_i\) has an \(\mbox{\textup{Exp}}(\lambda)\) distribution and \(\delta > 0\) is an unknown parameter that is supposed to model the minimum lifetime. The objective is to construct an unbiased estimator for \(\delta\). It is known that \[ \mathbb{E}\left[M_n\right]=\delta+\frac{1}{n\lambda}~\text{ and }~\mathbb{E}\left[\bar{X}_n\right]=\delta+\frac{1}{\lambda}, \] where \(M_n = \min(X_1 , \dots , X_n)\) and \(\bar{X}_n=(X_1 + \cdots + X_n)/n\).

  1. Check whether \[ T=\frac{n}{n-1}\left(\bar{X}_n-M_n\right) \] is an unbiased estimator for \(1/\lambda\).

  2. Construct an unbiased estimator \(D\) for \(\delta\).

  3. Use the dataset to compute an estimate for the minimum lifetime \(\delta\).

D2. Let \(M_n\) be the maximum of \(n\) independent \(\mbox{\textup{Uniform}}(0,1)\) random variables. Show that for any fixed \(\varepsilon>0\), \[ \lim_{n\to\infty} \mathbb{P}\left(|M_n-1|> \varepsilon\right) = 0 \,. \]

D3. (A more general law of large numbers, see Exercise 13.12 in the textbook).
Let \(X_1,X_2,\dots\) be a sequence of independent random variables with \(\mathbb{E}\left[X_i\right]=\mu_i\) and \(\textup{Var}\left(X_i\right)=\sigma_i^2\) for \(i=1,2,\dots\). Let \(\bar{X}_n=(X_1+\cdots +X_n)/n\). Suppose that there exists an \(M\in\mathbb{R}\) such that \(0<\sigma_i^2\leq M\) for all \(i\), and let \(a\) be an arbitrary positive number.

  1. Apply Chebychev’s inequality to show that \[ \mathbb{P}\left(\left|\bar{X}_n-\frac{1}{n}\sum_{i=1}^n\mu_i\right|>a\right)\leq\frac{\textup{Var}\left(X_1\right)+\cdots +\textup{Var}\left(X_n\right)}{n^2a^2}. \]

  2. Conclude from part 1 that \[ \lim_{n\to\infty}\mathbb{P}\left(\left|\bar{X}_n-\frac{1}{n}\sum_{i=1}^n\mu_i\right|>a\right)=0. \]

  3. Check that the weak law of large numbers is a special case of this result.

Challenge question

Suppose that \(X_1, \dots, X_n\) are mutually independent random variables, each distributed as \(\mbox{\textup{Exp}}(\lambda)\). (That is, all events of the kind \(\{X_1 \leq x_1\},\ldots,\{X_n\leq x_n\}\) are mutually independent.) Let \(Y_n = \min\{X_1,\dots, X_n\}\), and \(V_n = \max\{X_1,\dots, X_n\}\).

  1. Show that \(Y_n\sim\mbox{\textup{Exp}}(\lambda n)\).
  2. What is the distribution function of \(V_n\)?
  3. Show that, for all \(s>0\), \[ \lim_{n\to\infty} \mathbb{P}\left(V_n - (\log n)/\lambda \le s\right) = \exp(-e^{-\lambda s}) \,. \]