In this chapter we undertake our first substantial adventure into statistical inference. Recall that statistical inference is based on the random sampling model: We view our data as a random sample from some population, and we use the information in the sample to infer facts about the population.
Statistical estimation is a form of statistical inference in which we use the data to estimate some feature of the population. We will also learn how to assess the precision of such an estimate.
In general, for a sample of observations on a quantitative variable $Y$, the sample mean and SD are estimates of the population mean and SD:
Our goal is to estimate $\mu$. We will see how to assess the reliability or precision of this estimate, and how to plan a study large enough to attain a desired precision.
As part of a larger study of body composition, researchers captured $14$ male Monarch butterflies at Oceano Dunes State Park in California and measured wing area (in cm$^2$). The data are given in the following table.
For these data, the mean and standard deviation are $\bar{y}=32.81$ cm$^2$ and $s=2.48$ cm$^2$. Define the population mean and SD as follows:
It is natural to estimate $\mu$ by the sample mean and $\sigma$ by the sample SD. Specifically,
These estimates are subject to sampling error (not only measurement error). The task of this chapter is to assess the reliability or precision of $\bar{y}$.
The standard deviation of the sampling distribution of $\bar{Y}$ is $$\sigma_{\bar{Y}}=\frac{\sigma}{\sqrt{n}}.$$ The population standard deviation $\sigma$ is typically unknown. Since $s$ is an estimate of $\sigma$, a natural estimate of $\sigma/\sqrt{n}$ would be $$\mathrm{SE}=\mathrm{SE}_{\bar{Y}}=\frac{s}{\sqrt{n}},$$ which is called the standard error of the mean.
For the butterfly wings example, the standard error of the mean is $$\mathrm{SE}_{\bar{Y}}=\frac{s}{\sqrt{n}}=\frac{2.48}{\sqrt{14}}=0.66\text{ cm}^2.$$
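The arithmetic above can be checked in R. This is a minimal sketch that uses only the summary values quoted in the text; the variable names are illustrative.

```r
# Summary statistics for the butterfly wing areas, as quoted in the text
s <- 2.48   # sample SD, in cm^2
n <- 14     # sample size

# Standard error of the mean: s / sqrt(n)
se <- s / sqrt(n)
round(se, 2)
```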
The sample SD $s$ describes the dispersion of the data, while the SE $$\frac{s}{\sqrt{n}}$$ describes the unreliability (due to sampling error) in the mean of the sample as an estimate of the mean of the population.
A geneticist weighed $28$ female lambs at birth. The lambs were all born in April, were all the same breed (Rambouillet), and were all single births (no twins). The diet and other environmental conditions were the same for all the parents. The birthweights are shown in the following table.
For these data, the mean is $\bar{y}=5.17$ kg, the sample SD is $s=0.65$ kg, and the SE is $\mathrm{SE}=0.12$ kg. The sample SD, $s$, describes the variability of birthweights among the lambs in the sample, while the SE indicates the variability associated with the sample mean ($5.17$ kg), viewed as an estimate of the population mean birthweight.
This distinction is emphasized in the figure below, which shows a histogram of the lamb birthweight data; the sample SD is indicated as a deviation from the sample mean $\bar{y}$, while the SE is indicated as variability associated with $\bar{y}$ itself.
# Generate random samples of increasing size from the
# standard normal distribution and record, for each sample,
# the sample mean, the sample SD, and the SE = SD / sqrt(n)
res <- matrix(nrow = 4, ncol = 3)
colnames(res) <- c('sample mean', 'sample SD', 'SE')
rownames(res) <- c('n = 10', 'n = 100', 'n = 1000', 'n = 10000')
y <- rnorm(10)
res[1, ] <- round(c(mean(y), sd(y), sd(y)/sqrt(10)), 2)
y <- rnorm(100)
res[2, ] <- round(c(mean(y), sd(y), sd(y)/sqrt(100)), 2)
y <- rnorm(1000)
res[3, ] <- round(c(mean(y), sd(y), sd(y)/sqrt(1000)), 2)
y <- rnorm(10000)
res[4, ] <- round(c(mean(y), sd(y), sd(y)/sqrt(10000)), 2)
res
sample size | sample mean | sample SD | SE |
---|---|---|---|
n = 10 | -0.19 | 1.20 | 0.38 |
n = 100 | -0.04 | 1.09 | 0.11 |
n = 1000 | -0.03 | 1.00 | 0.03 |
n = 10000 | 0.01 | 1.00 | 0.01 |
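# Arrange four plots in a 2 x 2 grid with patchwork.
# g1, g2, g3, g4 are plot objects assumed to have been created earlier (their definitions are not shown here).
library(patchwork)
options(repr.plot.width=12, repr.plot.height=8)
(g1 + g2)/(g3 + g4)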
The standard error of the mean (the SE) measures how far $\bar{y}$ is likely to be from the population mean $\mu$. In this section we make this idea precise.
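When the data are a random sample from a normal population, $$\frac{\bar{Y}-\mu}{s/\sqrt{n}}\sim t_{n-1}.$$ Let $t_{n-1}(0.025)$ denote the value that cuts off an upper-tail area of $0.025$ under the Student's $t$ distribution with $n-1$ degrees of freedom. By the symmetry of the $t$ distribution, one has $$P(-t_{n-1}(0.025)<\frac{\bar{Y}-\mu}{s/\sqrt{n}}<t_{n-1}(0.025))=0.95.$$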
Solving the inequalities inside the probability statement $$P(-t_{n-1}(0.025)<\frac{\bar{Y}-\mu}{s/\sqrt{n}}<t_{n-1}(0.025))=0.95$$ for $\mu$ leads to $$P(\bar{Y}-t_{n-1}(0.025)\times\frac{s}{\sqrt{n}}<\mu<\bar{Y}+t_{n-1}(0.025)\times\frac{s}{\sqrt{n}})=0.95.$$ The interval $$\bar{Y}\pm t_{n-1}(0.025)\times\frac{s}{\sqrt{n}}$$ is called the (two-sided) $95\%$ confidence interval (CI) for $\mu$.
Generally, the two-sided $1-\alpha$ confidence interval for $\mu$ is constructed using $t_{n-1}(\alpha/2)$ as follows: $$\bar{Y}\pm t_{n-1}(\alpha/2)\times\frac{s}{\sqrt{n}}.$$
For the butterfly data, we have $n=14$, $\bar{Y}=32.8143$ cm$^2$, and $s=2.4757$ cm$^2$. Find a two-sided $95\%$ confidence interval for the population mean $\mu$.
Find a two-sided $90\%$ confidence interval for the population mean $\mu$.
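Both intervals can be computed directly from the summary statistics. The sketch below wraps the formula $\bar{Y}\pm t_{n-1}(\alpha/2)\times s/\sqrt{n}$ in a small helper; `t_ci` is a hypothetical name chosen for this illustration, not a base R function. Because $t_{n-1}(\alpha/2)$ is an upper-tail critical value, it corresponds to `qt(1 - alpha/2, df = n - 1)`.

```r
# Two-sided t confidence interval for a population mean, from summary statistics.
# t_ci is a hypothetical helper defined here for illustration only.
t_ci <- function(ybar, s, n, level = 0.95) {
  alpha <- 1 - level
  t_mult <- qt(1 - alpha / 2, df = n - 1)  # upper alpha/2 critical value
  se <- s / sqrt(n)                        # standard error of the mean
  c(lower = ybar - t_mult * se, upper = ybar + t_mult * se)
}

# Butterfly wing data: n = 14, ybar = 32.8143 cm^2, s = 2.4757 cm^2
ci95 <- t_ci(32.8143, 2.4757, 14, level = 0.95)
ci90 <- t_ci(32.8143, 2.4757, 14, level = 0.90)
round(ci95, 2)
round(ci90, 2)
```

The $90\%$ interval is narrower than the $95\%$ interval because it uses the smaller multiplier $t_{13}(0.05)$ in place of $t_{13}(0.025)$.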
In a certain large population of Blue Jays, the distribution of bill lengths is normal with mean $\mu=25.4$ mm and standard deviation $\sigma=0.08$ mm. The figure below shows some typical samples from this population; plotted on the right are the associated $95\%$ confidence intervals. The sample sizes are $n=5$ and $n=20$.
A confidence level can be interpreted as a probability, but caution is required. If we consider $95\%$ confidence intervals, for instance, then the following statement is correct: $$P(\text{the next sample will give us a confidence interval that contains }\mu)=0.95.$$ However, one should realize that it is the confidence interval that is the random item in this statement, and it is not correct to replace this item with its value from the data. Thus, for instance, we found in the butterfly wings example that the $95\%$ confidence interval for the mean wing area is $$31.4\text{ cm}^2<\mu<34.2\text{ cm}^2.$$ Nevertheless, it is not correct to say that $$P(31.4\text{ cm}^2<\mu<34.2\text{ cm}^2)=0.95,$$ because this statement has no chance element; either $\mu$ is between $31.4$ and $34.2$ or it is not.
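The long-run interpretation can be illustrated by simulation. The sketch below repeatedly draws samples of size $n=5$ from the Blue Jay population described earlier ($\mu=25.4$ mm, $\sigma=0.08$ mm), builds a $95\%$ interval from each sample, and records how often the interval contains $\mu$. The seed and the number of replications are arbitrary choices made for this illustration.

```r
# Simulation sketch: long-run coverage of 95% t confidence intervals.
# Population values come from the Blue Jay example; the seed and the
# number of replications are arbitrary choices for this illustration.
set.seed(1)
mu <- 25.4
sigma <- 0.08
n <- 5
reps <- 10000

covers <- replicate(reps, {
  y <- rnorm(n, mean = mu, sd = sigma)
  half_width <- qt(0.975, df = n - 1) * sd(y) / sqrt(n)
  # TRUE if this sample's interval contains the true mean
  (mean(y) - half_width < mu) && (mu < mean(y) + half_width)
})

coverage <- mean(covers)  # proportion of intervals that contain mu
coverage
```

The proportion should be close to $0.95$. Each individual interval either contains $\mu$ or does not; the $95\%$ describes the procedure over repeated sampling.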
In an experiment to assess the effectiveness of hormone replacement therapy, researchers gave conjugated equine estrogen (CEE) to a sample of $94$ women between the ages of $45$ and $64$. After taking the medication for $36$ months, the bone mineral density was measured for each of the $94$ women. The average density was $0.878$ g/cm$^2$, with a standard deviation of $0.126$ g/cm$^2$. Assume the bone mineral density is normally distributed.
Notice that the particular confidence interval does contain $\mu$; this will happen for $95\%$ of samples.
The number of seeds per fruit for the freshwater plant Vallisneria americana varies considerably from one fruit to another. A researcher took a random sample of $12$ fruit and found that the average number of seeds was $320$, with a standard deviation of $125$. The researcher expected the number of seeds to follow, at least approximately, a normal distribution.
It might be that we want a lower bound on $\mu$, the population mean, but we are not concerned with how large $\mu$ might be. In that case we use a lower one-sided confidence interval.
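A lower one-sided $1-\alpha$ interval places all of the error probability $\alpha$ in one tail, so the multiplier is $t_{n-1}(\alpha)$ rather than $t_{n-1}(\alpha/2)$: $$\mu>\bar{Y}-t_{n-1}(\alpha)\times\frac{s}{\sqrt{n}}.$$ As a sketch, the code below applies this to the seed-count summary statistics from the previous example ($n=12$, $\bar{y}=320$, $s=125$). The choice of that data set and of a $95\%$ level is an assumption made purely for illustration.

```r
# Lower one-sided 95% confidence bound for mu from summary statistics.
# Data set and confidence level are chosen only for illustration.
ybar <- 320   # sample mean number of seeds
s <- 125      # sample SD
n <- 12       # sample size
alpha <- 0.05

# All of alpha goes into one tail, so use t_{n-1}(alpha) = qt(1 - alpha, n - 1)
lower_bound <- ybar - qt(1 - alpha, df = n - 1) * s / sqrt(n)
round(lower_bound, 1)
```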
The butterfly wing data yielded the following summary statistics: $$\bar{Y}=32.81\text{ cm}^2, s=2.48\text{ cm}^2, \mathrm{SE}=0.66\text{ cm}^2$$ Suppose the researcher is now planning a new study of butterflies and has decided that it would be desirable that the SE be no more than $0.4$ cm$^2$. As a preliminary guess of the SD, she will use the value from the old study, namely $2.48$ cm$^2$. Thus, the desired sample size $n$ must satisfy the following relation: $$\mathrm{SE}=\frac{2.48}{\sqrt{n}}\leq0.4.$$ This inequality is easily solved: $n\geq(2.48/0.4)^2=38.4$. Since one cannot have $38.4$ butterflies, the new study should include at least $39$ butterflies.
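The same planning calculation can be written as a short R sketch, using the target SE of $0.4$ cm$^2$ stated in the text. The variable names are illustrative.

```r
# Sample size needed so that SE = s / sqrt(n) does not exceed a target value
s_guess <- 2.48    # preliminary guess of the SD (cm^2), from the earlier study
se_target <- 0.4   # desired upper limit for the SE (cm^2)

# s / sqrt(n) <= se_target  is equivalent to  n >= (s / se_target)^2
n_exact <- (s_guess / se_target)^2
n_required <- ceiling(n_exact)   # round up to a whole number of butterflies
c(n_exact = n_exact, n_required = n_required)
```

Because the SE shrinks only in proportion to $1/\sqrt{n}$, halving the target SE would require roughly four times as many butterflies.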
The table below shows the actual probability that a Student's $t$ confidence interval will contain $\mu$ for samples from three different populations.
In summary, Student's $t$ method of constructing a confidence interval for $\mu$ is appropriate if the following conditions hold.
The requirement that the data are a random sample is the most important condition.
Vital capacity is a measure of the amount of air that someone can exhale after taking a deep breath. One might expect that musicians who play brass instruments would have greater vital capacities, on average, than would other persons of the same age, sex, and height. In one study the vital capacities of seven brass players were compared to the vital capacities of five control subjects; the table below shows the data.
For the vital capacity data, preliminary computations yield the results in the following table.
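The SE of $\bar{Y}_1-\bar{Y}_2$ is $$\mathrm{SE}_{\bar{Y}_1-\bar{Y}_2}=\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}=\sqrt{\frac{0.1892}{7}+\frac{0.1232}{5}}=0.227,$$ where $0.1892$ and $0.1232$ are the sample variances $s_1^2$ and $s_2^2$. Note that $$0.227=\sqrt{0.164^2+0.157^2},$$ where $0.164$ and $0.157$ are the SEs of the two sample means, so the SE of the difference is greater than either of the individual SEs but less than their sum.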
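The calculation can be reproduced in R. The sketch below treats $0.1892$ and $0.1232$ as the sample variances of the seven brass players and the five controls, respectively. That reading is inferred from the formula above, since the summary table itself is not reproduced here.

```r
# SE of the difference between two independent sample means.
# Assumption: 0.1892 and 0.1232 are the sample variances s1^2 and s2^2.
var1 <- 0.1892; n1 <- 7   # brass players
var2 <- 0.1232; n2 <- 5   # controls

se1 <- sqrt(var1 / n1)            # SE of the first sample mean
se2 <- sqrt(var2 / n2)            # SE of the second sample mean
se_diff <- sqrt(se1^2 + se2^2)    # SE of the difference in means

round(c(se1 = se1, se2 = se2, se_diff = se_diff), 3)
```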
One way to compare two sample means is to construct a confidence interval for the difference in the population means. That is, a confidence interval for the quantity $\mu_1-\mu_2$.
Recall that a $1-\alpha$ confidence interval for the mean $\mu$ of a single population that is normally distributed is constructed as $$\bar{Y}\pm t_{n-1}(\alpha/2)\times\mathrm{SE}_{\bar{Y}}.$$ Analogously, a $1-\alpha$ confidence interval for $\mu_1-\mu_2$ is constructed as $$(\bar{Y}_1-\bar{Y}_2)\pm t_{\nu}(\alpha/2)\times\mathrm{SE}_{\bar{Y}_1-\bar{Y}_2},$$ with the degrees of freedom $$\nu=\frac{(\mathrm{SE}_1^2+\mathrm{SE}_2^2)^2}{\mathrm{SE}_1^4/(n_1-1)+\mathrm{SE}_2^4/(n_2-1)},$$ where $\mathrm{SE}_1=s_1/\sqrt{n_1}$ and $\mathrm{SE}_2=s_2/\sqrt{n_2}$.
The Wisconsin Fast Plant, Brassica campestris, has a very rapid growth cycle that makes it particularly well suited for the study of factors that affect plant growth. In one such study, seven plants were treated with the substance Ancymidol (ancy) and were compared to eight control plants that were given ordinary water. Heights of all of the plants were measured, in cm, after $14$ days of growth. The data are given in the following table. Assume the plant height is normally distributed. Find the $95\%$ confidence interval for $\mu_1-\mu_2$.
The SE for the difference in sample means is $$\mathrm{SE}_{\bar{Y}_1-\bar{Y}_2}=\sqrt{\frac{4.8^2}{8}+\frac{4.7^2}{7}}=2.46.$$ Using the formula, we find the degrees of freedom to be: $$\nu=\frac{(1.7^2+1.8^2)^2}{1.7^4/(8-1)+1.8^4/(7-1)}=12.8.$$
Using a computer, we can find that for a $95\%$ confidence interval the $t$ multiplier for $12.8$ degrees of freedom is $t_{12.8}(0.025)=2.164$. (Without a computer, we could round down the degrees of freedom, in which case the $t$ multiplier is $t_{12}(0.025)=2.179$. This change from $12.8$ to $12$ degrees of freedom has little effect on the final answer.)
The confidence interval formula gives $$(15.9-11.0)\pm2.164\times2.46$$ or $$4.9\pm5.32.$$ The $95\%$ confidence interval for $\mu_1-\mu_2$ is $$(-0.42, 10.22).$$ Thus, we are $95\%$ confident that the population average 14-day height of fast plants when water is used ($\mu_1$) is between 0.42 cm lower and 10.22 cm higher than the average 14-day height of fast plants when ancy is used ($\mu_2$).
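The whole calculation can be checked from the summary statistics used above (control: $n_1=8$, $\bar{y}_1=15.9$, $s_1=4.8$; ancy: $n_2=7$, $\bar{y}_2=11.0$, $s_2=4.7$). This is a sketch with illustrative variable names. Because it carries unrounded SEs through the degrees-of-freedom formula, the results may differ from the hand calculation in the last digit.

```r
# 95% confidence interval for mu1 - mu2 from summary statistics,
# using the degrees-of-freedom formula given in the text.
ybar1 <- 15.9; s1 <- 4.8; n1 <- 8   # control plants (water)
ybar2 <- 11.0; s2 <- 4.7; n2 <- 7   # plants treated with ancy

se1 <- s1 / sqrt(n1)
se2 <- s2 / sqrt(n2)
se_diff <- sqrt(se1^2 + se2^2)

# Approximate degrees of freedom (generally not an integer)
nu <- (se1^2 + se2^2)^2 / (se1^4 / (n1 - 1) + se2^4 / (n2 - 1))

t_mult <- qt(0.975, df = nu)   # qt accepts non-integer degrees of freedom
ci <- (ybar1 - ybar2) + c(-1, 1) * t_mult * se_diff

round(c(se_diff = se_diff, nu = nu, t_mult = t_mult), 3)
round(ci, 2)
```

With the raw plant heights available, `t.test(y1, y2)` would give the same kind of interval, since R's `t.test` does not assume equal variances by default.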