In the present chapter we consider the comparison of two samples that are not independent but are paired. In a paired design, the observations $(Y_1, Y_2)$ occur in pairs; the observational units in a pair are linked in some way, so they have more in common with each other than with members of another pair.
The one-sample $t$ test is a statistical hypothesis test used to determine whether an unknown population mean $\mu$ differs from a specified value. We consider the following two-sided hypothesis testing problem: $$H_0: \mu=c\text{ vs. }H_A: \mu\neq c.$$ Similar to the two-sample $t$ tests in Chapter 7, for a normal population we have $$T=\frac{\bar{Y}-c}{\mathrm{SE}_{\bar{Y}}}\sim t_{n-1}.$$ We reject the null hypothesis $H_0$ at the $\alpha$ level of significance if $|T|>t_{n-1}(\alpha/2)$ or, equivalently, if the $p$-value is less than or equal to $\alpha$, where the $p$-value is the area under Student's $t$ curve in the double tails beyond $-|T|$ and $|T|$.
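As a quick illustration (not part of the original text), the Python sketch below carries out this one-sample $t$ test; the sample values and the null value $c$ are made up for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical sample and null value c (both made up for illustration)
y = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4])
c = 5.0

n = len(y)
se = y.std(ddof=1) / np.sqrt(n)                    # SE of the sample mean
t_stat = (y.mean() - c) / se                       # T = (Ybar - c) / SE
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)    # two-sided p-value

# scipy's built-in one-sample t test should agree
t_check, p_check = stats.ttest_1samp(y, popmean=c)
print(t_stat, p_value, t_check, p_check)
```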
According to an official statement released by the state of California, the average height of a redwood tree in California is 350 ft. To assess this claim, a scientist collects a sample of 51 randomly chosen redwood trees in California. She finds that the average height in her sample is 355 ft and the sample standard deviation is 42 ft.
Since $n=51>30$, the central limit theorem applies and we can conduct a $t$ test. Consider the following hypotheses $$H_0: \mu=350\text{ vs. }H_A: \mu\neq350.$$ The test statistic is $$T=\frac{\bar{Y}-350}{\mathrm{SE}_{\bar{Y}}},$$ and we reject $H_0$ when $|T|>t_{n-1}(\alpha/2)$. Here $$T=\frac{355-350}{42/\sqrt{51}}=0.8501$$ and, taking $\alpha=0.10$, $t_{50}(0.1/2)=1.676$. Because $|0.8501|<1.676$, we fail to reject $H_0$ at the $0.10$ level of significance.
The $p$-value for this test (found using a computer) is $2\times P(t_{50}>0.8501)=0.3993$, which is greater than $0.10$, so we do not reject $H_0$. (A quick look at $t$ Table, using $\mathrm{df}=50$, shows that the $p$-value is between $0.20$ and $0.40$.)
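As a check (not part of the original text), the short Python sketch below reproduces the test statistic, critical value, and $p$-value from the summary statistics of the redwood example.

```python
import numpy as np
from scipy import stats

# Summary statistics from the redwood example
n, ybar, s, c = 51, 355, 42, 350
alpha = 0.10

se = s / np.sqrt(n)                                # 42 / sqrt(51) ≈ 5.88
t_stat = (ybar - c) / se                           # ≈ 0.8501
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)      # t_50(0.05) ≈ 1.676
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)    # ≈ 0.3993

print(t_stat, t_crit, p_value)   # |T| < t_crit and p > alpha, so do not reject H0
```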
In this section we discuss the use of Student's $t$ distribution to obtain tests and confidence intervals for paired data.
During a weight loss study, each of nine subjects was given either (1) the active drug m-chlorophenylpiperazine (mCPP) for 2 weeks and then a placebo for another 2 weeks, or (2) the placebo for the first 2 weeks and then mCPP for the second 2 weeks. As part of the study, the subjects were asked to rate how hungry they were at the end of each 2-week period. The hunger rating data are shown in the table below, which also shows the difference in hunger ratings (mCPP $-$ placebo).
In Chapter 7 we considered how to analyze data from two independent samples. When we have paired data, we make a simple shift of viewpoint: Instead of considering $Y_1$ and $Y_2$ separately, we consider the difference $D$, defined as $$D=Y_1-Y_2.$$ Let us denote the mean of the sample $D$'s as $\bar{D}$. The quantity $\bar{D}$ is related to the individual sample means as follows: $$\bar{D}=\bar{Y}_1-\bar{Y}_2.$$ The relationship between the population means is analogous: $$\mu_D=\mu_1-\mu_2.$$ Thus, we may say that the mean of the difference is equal to the difference of the means. Because of this simple relationship, a comparison of two paired means can be carried out by concentrating entirely on the $D$'s. The standard error of $\bar{D}$ is easy to calculate. Because $\bar{D}$ is just the mean of a single sample of $D$'s, we can apply the SE formula of Chapter 6 to obtain the following formula: $$\mathrm{SE}_{\bar{D}}=\frac{s_D}{\sqrt{n_D}},$$ where $s_D$ is the standard deviation of the $D$'s and $n_D$ is the number of $D$'s.
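The following small Python sketch (with made-up paired measurements, not the study data) illustrates this reduction to a one-sample analysis of the $D$'s.

```python
import numpy as np

# Hypothetical paired measurements: the same five subjects under two conditions
y1 = np.array([12.0, 15.5, 9.8, 14.2, 11.7])
y2 = np.array([10.1, 14.0, 10.2, 12.9, 10.5])

d = y1 - y2                                   # D = Y1 - Y2, one value per pair
d_bar = d.mean()                              # equals y1.mean() - y2.mean()
se_dbar = d.std(ddof=1) / np.sqrt(len(d))     # SE_Dbar = s_D / sqrt(n_D)

print(d_bar, y1.mean() - y2.mean(), se_dbar)
```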
For the hunger rating data, note that the mean of the difference is equal to the difference of the means: $$\bar{D}=-29.6=55.3-84.9.$$ While the mean of the difference is the same as the difference of the means, note that the SD of the difference is not the difference of the SDs. Thus the standard error of the mean difference must be calculated using the SD of the differences. We calculate the standard error of the mean difference as follows: $$s_D=32.8, n_D=9, \mathrm{SE}_{\bar{D}}=\frac{32.8}{\sqrt{9}}=10.9.$$
For the hunger rating data, we have $n_D=9$ and thus the degrees of freedom for the $t$ distribution is $9-1=8$. From $t$ Table we find that $t_{8}(0.025)=2.306$; thus, the (two-sided) $95\%$ confidence interval for $\mu_D$ is $$-29.6\pm2.306\times\frac{32.8}{\sqrt{9}}$$ or $(-54.8, -4.4)$.
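A short Python check (not from the original text) reproduces this interval from the summary statistics of the differences.

```python
import numpy as np
from scipy import stats

# Summary statistics of the differences (mCPP - placebo)
d_bar, s_d, n_d = -29.6, 32.8, 9

se = s_d / np.sqrt(n_d)                       # ≈ 10.9
t_mult = stats.t.ppf(0.975, df=n_d - 1)       # t_8(0.025) ≈ 2.306

lower = d_bar - t_mult * se
upper = d_bar + t_mult * se
print(lower, upper)                           # ≈ (-54.8, -4.4)
```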
We can also conduct a $t$ test. To test the hypotheses $$H_0: \mu_D=0\text{ vs. }H_A: \mu_D\neq0,$$ we use the test statistic $$T=\frac{\bar{D}-0}{\mathrm{SE}_{\bar{D}}}.$$ Critical values are obtained from Student's $t$ distribution with $n_D-1$ degrees of freedom.
For the hunger rating data, let us formulate the null and alternative hypotheses:

$H_0$: Mean hunger rating is the same after taking mCPP as it is after taking a placebo.

$H_A$: Mean hunger rating is different after taking mCPP than after taking a placebo.

Or, in symbols, $$H_0:\mu_D=0\text{ vs. }H_A:\mu_D\neq0.$$ Let us test $H_0$ against $H_A$ at significance level $\alpha=0.05$. The test statistic is $$T=\frac{-29.6-0}{32.8/\sqrt{9}}=-2.71.$$ From $t$ Table, $t_8(0.02)=2.449$ and $t_8(0.01)=2.896$. Since $2.449<|{-2.71}|<2.896$, we have $0.02<p\text{-value}<0.04$, which is less than $\alpha=0.05$. We reject $H_0$ and find that there is sufficient evidence to conclude that mean hunger rating is reduced more by mCPP than by a placebo. (Using a computer gives the $p$-value $=0.027$.)
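The Python sketch below (not part of the original text) reproduces the test statistic and $p$-value from the summary statistics; with the raw paired data one could instead call `scipy.stats.ttest_rel`.

```python
import numpy as np
from scipy import stats

# Summary statistics of the differences (mCPP - placebo)
d_bar, s_d, n_d = -29.6, 32.8, 9

se = s_d / np.sqrt(n_d)                              # ≈ 10.9
t_stat = (d_bar - 0) / se                            # ≈ -2.71
p_value = 2 * stats.t.sf(abs(t_stat), df=n_d - 1)    # ≈ 0.027

print(t_stat, p_value)   # p < 0.05, so reject H0
```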
The preceding conditions are the same as those given in Chapter 6; in the present case, the conditions apply to the $D$'s because the analysis is based on the $D$'s.
Paired designs can arise in a variety of ways, including the following:
In a two-sample test, you compare two independent samples. These samples are expected to be uncorrelated due to their independence. The sample sizes for the two groups do not have to be equal, although it is often preferable to have roughly equal sizes.
In a paired-sample test, you work with a single sample of paired observations. This typically consists of paired measurements on individual subjects, such as "before and after" scores on a questionnaire, exam, or lab test. Alternatively, the pairs can represent pairs of subjects, such as married couples or twins, or pairs of objects produced together. The measurements within pairs are expected to be correlated. In the analysis, you would examine the differences between the paired measurements, denoted as $D_i=Y_{1i}-Y_{2i}$.
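As a final sketch (using made-up "before and after" scores, not data from the text), the Python example below contrasts the paired analysis, which is simply a one-sample $t$ test on the $D_i$'s, with a two-independent-samples test that ignores the pairing.

```python
import numpy as np
from scipy import stats

# Hypothetical "before and after" scores for the same six subjects
before = np.array([72, 80, 65, 90, 77, 84])
after = np.array([75, 85, 66, 95, 80, 83])

# Paired analysis: a one-sample t test on the differences D_i = Y_1i - Y_2i ...
d = before - after
t_paired, p_paired = stats.ttest_1samp(d, popmean=0)

# ... which is exactly what scipy's paired test computes
t_rel, p_rel = stats.ttest_rel(before, after)

# A two-independent-samples test ignores the pairing (inappropriate for this design)
t_ind, p_ind = stats.ttest_ind(before, after)

print(t_paired, p_paired)   # matches (t_rel, p_rel)
print(t_ind, p_ind)         # typically less powerful when pairs are positively correlated
```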