In this chapter we study analysis of variance (ANOVA). In Chapter 7 we considered the comparison of two independent samples with respect to a quantitative variable $Y$. The classical techniques for comparing the two sample means $\bar{Y}_1$ and $\bar{Y}_2$ are the test and the confidence interval based on Student's $t$ distribution. In this chapter we consider the comparison of the means of $I$ independent samples, where $I$ may be greater than $2$.
When growing sweet corn, can organic methods be used successfully to control harmful insects and limit their effect on the corn? In a study of this question, researchers compared the weights of ears of corn under five conditions in an experiment in which sweet corn was grown using organic methods. The treatments were as follows.
Ears of corn were randomly sampled from each plot and weighed. The results are given in the table and figure below.
The classical method of analyzing data from $I$ independent samples is called an analysis of variance, or ANOVA. In applying analysis of variance, the data are regarded as random samples from $I$ populations. We denote the means of these populations as $\mu_1, \mu_2, \ldots, \mu_I$ and the standard deviations as $\sigma_1, \sigma_2, \ldots, \sigma_I$. We test a null hypothesis of equality among all $I$ population means, $$H_0:\mu_1=\mu_2=\cdots=\mu_I.$$
The table below displays the overall risk of Type I error. It is clear that a researcher who uses repeated $t$ tests is highly vulnerable to Type I error unless $I$ is quite small. The difficulties illustrated by the table are due to multiple comparisons, that is, making many comparisons on the same set of data. These difficulties can be reduced when the comparison of several groups is approached through ANOVA.
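To see why the risk grows so quickly, note that with $I$ groups there are $I(I-1)/2$ possible pairwise comparisons. The following sketch computes the overall risk for several values of $I$, under the simplifying assumption that the tests are independent (they are not, since they share data, but the assumption captures the trend):

```python
from math import comb

alpha = 0.05  # significance level used for each individual t test

# With I groups there are comb(I, 2) pairwise comparisons. If the tests
# were independent (a simplification), the chance of at least one
# Type I error among them would be 1 - (1 - alpha)^k.
for I in [2, 3, 4, 6, 8, 10]:
    k = comb(I, 2)
    overall = 1 - (1 - alpha) ** k
    print(f"I = {I:2d}: {k:2d} tests, overall risk = {overall:.3f}")
```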
The ANOVA model presented previously, which compares the means of three or more groups, is called a one-way ANOVA. The term "one-way" refers to the fact that there is a single variable defining the groups or treatments (e.g., in the sweet corn example, the treatments were defined by the type of harmful insect/bacteria).
The following table shows the weight gains (over a 2-week period) of young lambs on three different diets. (These data are fictitious, but they are realistic in all respects except that the group means are whole numbers.)
The total number of observations is $n=\sum_{i=1}^3n_i=3+5+4=12$ and the grand mean is $$\bar{Y}=\frac{\sum_{i=1}^3n_i\bar{Y}_i}{n}=\frac{3\times11+5\times15+4\times12}{12}=\frac{156}{12}=13\text{ lb}.$$ The pooled variance and standard deviation are calculated as $$s^2=\frac{(3-1)\times4.359^2+(5-1)\times4.950^2+(4-1)\times4.967^2}{12-3}=\frac{210}{9}=23.33,$$ and $$s=\sqrt{23.33}=4.83\text{ lb.}$$
For the data in the lamb weight-gain example, we have $$\mathrm{MSW}=s^2=23.33,\qquad \mathrm{SSW}=(n-I)\times\mathrm{MSW}=210,$$ and $$\mathrm{SSB}=3\times(11-13)^2+5\times(15-13)^2+4\times(12-13)^2=36,\qquad \mathrm{MSB}=\frac{\mathrm{SSB}}{I-1}=\frac{36}{2}=18.$$
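These pooled calculations can be reproduced directly from the summary statistics. The sketch below (using the group sizes, means, and rounded standard deviations reported above, so the results agree with the text up to rounding) computes the grand mean, SSW, MSW, SSB, and MSB:

```python
import numpy as np

# Summary statistics from the lamb weight-gain example.
n = np.array([3, 5, 4])                # group sizes n_i
ybar = np.array([11.0, 15.0, 12.0])    # group means (lb)
s = np.array([4.359, 4.950, 4.967])    # group standard deviations (lb)

I = len(n)
N = n.sum()                            # total sample size n = 12
grand_mean = (n * ybar).sum() / N      # 13.0 lb

SSW = ((n - 1) * s**2).sum()           # within-groups SS, about 210
MSW = SSW / (N - I)                    # pooled variance s^2, about 23.33
SSB = (n * (ybar - grand_mean)**2).sum()   # between-groups SS, 36
MSB = SSB / (I - 1)                    # 18

print(grand_mean, SSW, MSW, SSB, MSB)
```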
The name analysis of variance derives from a fundamental relationship involving SSB and SSW. Consider an individual observation $Y_{ij}$. It is obviously true that $$Y_{ij}-\bar{Y}=(Y_{ij}-\bar{Y}_i)+(\bar{Y}_i-\bar{Y}).$$ This equation expresses the deviation of an observation from the grand mean as the sum of two parts: a within-group deviation $(Y_{ij}-\bar{Y}_i)$ and a between-group deviation $(\bar{Y}_i-\bar{Y})$. It is also true (but not at all obvious) that the analogous relationship holds for the corresponding sums of squares; that is $$\sum_{i=1}^I\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y})^2=\sum_{i=1}^I\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_i)^2+\sum_{i=1}^I\sum_{j=1}^{n_i}(\bar{Y}_i-\bar{Y})^2,$$ which, by rewriting each of the sums on the right-hand side, can be expressed as $$\sum_{i=1}^I\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y})^2=\sum_{i=1}^I(n_i-1)s_i^2+\sum_{i=1}^In_i(\bar{Y}_i-\bar{Y})^2=\mathrm{SSW}+\mathrm{SSB}.$$
The quantity on the left-hand side is called the total sum of squares, or SSTO: $$\mathrm{SSTO}=\sum_{i=1}^I\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y})^2.$$ Note that SSTO measures variability among all $n$ observations in the $I$ groups. It follows that $$\mathrm{SSTO}=\mathrm{SSW}+\mathrm{SSB}.$$ The preceding fundamental relationship shows how the total variation in the data set can be analyzed, or broken down, into two interpretable components: between-sample variation and within-sample variation.
Note that the corresponding degrees of freedom have the same relationship; that is $$n-1=(n-I)+(I-1),$$ where the left-hand side is called the total degrees of freedom.
For the data in the lamb weight-gain example, we found $\bar{Y}=13$ lb; we calculate SSTO as \begin{align*} \mathrm{SSTO}&=\sum_{i=1}^I\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y})^2\\ &=\left[(8-13)^2+(16-13)^2+(9-13)^2\right] \\ &\quad+\left[(9-13)^2+(16-13)^2+(21-13)^2+(11-13)^2+(18-13)^2\right] \\ &\quad+\left[(15-13)^2+(10-13)^2+(17-13)^2+(6-13)^2\right] \\ &=246. \end{align*} For these data, we found that $\mathrm{SSW}=210$ and $\mathrm{SSB}=36$. We verify that $$246=210+36.$$ Also, we found that the degrees of freedom within groups $=9$ and the degrees of freedom between groups $=2$. We verify that $$11=9+2.$$
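The partition can also be verified numerically from the raw data. The following sketch recomputes SSTO, SSW, and SSB from the twelve observations and checks that $\mathrm{SSTO}=\mathrm{SSW}+\mathrm{SSB}$:

```python
import numpy as np

# Raw weight gains (lb) for the three diets, read from the SSTO
# calculation above.
groups = [np.array([8, 16, 9]),
          np.array([9, 16, 21, 11, 18]),
          np.array([15, 10, 17, 6])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()                                       # 13.0

SSTO = ((all_obs - grand_mean) ** 2).sum()                        # 246
SSW = sum(((g - g.mean()) ** 2).sum() for g in groups)            # 210
SSB = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # 36

assert np.isclose(SSTO, SSW + SSB)                                # 246 = 210 + 36
print(SSTO, SSW, SSB)
```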
When working with the ANOVA quantities, it is customary to arrange them in a table. The table below shows the ANOVA for the lamb weight-gain data. Notice that the ANOVA table clearly shows the additivity of the sums of squares and the degrees of freedom.
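In practice the ANOVA table is produced by software. The sketch below shows one way to do this with the statsmodels package; note that its output labels the between-groups row `C(diet)` (treatment) and the within-groups row `Residual` (error), and that the column names `gain` and `diet` are chosen here for illustration:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Lamb weight-gain data in "long" format: one row per observation.
df = pd.DataFrame({
    "gain": [8, 16, 9, 9, 16, 21, 11, 18, 15, 10, 17, 6],
    "diet": ["1"] * 3 + ["2"] * 5 + ["3"] * 4,
})

model = ols("gain ~ C(diet)", data=df).fit()
# Rows: C(diet) = between groups (treatment), Residual = within groups (error)
print(sm.stats.anova_lm(model))
```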
We think of $Y_{ij}$ as a random observation from group $i$, where the population mean of group $i$ is $\mu_i$. It can be helpful to think of ANOVA in terms of the following model: $$Y_{ij}=\mu+\tau_i+\varepsilon_{ij},$$ where
Thus the preceding model can be stated in words as $$\text{observation }=\text{ overall average }+\text{ group effect }+\text{ random error}.$$
The group effect $\tau_i$ can be regarded as the difference between the population mean for group $i$, $\mu_i$, and the grand population mean, $\mu$. Thus, $$\tau_i=\mu_i-\mu$$ and the preceding model becomes $$Y_{ij}=\mu_i+\varepsilon_{ij}.$$ The null hypothesis $$H_0:\mu_1=\mu_2=\cdots=\mu_I$$ is equivalent to $$H_0:\tau_1=\tau_2=\cdots=\tau_I=0.$$ If $H_0$ is false, then at least some of the groups differ from the others. If $\tau_i$ is positive, then observations from group $i$ tend to be greater than the overall average; if $\tau_i$ is negative, then data from group $i$ tend to be less than the overall average.
The population parameters $\mu, \mu_i, \tau_i,$ and $\varepsilon_{ij}$ can be estimated by the corresponding sample quantities: $$\hat{\mu}=\bar{Y},\quad \hat{\mu}_i=\bar{Y}_i,\quad \hat{\tau}_i=\bar{Y}_i-\bar{Y},\quad \hat{\varepsilon}_{ij}=Y_{ij}-\bar{Y}_i.$$ Putting these estimates together, we have $$Y_{ij}=\hat{\mu}+\hat{\tau}_i+\hat{\varepsilon}_{ij}=\bar{Y}+(\bar{Y}_i-\bar{Y})+(Y_{ij}-\bar{Y}_i).$$ While the terms "between-groups" and "within-groups" are not technical terms, they are useful in describing and understanding the ANOVA model. Computer software and other texts commonly refer to these sources of variability as treatment (between groups) and error (within groups).
For the data in the lamb weight-gain example, the estimate of the grand population mean is $\hat{\mu}=13$. The estimated group effects are $$\hat{\tau}_1=\bar{Y}_1-\bar{Y}=11-13=-2,\quad \hat{\tau}_2=15-13=2,\quad \hat{\tau}_3=12-13=-1.$$ Thus, we estimate that Diet 2 increases weight gain by 2 lb on average (when compared to the average of the three diets), Diet 1 decreases weight gain by an average of 2 lb, and Diet 3 decreases weight gain by 1 lb, on average.
When we conduct an analysis of variance, we are comparing the sizes of the sample group effects, the $\hat{\tau}_i$'s, to the sizes of the random errors in the data, the $\hat{\varepsilon}_{ij}$'s. We can see that $$\mathrm{SSB}=\sum_{i=1}^In_i\hat{\tau}_i^2,\quad\mathrm{SSW}=\sum_{i=1}^I\sum_{j=1}^{n_i}\hat{\varepsilon}_{ij}^2.$$
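These two identities are easy to check numerically. The following sketch estimates the group effects and residuals for the lamb data and recovers $\mathrm{SSB}=36$ and $\mathrm{SSW}=210$:

```python
import numpy as np

groups = [np.array([8, 16, 9]),
          np.array([9, 16, 21, 11, 18]),
          np.array([15, 10, 17, 6])]

grand_mean = np.concatenate(groups).mean()           # mu-hat = 13

tau_hat = [g.mean() - grand_mean for g in groups]    # [-2, 2, -1]
resid = [g - g.mean() for g in groups]               # eps-hat_ij

SSB = sum(len(g) * t**2 for g, t in zip(groups, tau_hat))  # 36
SSW = sum((e**2).sum() for e in resid)                     # 210
print(tau_hat, SSB, SSW)
```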
The global null hypothesis is $$H_0:\mu_1=\mu_2=\cdots=\mu_I$$ against the alternative hypothesis $$H_A:\text{ The }\mu_i\text{'s are not all equal}.$$ Note that $H_0$ is compound (unless $I=2$), and so rejection of $H_0$ does not specify which $\mu_i$'s are different. If we reject $H_0$, then we conduct a further analysis to make detailed comparisons among the $\mu_i$'s.
The form of an $F$ distribution depends on two parameters: the numerator degrees of freedom and the denominator degrees of freedom. Critical values for the $F$ distribution are given in the $F$ table. Note that the $F$ table occupies 10 pages, each page having a different value of the numerator df. As a specific example, for numerator $\mathrm{df}=4$ and denominator $\mathrm{df}=20$, we find in the $F$ table that $F_{4, 20}(0.05)=2.87$; this value is shown in the figure below.
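Software can stand in for the printed table. For instance, the following sketch uses scipy to reproduce the tabled critical value:

```python
from scipy.stats import f

# Upper 5% critical value of the F distribution with numerator df = 4
# and denominator df = 20, matching the tabled value 2.87.
print(f.ppf(0.95, dfn=4, dfd=20))   # about 2.866
```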
The $F$ test is a classical test of the preceding global null hypothesis. The test statistic, the $F$ statistic, is calculated as follows: $$T=\frac{\mathrm{MSB}}{\mathrm{MSW}}.$$ From the definitions of the mean squares, it is clear that $T$ will be large if the discrepancies among the group means ($\bar{Y}_i$'s) are large relative to the variability within the groups. Thus, large values of $T$ tend to provide evidence against $H_0$ (evidence for a difference among the group means).
It can be shown mathematically that the null distribution of the test statistic $T$ is the $F$ distribution with the numerator df being the df between groups and the denominator df being the df within groups. Specifically, $$T\overset{H_0}{\sim}F_{I-1,\,n-I}.$$ Therefore, $H_0$ is rejected at the $\alpha$ level of significance if $$p\text{-value}=P(F_{I-1,\,n-I}>T)<\alpha\quad\text{or}\quad T>F_{I-1,\,n-I}(\alpha).$$
For the data in the lamb weight-gain example, the global null hypothesis and alternative can be stated verbally as $$H_0:\text{Mean weight gain is the same on all three diets}\quad\text{vs.}\quad H_A:\text{Mean weight gain is not the same on all three diets,}$$ or symbolically as $$H_0:\mu_1=\mu_2=\mu_3\quad\text{vs.}\quad H_A:\text{The }\mu_i\text{'s are not all equal.}$$ From the ANOVA table we find $$T=\frac{\mathrm{MSB}}{\mathrm{MSW}}=\frac{18}{23.33}=0.77.$$ The degrees of freedom can also be read from the ANOVA table: numerator df $=2$ and denominator df $=9$. From the $F$ table we find $F_{2, 9}(0.20)=1.93$, so $p$-value $>0.20$ (computer software gives $p$-value $=0.4907$). Thus, there is a lack of significant evidence against $H_0$: there is insufficient evidence to conclude that there is any difference among the diets with respect to population mean weight gain.
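The whole test can be carried out in one call with scipy's `f_oneway`, which reproduces both the $F$ statistic and the software $p$-value quoted above:

```python
from scipy.stats import f_oneway

# Raw lamb weight gains for the three diets.
diet1 = [8, 16, 9]
diet2 = [9, 16, 21, 11, 18]
diet3 = [15, 10, 17, 6]

T, pval = f_oneway(diet1, diet2, diet3)
print(T, pval)   # T is about 0.77, p-value about 0.4907
```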
In many studies, interesting questions can be addressed by considering linear combinations of the group means. A linear combination $L$ is a quantity of the form $$L=\sum_{i=1}^Im_i\bar{Y}_i,$$ where the $m_i$'s are the multipliers of the $\bar{Y}_i$'s.
Each linear combination $L$ is an estimate, based on the $\bar{Y}_i$'s, of the corresponding linear combination of the population means ($\mu_i$'s). As a basis for statistical inference, we need to consider the standard error of a linear combination, which is calculated as follows.
The standard error of the linear combination $$L=\sum_{i=1}^Im_i\bar{Y}_i$$ is $$\mathrm{SE}_{L}=\sqrt{\mathrm{MSW}\times\sum_{i=1}^I\frac{m_i^2}{n_i}}.$$
Linear combinations of means can be used for testing hypotheses and for constructing confidence intervals. Critical values are obtained from Student's $t$ distribution with df equal to the degrees of freedom within groups, i.e., $n-I$. Confidence intervals are constructed using the familiar Student's $t$ format.
In general, a $1-\alpha$ confidence interval for the linear combination $L$ is $$L\pm t_{n-I}(\alpha/2)\times\mathrm{SE}_{L}.$$
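As an illustration (the contrast here is chosen for demonstration and does not appear in the text), the sketch below compares Diet 2 with the average of Diets 1 and 3 in the lamb example, computing $L$, its standard error, and a 95% confidence interval:

```python
import numpy as np
from scipy.stats import t

# Summary quantities from the lamb weight-gain example.
n = np.array([3, 5, 4])
ybar = np.array([11.0, 15.0, 12.0])
MSW, df_within = 23.33, 9

# Illustrative contrast (not taken from the text): compare Diet 2 with
# the average of Diets 1 and 3, so m = (-1/2, 1, -1/2).
m = np.array([-0.5, 1.0, -0.5])

L = (m * ybar).sum()                       # 15 - (11 + 12)/2 = 3.5
SE_L = np.sqrt(MSW * (m**2 / n).sum())     # about 2.84

alpha = 0.05
tcrit = t.ppf(1 - alpha / 2, df_within)    # t_9(0.025), about 2.262
print(L - tcrit * SE_L, L + tcrit * SE_L)  # about (-2.93, 9.93)
```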
After finding significant evidence for a difference among the population means $\mu_1, \mu_2, \ldots, \mu_I$ using the global $F$ test, we wish to conduct pairwise comparisons between population means to determine where the differences lie. However, repeated $t$ tests lead to an increased overall risk of Type I error. Bonferroni's method is one popular way to control the overall risk of Type I error.
Bonferroni's method is based on a very simple and general relationship: The probability that at least one of several events will occur cannot exceed the sum of the individual probabilities. For instance, suppose we conduct five tests of hypotheses, each at $\alpha_i=0.01$. Then the overall risk of Type I error $\alpha$ (the chance of rejecting at least one of the five hypotheses when in fact all of them are true) cannot exceed $5\times0.01=0.05$.
Turning this logic around, suppose an investigator plans to conduct five tests of hypotheses and wants the overall risk of Type I error not to exceed $\alpha=0.05$. A conservative approach is to conduct each of the separate tests at the significance level $\alpha_i=0.05/5=0.01$; this is called a Bonferroni adjustment.
A Bonferroni adjustment can also be made for confidence intervals. For instance, suppose we wish to construct five confidence intervals and desire an overall probability of $95\%$ that all the intervals contain their respective parameters ($\alpha=0.05$). Then this can be accomplished by constructing each interval at confidence level $99\%$ (because $0.05/5=0.01$ and $1-0.01=0.99$).
In general, to construct $k$ Bonferroni-adjusted confidence intervals with an overall probability of $100(1-\alpha)\%$ that all the intervals contain their respective parameters, we construct each interval at confidence level $100(1-\alpha/k)\%$. Formally, the Bonferroni-adjusted $1-\alpha$ confidence interval for $\mu_a-\mu_b$ is $$(\bar{Y}_a-\bar{Y}_b)\pm t_{n-I}(\alpha/(2k))\times\mathrm{SE}_{\bar{Y}_a-\bar{Y}_b},$$ where the standard error $$\mathrm{SE}_{\bar{Y}_a-\bar{Y}_b}=\sqrt{\mathrm{MSW}\times\left(\frac{1}{n_a}+\frac{1}{n_b}\right)}.$$
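The following sketch applies these formulas to compute Bonferroni-adjusted intervals for all pairwise differences. It uses the lamb summary statistics purely to illustrate the mechanics (recall that the global $F$ test was not significant for those data, so in practice one would stop at that point):

```python
import numpy as np
from itertools import combinations
from scipy.stats import t

# Lamb weight-gain summary quantities, used only for illustration.
n = np.array([3, 5, 4])
ybar = np.array([11.0, 15.0, 12.0])
MSW, df_within = 23.33, 9

pairs = list(combinations(range(len(n)), 2))
k = len(pairs)                                   # 3 comparisons here
alpha = 0.05
tcrit = t.ppf(1 - alpha / (2 * k), df_within)    # Bonferroni multiplier

for a, b in pairs:
    diff = ybar[a] - ybar[b]
    se = np.sqrt(MSW * (1 / n[a] + 1 / n[b]))    # SE of the difference
    print(f"mu_{a+1} - mu_{b+1}: {diff:+.1f} +/- {tcrit * se:.2f}")
```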
Note that the application of Bonferroni's method requires unusual critical values, so standard tables are not sufficient. The Bonferroni table provides Bonferroni multipliers for confidence intervals that are based on a $t$ distribution.
In a study to investigate the effect of oyster density on seagrass biomass, researchers introduced oysters to thirty 1-m$^2$ plots of healthy seagrass. At the beginning of the study the seagrass was clipped short in all plots. Next, 10 randomly chosen plots received a high density of oysters; 10, an intermediate density; and 10, a low density. As a control, an additional 10 randomly chosen clipped 1-m$^2$ plots received no oysters. After 2 weeks, the belowground seagrass biomass was measured in each plot (g/m$^2$). Data from some plots are missing. A summary of the data as well as the ANOVA table follow.
The $p$-value for the global $F$ test is $0.0243$, indicating that there is significant evidence of a difference among the biomass means under these experimental conditions. We thus proceed with pairwise comparisons to determine which conditions differ. To control the overall risk of Type I error, we calculate Bonferroni-adjusted $95\%$ confidence intervals for the total of six comparisons. Each individual confidence interval is constructed at confidence level $99.17\%$, since $0.05/6=0.0083$ and $1-0.0083=0.9917$.
The following table summarizes the Bonferroni-adjusted confidence intervals for all six pairwise comparisons.
The ANOVA techniques described in this chapter, including the global $F$ test, are valid if the following conditions hold.