The relationship between the concentration of cholesterol in the blood and the occurrence of heart disease has been the subject of much research. As part of a government health survey, researchers measured serum cholesterol levels for a large sample of Americans, including children. The distribution for children between $12$ and $14$ years of age can be fairly well approximated by a normal curve with mean $\mu=155$ mg/dl and standard deviation $\sigma=27$ mg/dl. The following figure shows a histogram based on a sample of $431$ children between $12$ and $14$ years old, with the normal curve superimposed.
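If the variable $Y$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma$, then the standardized variable $$Z = \frac{Y - \mu}{\sigma}$$ follows the standard normal distribution, which has mean $0$ and standard deviation $1$.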
By taking advantage of the standardized scale, we can use the $Z$ table to answer detailed questions about any normal population, provided the population mean and standard deviation are specified.
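As a minimal sketch of how the standardized scale is used, the R code below converts a cholesterol value to a $z$-score and looks up the area to its left, using the mean and standard deviation given above for 12- to 14-year-olds. The cutoff of $182$ mg/dl is a hypothetical value chosen for illustration and does not come from the survey; `pnorm` is base R's normal cumulative distribution function and plays the role of the $Z$ table.

```r
# Population parameters from the text (serum cholesterol, ages 12 to 14)
mu <- 155     # mean, mg/dl
sigma <- 27   # standard deviation, mg/dl

# Hypothetical cutoff chosen for illustration (not from the survey)
y <- 182

# Standardize: how many standard deviations is y above the mean?
z <- (y - mu) / sigma

# Area under the standard normal curve to the left of z (the Z-table lookup)
p_below <- pnorm(z)

# The same probability computed directly on the original scale
p_direct <- pnorm(y, mean = mu, sd = sigma)

cat("z =", z, "\n")
cat("P(Y < 182) via z-score:", round(p_below, 4), "\n")
cat("P(Y < 182) directly:   ", round(p_direct, 4), "\n")
```

Both routes give the same area, which is the point of standardizing: one table (or one function) serves every normal population.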
A professor's exam scores are approximately normally distributed with mean $80$ and standard deviation $5$.
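As an illustration of the kind of question this setup supports, consider the proportion of scores between $75$ and $85$. These two cutoffs are hypothetical values chosen here, not taken from the original exercise. Standardizing both endpoints and subtracting the two $Z$ table areas gives
$$P(75 < Y < 85) = P\left(\frac{75-80}{5} < Z < \frac{85-80}{5}\right) = P(-1 < Z < 1) \approx 0.8413 - 0.1587 = 0.6826$$
so roughly $68\%$ of scores would fall within one standard deviation of the mean.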
To find a percentile of a normal distribution, we work backward from an area to the corresponding $z$-value. For example, suppose we want to find the $70$th percentile of a standard normal distribution. We need to look in the $Z$ table for an area of $0.7000$. The closest value is an area of $0.6985$, corresponding to a $z$-value of $0.52$.
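The table lookup can be checked in R, where `qnorm` is the inverse of `pnorm`. The sketch below confirms the table entry and then, as an assumed extension, converts the $z$-value back to the exam-score scale (mean $80$, standard deviation $5$) using $y = \mu + z\sigma$.

```r
# Area to the left of the table value z = 0.52
z_table <- 0.52
area_table <- pnorm(z_table)

# Exact z-value for the 70th percentile (no rounding to table entries)
z_exact <- qnorm(0.70)

# Assumed extension: 70th percentile of the exam scores, y = mu + z * sigma
mu <- 80
sigma <- 5
score_table <- mu + z_table * sigma
score_exact <- qnorm(0.70, mean = mu, sd = sigma)

cat("Area left of z = 0.52:", round(area_table, 4), "\n")
cat("Exact z for 70th percentile:", round(z_exact, 4), "\n")
cat("70th percentile score (table z):", score_table, "\n")
cat("70th percentile score (exact z):", round(score_exact, 2), "\n")
```

The table-based answer differs from the exact one only in the second decimal place, which is typical of the rounding error introduced by reading the nearest table entry.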
Many statistical procedures are based on having data from a normal population. In this section we consider ways to assess whether it is reasonable to use a normal curve model for a set of data and, if not, how we might proceed.
A normal quantile plot is a special statistical graph that is used to assess normality. We present this statistical tool with an example using the heights (in inches) of a sample of $11$ women, sorted from smallest to largest:
$$61, 62.5, 63, 64, 64.5, 65, 66.5, 67, 68, 68.5, 70.5$$
Based on these data, does it make sense to use a normal curve to model the distribution of women's heights?
library(ggplot2)

# Create the normal quantile plot using ggplot2
g <- ggplot(data.frame(y = c(61, 62.5, 63, 64,
                             64.5, 65, 66.5, 67,
                             68, 68.5, 70.5)),
            aes(sample = y)) +
  stat_qq() +        # sample quantiles plotted against theoretical normal quantiles
  stat_qq_line() +   # reference line through the first and third quartiles
  labs(x = "Theoretical Quantiles", y = "Sample Quantiles") +
  ggtitle("Normal Quantile Plot") +
  theme_bw() +
  theme(text = element_text(size = 20))

# Set the figure size for the notebook display, then draw the plot
options(repr.plot.width = 6, repr.plot.height = 5)
g
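To see where the plotted points come from, the coordinates can also be computed by hand. The sketch below assumes the plotting positions returned by base R's `ppoints`, which for $n > 10$ are $(i - 0.5)/n$. Other textbooks and software may use slightly different positions, so the values need not match a hand-built table exactly. The correlation at the end is an informal summary of how straight the plot is, not a formal test.

```r
# Heights (in inches) of the 11 women
heights <- c(61, 62.5, 63, 64, 64.5, 65, 66.5, 67, 68, 68.5, 70.5)
n <- length(heights)

# Plotting positions: the fraction of the distribution assigned to each
# ordered observation. For n > 10, ppoints(n) returns (i - 0.5) / n.
p <- ppoints(n)

# Theoretical z-scores: the standard normal quantiles at those positions
z <- qnorm(p)

# Pair each theoretical z-score with the corresponding ordered height
qq <- data.frame(percentile = round(100 * p, 1),
                 z = round(z, 2),
                 height = sort(heights))
print(qq)

# Informal straightness check: correlation between the two sets of quantiles
r <- cor(z, sort(heights))
cat("Correlation between theoretical and sample quantiles:", round(r, 3), "\n")
```

If the data come from a normal population, the ordered heights should rise roughly linearly with the theoretical $z$-scores, so the points fall near a straight line.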