A meta-study consists of indefinitely many repetitions, or replications, of the same study. If the study consists of drawing a random sample of size $n$ from some population, the corresponding meta-study involves drawing repeated random samples of size $n$ from the same population.
To visualize the sampling distribution of $\bar{Y}$, imagine the meta-study as follows:
When we think of $\bar{Y}$ as a random variable, we need to be aware of two basic facts
A large population of seeds of the princess bean Phaseolus vulgaris is to be sampled. The weights of the seeds in the population follow a normal distribution with mean $\mu=500$ mg and standard deviation $\sigma=120$ mg. Suppose now that a random sample of four seeds is to be weighed, and let $\bar{Y}$ represent the mean weight of the four seeds. What is the sampling distribution of $\bar{Y}$? It is normal with mean $\mu_{\bar{Y}}=500$ mg and standard deviation $\sigma_{\bar{Y}}=120/\sqrt{4}=60$ mg, that is, $N(500, 3600)$ in mean-variance notation.
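The sampling distribution in this example can be checked numerically by simulating a finite version of the meta-study. The following is a minimal sketch, not part of the original example: the number of replications (`REPLICATIONS`), the seed, and the helper name `sample_mean` are assumptions chosen for illustration, and a true meta-study would repeat indefinitely.

```python
import math
import random
import statistics

# Population parameters and sample size taken from the seed-weight example.
MU, SIGMA, N = 500.0, 120.0, 4
# Assumed number of repetitions; the real meta-study has no upper limit.
REPLICATIONS = 20_000

rng = random.Random(1)  # fixed seed (arbitrary) so the run is reproducible


def sample_mean(n):
    """Draw one random sample of size n from N(MU, SIGMA^2) and return its mean."""
    return statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))


# Each entry is the value of Y-bar observed in one replication of the study.
means = [sample_mean(N) for _ in range(REPLICATIONS)]

sim_mean = statistics.fmean(means)
sim_sd = statistics.stdev(means)
theory_sd = SIGMA / math.sqrt(N)  # sigma / sqrt(n) = 60 mg

print(f"simulated mean of Y-bar: {sim_mean:.1f} mg (theory {MU:.0f} mg)")
print(f"simulated SD of Y-bar:   {sim_sd:.1f} mg (theory {theory_sd:.0f} mg)")
```

With this many replications, the simulated mean and standard deviation of $\bar{Y}$ should land close to $500$ mg and $60$ mg, though they will not match exactly.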
It is important to distinguish clearly among three different distributions related to a quantitative variable $Y$:
Distribution | Mean | Standard deviation |
---|---|---|
$Y$ in population | $\mu$ | $\sigma$ |
$Y$ in sample | $\bar{y}$ | $s$ |
$\bar{Y}$ (in meta-study) | $\mu_{\bar{Y}}=\mu$ | $\sigma_{\bar{Y}}=\sigma/\sqrt{n}$ |
Recall the seed-weight example, in which the population mean and standard deviation are $\mu=500$ mg and $\sigma=120$ mg. Suppose we weigh a random sample of $n=25$ seeds from the population and obtain the data in the table below.
Notice that the distributions in (a) and (b) are more or less similar; in fact, the distribution in (b) is an estimate of the distribution in (a). By contrast, the distribution in (c) is much narrower, because it represents a distribution of means rather than of individual observations.
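To make the contrast concrete, the sketch below computes the mean and standard deviation of each of the three distributions for $n=25$. It assumes that (a), (b), and (c) correspond to the population, the sample, and the sampling distribution of $\bar{Y}$, in that order. The sample here is simulated with an arbitrary seed; it is not the data set referred to in the text.

```python
import math
import random
import statistics

# Population parameters and sample size from the seed-weight example.
MU, SIGMA, N = 500.0, 120.0, 25

rng = random.Random(7)  # arbitrary seed, for reproducibility only
# Simulated sample of 25 seed weights (NOT the data from the text).
sample = [rng.gauss(MU, SIGMA) for _ in range(N)]

y_bar = statistics.fmean(sample)     # sample mean, estimates mu
s = statistics.stdev(sample)         # sample SD, estimates sigma
sigma_y_bar = SIGMA / math.sqrt(N)   # SD of Y-bar in the meta-study

print(f"(a) population:         mean = {MU:.0f}, SD = {SIGMA:.0f}")
print(f"(b) sample (simulated): mean = {y_bar:.1f}, SD = {s:.1f}")
print(f"(c) Y-bar:              mean = {MU:.0f}, SD = {sigma_y_bar:.0f}")
```

The standard deviation in (c) is $120/\sqrt{25}=24$ mg, one fifth of the population value, which is why that distribution is so much narrower.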
We consider a binomial distribution with $n=50$ and $p=0.3$. Panel (a) shows this binomial distribution, using spikes to represent probabilities; superimposed is a normal curve with mean $=np=15$ and standard deviation $=\sqrt{np(1-p)}=3.24$. Panel (b) shows the sampling distribution of $\hat{P}$; superimposed is a normal curve with mean $=p=0.3$ and standard deviation $=\sqrt{p(1-p)/n}=0.0648$.
To illustrate the use of the normal approximation, let us find the probability that $50$ independent trials result in at least $18$ successes, i.e., $P(Y\geq18)$. The exact calculation using the binomial formula is very tedious: it involves $50-18+1=33$ terms and gives $0.2178$. If instead the normal approximation is adopted, we only need to find the corresponding area under the normal curve.
The Z score that corresponds to $18$ is $$z=\frac{18-15}{3.2404}=0.93.$$ Using the Z table, we find that the area to the right of $z$ is $1-0.8238=0.1762$.
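Both calculations can be reproduced in a few lines. The sketch below sums the 33 binomial terms directly and then evaluates the normal area using the same Z score as above; the variable names are illustrative. Because the code does not round $z$ to two decimals, its normal-curve area differs slightly in the third decimal place from the table-based value.

```python
import math
from statistics import NormalDist

n, p, k = 50, 0.3, 18  # number of trials, success probability, threshold

mean = n * p                      # np = 15
sd = math.sqrt(n * p * (1 - p))   # sqrt(np(1-p)), about 3.2404

# Exact tail probability: sum the binomial formula for y = 18, ..., 50.
exact = sum(
    math.comb(n, y) * p**y * (1 - p) ** (n - y)
    for y in range(k, n + 1)
)

# Normal approximation: area to the right of the Z score for 18.
z = (k - mean) / sd
approx = 1 - NormalDist().cdf(z)

print(f"z = {z:.4f}")
print(f"exact  P(Y >= {k}) = {exact:.4f}")
print(f"normal P(Y >= {k}) = {approx:.4f}")
```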
The required $n$ depends on the value of $p$.
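One way to see this dependence is to measure how far the binomial distribution is from its approximating normal curve for a fixed $n$ and several values of $p$. The sketch below uses a crude measure chosen for illustration, the largest gap between the two cumulative distribution functions at the integers $0,\dots,n$; the function name `max_cdf_gap` and the particular values of $p$ are assumptions, not something taken from the text.

```python
import math
from statistics import NormalDist


def max_cdf_gap(n, p):
    """Largest |binomial CDF - normal CDF| over y = 0, ..., n.

    The normal curve has mean np and SD sqrt(np(1-p)), as in the text.
    """
    normal = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
    cumulative = 0.0
    gap = 0.0
    for y in range(n + 1):
        cumulative += math.comb(n, y) * p**y * (1 - p) ** (n - y)
        gap = max(gap, abs(cumulative - normal.cdf(y)))
    return gap


# Same n, different p: compare the quality of the approximation.
for p in (0.5, 0.3, 0.1, 0.05):
    print(f"n = 50,  p = {p:<4}: max gap = {max_cdf_gap(50, p):.3f}")

# A larger n at an extreme p, for comparison with n = 50.
print(f"n = 200, p = 0.05: max gap = {max_cdf_gap(200, 0.05):.3f}")
```

For $n=50$ the gap is noticeably larger at $p=0.05$ than at $p=0.5$, and increasing $n$ to $200$ at $p=0.05$ reduces it, which is consistent with the statement that a value of $p$ far from $0.5$ calls for a larger $n$.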