A probability is a numerical quantity that expresses the likelihood of an event. The probability of an event $E$ is written as $P(E)$. The probability $P(E)$ is always a number between $0$ and $1$, inclusive.
Consider the familiar chance operation of tossing a coin, and define the event $E$: Heads. Each time the coin is tossed, either it falls heads or it does not. If the coin is equally likely to fall heads or tails, then $$P(E)=\frac{1}{2}=0.5.$$ Such an ideal coin is called a "fair" coin. If the coin is not fair (perhaps because it is slightly bent), then $P(E)$ will be some value other than $0.5$, for instance, $P(E)=0.6$.
A large population of the fruit fly *Drosophila melanogaster* is maintained in a lab. In the population, $30\%$ of the individuals are black because of a mutation, while $70\%$ of the individuals have the normal gray body color. Suppose one fly is chosen at random from the population. Then the probability that a black fly is chosen is $0.3$. More formally, define $E$: Sampled fly is black. Then $P(E)=0.3$.
The preceding example illustrates the basic relationship between probability and random sampling: The probability that a randomly chosen individual has a certain characteristic is equal to the proportion of population members with the characteristic.
Recall the coin-tossing example. Suppose that a fair coin is tossed repeatedly and the number of heads is recorded. As the number of tosses grows, one expects that $$\frac{\text{# of heads}}{\text{# of tosses}}\to 0.5.$$ Now suppose the coin is tossed twice: what is the probability that both tosses are heads?
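Since the two tosses of a fair coin are independent, the probability that both are heads is the product of the individual probabilities: $$P(\text{both tosses are heads})=0.5\times 0.5=0.25.$$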
Suppose a medical test is conducted on someone to try to determine whether or not the person has a particular disease.
In the United States, $44\%$ of the population has type O blood, $42\%$ has type A, $10\%$ has type B, and $4\%$ has type AB. Consider choosing someone at random and determining the person's blood type. The probability of a given blood type will correspond to the population percentage.
The following table shows the relationship between hair color and eye color for a group of 1,770 German men.
Calculate the probabilities of the following events:
The conditional probability of $E_2$, given $E_1$, is $$P(E_2|E_1)=\frac{P(E_1\cap E_2)}{P(E_1)}$$ provided that $P(E_1)>0$.
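For instance, with hypothetical values $P(E_1)=0.5$ and $P(E_1\cap E_2)=0.2$ (numbers chosen only for illustration), the formula gives $$P(E_2|E_1)=\frac{0.2}{0.5}=0.4.$$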
Consider the hair color and eye color example: what is the probability that a man has blue eyes, given that he has black hair?
Consider the hair color and eye color example and the two events $E_1$: red hair and $E_2$: brown eyes. Verify rule (8).
The examples presented in the previous section dealt with probabilities for discrete variables. In this section we will consider probability when the variable is continuous.
A glucose tolerance test can be useful in diagnosing diabetes. The blood level of glucose is measured one hour after the subject has drunk 50 g of glucose dissolved in water. The distribution is represented by histograms with class widths equal to (a) 10 and (b) 5, and by (c) a smooth curve.
A smooth curve representing a frequency distribution is called a density curve.
For any two numbers $a$ and $b$, $$\text{Area under density curve between }a\text{ and }b=\text{Proportion of }Y\text{ values between }a\text{ and }b$$
The diameter of a tree trunk is an important variable in forestry. The density curve shown below represents the distribution of diameters (measured at breast height) in a population of 30-year-old Douglas fir trees; areas under the curve are shown, as well. Consider the diameter, in inches, of a randomly chosen tree. Then, for example, $P(4<\text{diameter}<6)=0.33$.
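The exact form of the fir-diameter density curve is not given here, but the following minimal sketch shows how such an area would be computed if the diameters were (hypothetically) modeled by a normal density with mean $6$ inches and standard deviation $2$ inches; the model and numbers are assumptions for illustration only.

```python
from scipy.stats import norm

# Hypothetical model (an assumption, not the actual fir-diameter curve):
# diameter ~ Normal(mean=6, sd=2), in inches.
mu, sigma = 6, 2

# Area under the density curve between 4 and 6
# = P(4 < diameter < 6), computed as a difference of CDF values.
prob = norm.cdf(6, loc=mu, scale=sigma) - norm.cdf(4, loc=mu, scale=sigma)
print(round(prob, 3))  # 0.341 under this hypothetical model
```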
A random variable is simply a variable that takes on numerical values that depend on the outcome of a chance operation.
Consider the chance operation of tossing a die. Let the random variable $Y$ represent the number of spots showing. The possible values of $Y$ are $Y = 1, 2, 3, 4, 5$, or $6$. We do not know the value of $Y$ until we have tossed the die. If the die is perfectly balanced so that each of the six faces is equally likely, then $$P(Y=i)=\frac{1}{6},$$ for $i=1, 2, 3, 4, 5, 6$.
Suppose a family is chosen at random from a certain population, and let the random variable $Y$ denote the number of children in the chosen family. The possible values of $Y$ are $0, 1, 2, 3, \ldots$. The probability that $Y$ has a particular value is equal to the percentage of families with that many children. For instance, if $23\%$ of the families have $2$ children, then $$P(Y=2)=0.23.$$
Let the random variable $Y$ denote the height of a man chosen at random from a certain population. If we know the distribution of heights in the population, then we can specify the probability that $Y$ falls in a certain range. For instance, if $46\%$ of the men are between $65.2$ and $70.4$ inches tall, then $$P(65.2<Y<70.4)=0.46.$$
For the case of a discrete random variable, we can calculate the population mean and standard deviation if we know the probability mass function for the random variable.
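Specifically, if $Y$ takes the values $y_1, y_2, \ldots$ with probabilities $p_i=P(Y=y_i)$, the standard formulas are $$\mu_Y=\sum_i y_i\,p_i \qquad\text{and}\qquad \sigma_Y^2=\sum_i (y_i-\mu_Y)^2\,p_i,$$ with $\sigma_Y=\sqrt{\sigma_Y^2}$.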
Calculate the mean ($3.5$), variance ($35/12$), and standard deviation of the discrete random variable defined in the die-tossing example.
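As a check, each face has probability $1/6$, so $$\mu_Y=\frac{1+2+3+4+5+6}{6}=3.5,\qquad \sigma_Y^2=\frac{1}{6}\sum_{i=1}^{6}(i-3.5)^2=\frac{35}{12}\approx 2.92,\qquad \sigma_Y=\sqrt{35/12}\approx 1.71.$$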
If we add two random variables that are not independent of one another, then the variance of the sum depends on the degree of dependence between the variables. To take an extreme case, suppose that one of the random variables is the negative of the other. Then the sum is always zero, so its variance is zero, no matter how large the variances of the two individual variables are.
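More generally (a standard identity, stated here for reference), for any two random variables $X$ and $Y$, $$\mathrm{Var}(X+Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)+2\,\mathrm{Cov}(X,Y),$$ where the covariance term captures the dependence and vanishes when $X$ and $Y$ are independent.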
Independent-Trials Model: A series of $n$ independent trials is conducted. Each trial results in success or failure. The probability of success is equal to the same quantity, $p$, for each trial, regardless of the outcomes of the other trials.
A binomial random variable is a random variable that satisfies the following four conditions, abbreviated as BInS: Binary outcomes on each trial (success or failure), Independent trials, a fixed number of trials $n$, and the Same probability of success $p$ on each trial.
Suppose we have a fair coin and we flip it $10$ times. We are interested in the number of times we get heads (success) out of the $10$ flips.
The binomial distribution comes into play because each flip of the coin is independent, and there are only two possible outcomes (heads or tails) with a fixed probability of success ($0.5$ for a fair coin).
In general, if the random variable $X$ follows the binomial distribution with parameters $n\in\mathbb{N}$ and $p\in[0,1]$, we write $X\sim B(n, p)$. The probability of getting exactly $j$ successes in $n$ independent Bernoulli trials is given by the probability mass function $$P(X=j)=\binom{n}{j}p^j(1-p)^{n-j},$$ where $\binom{n}{j}=\frac{n!}{j!\,(n-j)!}$ is the binomial coefficient, counting the number of ways to choose which $j$ of the $n$ trials are the successes.
Consider the coin-tossing example and let $X$ be the number of heads out of the $10$ flips. It follows that $X\sim B(10, 0.5)$. Using the above formula, the probability of getting exactly $3$ heads out of the $10$ flips is $$P(X=3)=\binom{10}{3}(0.5)^3(1-0.5)^{10-3}=0.117.$$
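Calculations like this are easy to check numerically. Below is a minimal sketch using only Python's standard library (the helper name `binom_pmf` is ours, not from any particular package):

```python
from math import comb

def binom_pmf(j, n, p):
    """P(X = j) for X ~ B(n, p)."""
    return comb(n, j) * p**j * (1 - p)**(n - j)

# Probability of exactly 3 heads in 10 tosses of a fair coin
print(round(binom_pmf(3, 10, 0.5), 3))  # 0.117
```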
Sometimes calculating the probability of the complement of an event can be easier than calculating the probability of the event itself. We can then find the probability of the event by subtracting the probability of the complement from one. This trick is often used when the event of interest is a combination of many outcomes.
What is the probability of getting at most 9 heads out of $10$ flips?
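Rather than summing $P(X=0)$ through $P(X=9)$, it is easier to use the complement: the only outcome excluded from "at most $9$ heads" is "all $10$ heads," so $$P(X\le 9)=1-P(X=10)=1-(0.5)^{10}=1-\frac{1}{1024}\approx 0.999.$$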
For a binomial random variable $X\sim B(n, p)$, we have $$E(X)=np \qquad\text{and}\qquad \mathrm{Var}(X)=np(1-p).$$
The binomial random variable $X\sim B(n, p)$ is the sum of $n$ identical Bernoulli random variables, each with expected value $p$ and variance $p(1-p)$. In other words, if $X_1, \ldots, X_n$ are identical (and independent) Bernoulli random variables with parameter $p$, then $X=X_1+\cdots+X_n$.
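This representation gives a one-line justification of the formulas above: by linearity of expectation and the independence of the trials, $$E(X)=\sum_{i=1}^{n}E(X_i)=np \qquad\text{and}\qquad \mathrm{Var}(X)=\sum_{i=1}^{n}\mathrm{Var}(X_i)=np(1-p).$$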
In the United States, $85\%$ of the population has Rh positive blood. Suppose we take a random sample of $6$ persons and count the number with Rh positive blood. The binomial model can be applied here, since the BInS conditions are met: There is a binary outcome on each trial (Rh positive or Rh negative blood), the trials are independent (due to the random sampling), $n$ is fixed at $6$, and the same probability of Rh positive blood applies to each person ($p = 0.85$).
Let $Y$ denote the number of persons, out of $6$, with Rh positive blood. Calculate
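The specific probabilities to be calculated are not listed here, but the entire distribution of $Y\sim B(6, 0.85)$ can be tabulated with the same pmf formula; the short standard-library sketch below does exactly that.

```python
from math import comb

# Y = number with Rh positive blood in a random sample of 6, Y ~ B(6, 0.85)
n, p = 6, 0.85
for j in range(n + 1):
    prob = comb(n, j) * p**j * (1 - p)**(n - j)
    print(f"P(Y = {j}) = {prob:.4f}")
# For example, P(Y = 6) = 0.85**6 ≈ 0.3771
```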