Constructing a Confidence Interval for a Population Mean
A confidence interval provides a plausible range of values for an unknown population parameter, in this case, the population mean ( $ \mu $ ), based on sample data. It consists of a point estimate plus or minus a margin of error.
General Formula
The general form for a confidence interval for a population mean is:
$$ \text{Point Estimate} \pm \text{Margin of Error} $$
For a population mean, the point estimate is the sample mean ( $ \bar{x} $ ). The margin of error accounts for the variability of the sample statistic and the desired confidence level.
Conditions for Inference
Before constructing a confidence interval, check three conditions to ensure the procedure is valid (Random, Independent, and Normal/Large Sample):
- Random: The data must come from a random sample or a randomized experiment. This ensures the sample is representative of the population.
- Independent (10% Condition): When sampling without replacement, the sample size ( $ n $ ) should be no more than 10% of the population size ( $ N $ ). That is, $ n \le 0.10N $ . This ensures that the observations are approximately independent.
- Normal/Large Sample: The sampling distribution of the sample mean must be approximately normal. Any one of the following is sufficient:
  - The population distribution is approximately normal.
  - The sample size is large ( $ n \ge 30 $ ), so the Central Limit Theorem applies.
  - The sample size is small ( $ n < 30 $ ) and the population shape is unknown, but a graph of the sample data (e.g., a histogram or boxplot) shows no strong skewness or outliers.
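The three checks above can be sketched as a small helper. This is only an illustration: the function name `check_conditions` and its parameters are hypothetical, the Random condition has to be supplied from knowledge of how the data were collected, and the graph check for small samples still has to be done by eye.

```python
def check_conditions(n, population_size, is_random, population_normal=False):
    """Summarize the three conditions for a t-interval.

    All names here are illustrative assumptions, not part of any library.
    """
    results = {
        # Random: depends on the study design, so it is passed in by the user.
        "random": is_random,
        # Independent (10% condition): n must be at most 10% of N.
        "independent": n <= 0.10 * population_size,
    }
    if population_normal:
        results["normal"] = "met: population is approximately normal"
    elif n >= 30:
        results["normal"] = "met: n >= 30, Central Limit Theorem applies"
    else:
        results["normal"] = "graph the sample and look for strong skewness or outliers"
    return results


# Hypothetical scenario: a random sample of 40 students from a school of 1,200.
report = check_conditions(n=40, population_size=1200, is_random=True)
for name, status in report.items():
    print(f"{name}: {status}")
```

In the made-up scenario, 40 is less than 10% of 1,200 and the sample size is at least 30, so all three conditions are reported as met.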
Standard Error and the t-Distribution
When we construct a confidence interval for a population mean, the population standard deviation ( $ \sigma $ ) is almost always unknown. We therefore estimate it with the sample standard deviation ( $ s_x $ ) and use a t-distribution, rather than the Normal distribution, to find the critical value.
The standard error of the sample mean is given by:
$$ SE_{\bar{x}} = \frac{s_x}{\sqrt{n}} $$
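As a quick illustration of the formula, the sketch below computes the standard error from summary statistics. The function name `standard_error` and the numbers are made up for the example; the point is that, for the same $ s_x $ , quadrupling the sample size halves the standard error.

```python
import math


def standard_error(s_x, n):
    """Standard error of the sample mean: s_x / sqrt(n)."""
    return s_x / math.sqrt(n)


# Made-up summary statistics: same s_x, two different sample sizes.
se_small = standard_error(s_x=12.0, n=36)    # 12 / 6
se_large = standard_error(s_x=12.0, n=144)   # 12 / 12
print(se_small, se_large)  # 2.0 1.0
```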
The t-distribution is a family of distributions that are bell-shaped and symmetric, but have “fatter” tails than the Normal distribution, reflecting the additional uncertainty introduced by estimating $ \sigma $ with $ s_x $ . The specific shape of the t-distribution depends on its degrees of freedom.
Degrees of Freedom (df)
The degrees of freedom for a t-distribution when estimating a single population mean is $ df = n - 1 $ , where $ n $ is the sample size. As the degrees of freedom increase, the t-distribution approaches the standard Normal distribution.
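To see the t-distribution approach the Normal distribution, the sketch below computes the 95% critical value $ t^* $ for several degrees of freedom and compares it with $ z^* $ . It uses only the Python standard library: the t density is integrated numerically with Simpson's rule and the critical value is found by bisection. The helper names (`t_pdf`, `t_cdf`, `t_critical`) are hypothetical; in practice a t-table or statistical software would be used instead.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Approximate t* with central area `confidence`, found by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:   # widen the bracket until it contains t*
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_star = NormalDist().inv_cdf(0.975)
for df in (4, 9, 29, 99):
    print(f"df = {df:3d}: t* = {t_critical(0.95, df):.3f}")
print(f"Normal:   z* = {z_star:.3f}")
```

The printed values should agree with a standard t-table to about three decimal places (for example, roughly 2.262 for 9 degrees of freedom), and they shrink toward $ z^* \approx 1.960 $ as the degrees of freedom grow.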
Formula for a t-Interval for $ \mu $
The formula for a confidence interval for a population mean is:
$$ \bar{x} \pm t^* \left( \frac{s_x}{\sqrt{n}} \right) $$
Where:
- $ \bar{x} $ = sample mean
- $ t^* $ = critical t-value determined by the chosen confidence level and the degrees of freedom ( $ n-1 $ ). This value is found using a t-table or statistical software.
- $ s_x $ = sample standard deviation
- $ n $ = sample size
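A worked example may help tie the pieces of the formula together. The sketch below uses a made-up sample of 10 values and takes $ t^* = 2.262 $ from a t-table (95% confidence, 9 degrees of freedom). The function name `t_interval` is hypothetical.

```python
import math
import statistics


def t_interval(data, t_star):
    """Return (lower, upper) for x_bar +/- t* * s_x / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    s_x = statistics.stdev(data)           # sample standard deviation (n - 1)
    margin = t_star * s_x / math.sqrt(n)   # t* times the standard error
    return x_bar - margin, x_bar + margin


# Made-up data: n = 10, so df = 9.
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
T_STAR_95_DF9 = 2.262  # from a t-table: 95% confidence, df = 9

lower, upper = t_interval(sample, T_STAR_95_DF9)
print(f"({lower:.3f}, {upper:.3f})")  # (12.369, 14.631)
```

Here $ \bar{x} = 13.5 $ and $ s_x \approx 1.581 $ , so the standard error is 0.5 and the margin of error is $ 2.262 \times 0.5 = 1.131 $ .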
Steps for Constructing a Confidence Interval
Here’s a structured approach:
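| Step | Description |
|---|---|
| 1. Identify | State the parameter being estimated ( $ \mu $ ) and the confidence level. |
| 2. Check conditions | Verify the Random, Independent (10%), and Normal/Large Sample conditions. |
| 3. Calculate | Find $ \bar{x} $ , $ s_x $ , and $ t^* $ with $ df = n - 1 $ , then compute $ \bar{x} \pm t^* \left( \frac{s_x}{\sqrt{n}} \right) $ . |
| 4. Interpret | State the interval in the context of the problem. |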
Confidence Level
The confidence level (C) is the long-run success rate of the method: the proportion of all possible random samples for which the resulting interval captures the true population parameter. It is typically expressed as a percentage (e.g., 90%, 95%, 99%).
For example, a 95% confidence level means that if we were to take many random samples and construct a confidence interval from each, about 95% of these intervals would contain the true population mean.
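The "many random samples" idea can be checked with a small simulation. The sketch below assumes a Normal population with made-up values $ \mu = 50 $ and $ \sigma = 8 $ , draws 2,000 samples of size 10, builds a 95% t-interval from each (using $ t^* = 2.262 $ for 9 degrees of freedom from a t-table), and reports the fraction of intervals that contain $ \mu $ . The function name `coverage_rate` is hypothetical.

```python
import math
import random
import statistics


def coverage_rate(mu, sigma, n, t_star, trials, seed=0):
    """Fraction of simulated t-intervals that capture the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.mean(sample)
        margin = t_star * statistics.stdev(sample) / math.sqrt(n)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


# Made-up population: mu = 50, sigma = 8; samples of size 10 (df = 9).
rate = coverage_rate(mu=50, sigma=8, n=10, t_star=2.262, trials=2000)
print(f"Proportion of intervals containing mu: {rate:.3f}")
```

The reported proportion should land close to 0.95, though not exactly on it, because the simulation itself is subject to random variation.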
Interpreting a Confidence Interval
Once calculated, the confidence interval must be interpreted correctly. The interpretation should be stated in the context of the problem.
- Correct Interpretation: “We are C% confident that the true population mean ( $ \mu $ ) lies between [lower bound] and [upper bound].”
- Incorrect Interpretation (Common Mistake): “There is a C% probability that the true population mean is within this interval.” The population mean is a fixed value, not a random variable. The interval itself is what varies from sample to sample.
Determining Sample Size for a Confidence Interval
Sometimes we need the sample size that achieves a desired margin of error at a given confidence level. For a t-interval this calculation is awkward because $ t^* $ depends on $ n $ , and $ s_x $ is not known before the data are collected. If we approximate $ t^* $ with $ z^* $ (reasonable for large $ n $ ) and work with $ \sigma $ in place of $ s_x $ , the margin of error formula becomes:
$$ ME = z^* \left( \frac{\sigma}{\sqrt{n}} \right) $$
Solving for $ n $ :
$$ n = \left( \frac{z^* \sigma}{ME} \right)^2 $$
This requires an estimate of $ \sigma $ . Often, a pilot study’s standard deviation or a standard deviation from previous research is used. Remember to always round up to the next whole number for the sample size.
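The sample size formula can be sketched in a few lines. The example below assumes a planning value of $ \sigma = 15 $ (say, from a pilot study), a desired margin of error of 3, and 95% confidence; these numbers and the function name `required_sample_size` are made up for illustration. $ z^* $ comes from `statistics.NormalDist` in the Python standard library.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma_guess, margin_of_error, confidence):
    """Smallest n with z* * sigma / sqrt(n) <= margin_of_error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = (z_star * sigma_guess / margin_of_error) ** 2
    return math.ceil(n_exact)  # always round up to a whole number


n_needed = required_sample_size(sigma_guess=15, margin_of_error=3, confidence=0.95)
print(n_needed)  # 97
```

The unrounded value is about 96.04, so rounding up gives 97. Rounding down to 96 would leave the margin of error slightly larger than 3.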