Sampling Distributions for Sample Means
Introduction
A sampling distribution for sample means describes the distribution of all possible sample means ( $ \bar{x} $ ) that could be obtained from samples of the same size ( $ n $ ) drawn from the same population. It is a crucial concept for statistical inference because it allows us to make predictions about a population parameter based on a sample statistic.
Mean of the Sampling Distribution ( $ \mu_{\bar{x}} $ )
The mean of the sampling distribution of the sample mean, denoted as $ \mu_{\bar{x}} $ , is equal to the population mean, $ \mu $ . This property makes the sample mean an unbiased estimator of the population mean.
$$ \mu_{\bar{x}} = \mu $$
This means that, on average, the sample means will target the true population mean.
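As an illustration (not something the formula itself requires), a quick simulation sketch like the one below can show this centering; the population values, seed, sample size, and number of repetitions are arbitrary choices made for the example:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical population of 100,000 exam scores with mean ~70 and SD ~10
population = rng.normal(loc=70, scale=10, size=100_000)
mu = population.mean()

# Draw many samples of size n = 25 and record each sample mean
n = 25
sample_means = [rng.choice(population, size=n, replace=False).mean()
                for _ in range(5_000)]

# On average, the sample means center on the true population mean
print(f"Population mean mu:      {mu:.3f}")
print(f"Average of sample means: {np.mean(sample_means):.3f}")
```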
Standard Deviation of the Sampling Distribution ( $ \sigma_{\bar{x}} $ )
The standard deviation of the sampling distribution of the sample mean, often called the standard error of the mean, is denoted as $ \sigma_{\bar{x}} $ . It measures the typical variability of sample means around the population mean.
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} $$
Where:
- $ \sigma $ is the population standard deviation.
- $ n $ is the sample size.
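For example, if $ \sigma = 10 $ and $ n = 25 $ , then $ \sigma_{\bar{x}} = \frac{10}{\sqrt{25}} = 2 $ . Quadrupling the sample size to $ n = 100 $ cuts the standard error in half, to $ \frac{10}{\sqrt{100}} = 1 $ .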
Important Note on Conditions: For this formula to be valid, the 10% Condition must be met: the sample size $ n $ must be no more than 10% of the population size $ N $ . If $ n > 0.10N $ , the formula needs to be adjusted using a finite population correction factor, though this is rarely required in introductory AP Statistics problems. If the population standard deviation $ \sigma $ is unknown, we use the sample standard deviation $ s $ to estimate it, leading to the use of a t-distribution, which is covered in Constructing a Confidence Interval for a Population Mean.
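For reference (this adjustment goes beyond what most AP problems require), the finite population correction is usually written as:

$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} $$

When $ n $ is only a small fraction of $ N $ , the correction factor is close to 1, which is why the simpler formula works under the 10% Condition.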
Shape of the Sampling Distribution
The shape of the sampling distribution of $ \bar{x} $ depends on two main factors:
- Population Distribution:
  - If the population distribution is Normal: The sampling distribution of $ \bar{x} $ will also be Normal (exactly, not just approximately), regardless of the sample size $ n $ .
  - If the population distribution is not Normal: The shape of the sampling distribution of $ \bar{x} $ becomes approximately Normal as the sample size $ n $ increases. This is due to the Central Limit Theorem.
- The Central Limit Theorem (CLT): The CLT states that if the sample size $ n $ is sufficiently large (generally $ n \ge 30 $ ), the sampling distribution of $ \bar{x} $ will be approximately Normal, regardless of the shape of the original population distribution. This is a powerful result that allows us to use the Normal distribution to make inferences even when the population distribution is unknown or non-Normal, as the simulation sketch below illustrates.
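The following simulation sketch is only an illustration of the CLT, not an AP requirement; the exponential population, seed, and sample counts are arbitrary choices. It shows the skewness of the distribution of $ \bar{x} $ shrinking toward 0 as $ n $ grows:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def skewness(x):
    """Rough sample skewness: the average cubed z-score (near 0 for a symmetric shape)."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

# Strongly right-skewed population (exponential with mean 5)
population = rng.exponential(scale=5, size=200_000)
print(f"Population skewness: {skewness(population):+.3f}")

for n in (2, 10, 30, 100):
    # Simulate 10,000 sample means for samples of size n
    means = rng.choice(population, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:3d}: skewness of the x-bar distribution = {skewness(means):+.3f}")

# The skewness shrinks toward 0 as n grows: the sampling distribution of
# x-bar becomes approximately Normal even though the population is not.
```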
Conditions for Inference with Sample Means
When performing inference about a population mean using a sample mean, we need to check several conditions to ensure the validity of our procedures. These are usually summarized as Random, Independence (10% Condition), and Normality:
- Random Sample: The sample was obtained through a random selection process. This is crucial for valid inference.
- Independence: Observations within the sample must be independent of each other. If the sample is selected without replacement, the sample size $ n $ must be less than 10% of the population size $ N $ (the 10% condition mentioned earlier). This ensures that the probability of selecting an item doesn’t significantly change as items are selected, maintaining approximate independence.
- Normality of the Sampling Distribution: The sampling distribution of $ \bar{x} $ must be approximately Normal. This can be satisfied if:
  - The population itself is Normally distributed.
  - The sample size $ n $ is sufficiently large ( $ n \ge 30 $ ) due to the Central Limit Theorem.
  - If $ n < 30 $ and the population is not Normal, we must visually inspect a graph of the sample data (e.g., histogram, boxplot) for strong skewness or outliers. If the data are roughly symmetric and unimodal with no outliers, we can proceed with caution.
Failing to meet these conditions can invalidate the results of statistical inference.
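As a rough study aid, one might sketch the checkable parts of this list in code; the function and parameter names below are hypothetical, and the randomness condition still has to be judged from how the data were actually collected:

```python
def check_mean_inference_conditions(n, N, population_is_normal=False,
                                    sample_graph_looks_ok=False):
    """Hypothetical checklist helper. Whether the sample was actually selected
    at random must be judged from the study design; it cannot be computed."""
    ten_percent_ok = n <= 0.10 * N                  # independence: 10% condition
    normality_ok = (population_is_normal            # population stated Normal, or
                    or n >= 30                      # large sample (CLT), or
                    or sample_graph_looks_ok)       # graph shows no strong skew/outliers
    return {"10% condition": ten_percent_ok,
            "Normal/large sample condition": normality_ok}

# Example: a random sample of n = 40 students from a school of N = 1200
print(check_mean_inference_conditions(n=40, N=1200))
# {'10% condition': True, 'Normal/large sample condition': True}
```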