AP Stats Home
Setting Up a Test for the Difference of Two Population Means
When we want to compare the means of two distinct populations or two different treatments, we often use a two-sample t-test for the difference between two population means. This statistical inference procedure allows us to determine if there is a statistically significant difference between the true population means ( $ \mu_1 $ and $ \mu_2 $ ). This process involves stating hypotheses, checking conditions, and defining the test statistic and p-value, similar to Setting Up a Test for a Population Mean.
1. State the Hypotheses
The first step in setting up any significance test is to clearly state the null and alternative hypotheses. These hypotheses are statements about the unknown population parameters.
-
Null Hypothesis ( $ H_0 $ ): This hypothesis assumes there is no difference between the two population means. $$ H_0: \mu_1 - \mu_2 = 0 \quad \text{or} \quad H_0: \mu_1 = \mu_2 $$ Where $ \mu_1 $ is the true mean of population 1 and $ \mu_2 $ is the true mean of population 2.
-
Alternative Hypothesis ( $ H_a $ ): This hypothesis suggests that there is a difference between the two population means. The form of the alternative hypothesis depends on the specific research question and whether we are looking for a difference in a particular direction.
Type of Test Alternative Hypothesis ( $ H_a $ ) Two-sided $ H_a: \mu_1 - \mu_2 \neq 0 $ One-sided (left) $ H_a: \mu_1 - \mu_2 < 0 $ One-sided (right) $ H_a: \mu_1 - \mu_2 > 0 $ It is crucial to define what $ \mu_1 $ and $ \mu_2 $ represent in the context of the problem (e.g., " $ \mu_1 $ is the true mean weight loss for diet A" and " $ \mu_2 $ is the true mean weight loss for diet B").
2. Identify the Significance Level ( $ \alpha $ )
Before collecting or analyzing data, you should choose a significance level, denoted by $ \alpha $ . This is the probability of making a Potential Errors When Performing Tests#Type I Error|Type I error (rejecting a true null hypothesis). Common choices for $ \alpha $ are 0.05 or 0.01. If not provided, it is often assumed to be 0.05.
3. Check Conditions for Inference
To ensure the validity of the two-sample t-test, several conditions must be met. These conditions are similar to those for Constructing a Confidence Interval for a Population Mean but are applied to two independent samples.
- Random Condition: The data for each sample must come from a random sample or a randomized experiment.
- If comparing two populations, both samples must be independent random samples from their respective populations.
- If comparing two treatments, individuals must be randomly assigned to the treatment groups.
- Independent Condition (10% Condition): When sampling without replacement, the sample size ( $ n_1 $ and $ n_2 $ ) must be no more than 10% of their respective population sizes ( $ N_1 $ and $ N_2 $ ).
- $ n_1 \le 0.10 N_1 $
- $ n_2 \le 0.10 N_2 $
- This ensures that the observations within each sample are independent. The two samples themselves must also be independent of each other.
- Normal/Large Sample Condition: For each sample, the sampling distribution of the sample mean must be approximately normal. This can be satisfied if:
- Both population distributions are approximately normal (checked with graphs like dot plots, histograms, or normal probability plots).
- Both sample sizes are sufficiently large ( $ n_1 \ge 30 $ and $ n_2 \ge 30 $ ) due to The Central Limit Theorem.
- If sample sizes are small and the population distribution is unknown, use graphical displays of the sample data to assess for strong skewness or outliers. If such features are present, the condition is not met.
4. Name the Test
Based on the parameters being compared (two population means) and the appropriate sampling distribution (t-distribution when population standard deviations are unknown), the test is named a Two-Sample t-Test for a Difference Between Means.
5. Define the Test Statistic
The test statistic for the difference of two population means measures how many standard errors the observed difference in sample means ( $ \bar{x}_1 - \bar{x}_2 $ ) is away from the hypothesized difference (usually 0).
The formula for the two-sample t-test statistic is: $$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$ Where:
- $ \bar{x}_1 $ and $ \bar{x}_2 $ are the sample means.
- $ s_1 $ and $ s_2 $ are the sample standard deviations.
- $ n_1 $ and $ n_2 $ are the sample sizes.
- $ (\mu_1 - \mu_2)_0 $ is the hypothesized difference in population means (typically 0 under $ H_0 $ ).
The degrees of freedom (df) for this t-statistic are calculated using a complex formula or, more commonly in AP Statistics, by using the smaller of $ (n_1 - 1) $ and $ (n_2 - 1) $ for a conservative estimate, or letting technology calculate a more precise value.
After setting up the test, the next step is Carrying Out a Test for the Difference of Two Population Means, which involves calculating the test statistic and the p-value, and then making a conclusion.