Setting Up a Test for the Difference of Two Population Proportions

Carson West


When we want to compare the proportion of “successes” between two distinct populations or groups, we use a two-sample $ z $-test for the difference of population proportions. This test helps us determine whether an observed difference in sample proportions is statistically significant or whether it could reasonably occur by chance, assuming there’s no real difference in the populations.

1. Defining Parameters and Hypotheses

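Before we can perform any calculations, we must clearly define the population parameters and state our null and alternative hypotheses.

Let $ p_1 $ be the true proportion of successes in Population 1 and $ p_2 $ the true proportion of successes in Population 2, each described in the context of the problem. The null hypothesis states that the two proportions are equal, $ H_0: p_1 = p_2 $ (equivalently, $ p_1 - p_2 = 0 $). The alternative hypothesis is one of $ H_a: p_1 \ne p_2 $, $ H_a: p_1 > p_2 $, or $ H_a: p_1 < p_2 $, chosen from the research question before looking at the data.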

2. Conditions for Inference

To ensure the validity of our test results, specific conditions must be met. These are crucial for the sampling distribution of the difference in sample proportions to be approximately normal.

| Condition | Description |
| --- | --- |
| Random | The data must come from two independent random samples from the populations of interest, or from two groups in a randomized experiment. |
| Independent | Observations within each sample must be independent. When sampling without replacement, the population size must be at least 10 times the sample size for both populations. That is, $ N_1 \ge 10n_1 $ and $ N_2 \ge 10n_2 $. Also, the two samples/groups themselves must be independent of each other. |
| Normal (Large Counts) | The sampling distribution of $ \hat{p}_1 - \hat{p}_2 $ must be approximately normal. This is satisfied if the expected numbers of “successes” and “failures” in each sample, computed with the pooled proportion, are all at least 10: $ n_1 \hat{p}_c \ge 10 $, $ n_1 (1 - \hat{p}_c) \ge 10 $, $ n_2 \hat{p}_c \ge 10 $, and $ n_2 (1 - \hat{p}_c) \ge 10 $. |
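
The two numeric conditions above can be checked mechanically. The following is a minimal sketch, not a standard routine: the function name `check_conditions`, its arguments, and the sample counts are all made up for illustration. The Random condition depends on how the data were collected and cannot be verified in code.

```python
def check_conditions(x1, n1, x2, n2, N1=None, N2=None):
    """Check the 10% and Large Counts conditions for a two-proportion z-test.

    x1, x2: numbers of successes; n1, n2: sample sizes.
    N1, N2: population sizes (leave as None when the 10% check does not
    apply, e.g. in a randomized experiment).
    """
    # Pooled proportion, used for the Large Counts check
    p_c = (x1 + x2) / (n1 + n2)

    # Expected successes and failures in each sample under the pooled proportion
    expected_counts = [n1 * p_c, n1 * (1 - p_c), n2 * p_c, n2 * (1 - p_c)]
    large_counts = all(count >= 10 for count in expected_counts)

    # 10% condition: each population at least 10 times its sample size
    ten_percent = all(N is None or N >= 10 * n for N, n in [(N1, n1), (N2, n2)])

    return {"ten_percent": ten_percent, "large_counts": large_counts}


# Hypothetical data: 45 of 100 successes in Sample 1, 30 of 100 in Sample 2
print(check_conditions(x1=45, n1=100, x2=30, n2=100, N1=5000, N2=5000))
```

With these illustrative numbers the pooled proportion is 0.375, so the expected counts are 37.5 and 62.5 in each sample, and both checks pass.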

3. Calculating the Pooled Proportion

When we assume the null hypothesis ( $ H_0: p_1 = p_2 $ ) is true, we are assuming that the true population proportions are equal. In this case, it makes sense to “pool” the data from both samples to get a better estimate of this common population proportion. This pooled proportion, denoted $ \hat{p}_c $ (sometimes $ \hat{p} $ ), is used in the standard error calculation for the test statistic.

Let $ x_1 $ and $ x_2 $ be the number of successes in Sample 1 and Sample 2, respectively. Let $ n_1 $ and $ n_2 $ be the sample sizes for Sample 1 and Sample 2, respectively.

The pooled proportion is calculated as: $$ \hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2} $$ where $ x_1 = n_1 \hat{p}_1 $ and $ x_2 = n_2 \hat{p}_2 $ .
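
As a quick illustration of the formula above, here is a small sketch in Python. The helper name `pooled_proportion` and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def pooled_proportion(x1, n1, x2, n2):
    """Return the pooled sample proportion (x1 + x2) / (n1 + n2)."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    return (x1 + x2) / (n1 + n2)


# Hypothetical data: 45 successes out of 100, and 30 successes out of 200
p_c = pooled_proportion(45, 100, 30, 200)
print(p_c)
```

Here the sample proportions are 0.45 and 0.15, whose simple average is 0.30, but the pooled proportion is 75/300 = 0.25. Pooling weights each sample by its size, so it equals the simple average only when $ n_1 = n_2 $.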

4. Test Statistic

The test statistic for the difference of two population proportions is a $ z $-score, which measures how many standard errors the observed difference in sample proportions ($ \hat{p}_1 - \hat{p}_2 $) is from the hypothesized difference (which is 0 under $ H_0 $).

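The formula for the $ z $-test statistic is: $$ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}_c(1 - \hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$
where $ \hat{p}_1 = x_1 / n_1 $ and $ \hat{p}_2 = x_2 / n_2 $ are the sample proportions, $ \hat{p}_c $ is the pooled proportion, and $ n_1 $ and $ n_2 $ are the sample sizes.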

This $ z $-statistic approximately follows a standard normal distribution if the conditions are met and the null hypothesis is true. Once the test statistic is calculated, it is used to find the p-value for the test. For the next steps in the process, refer to Carrying Out a Test for the Difference of Two Population Proportions.
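
To make the test statistic concrete, the sketch below computes $ z $ from raw counts using only Python's standard library. The function names `two_prop_z` and `two_sided_p_value` and the data are assumptions for illustration. The p-value helper covers only the two-sided alternative $ H_a: p_1 \ne p_2 $ and is included just to show where $ z $ is used next.

```python
import math


def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_c = (x1 + x2) / (n1 + n2)  # pooled proportion
    standard_error = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat - 0) / standard_error


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: 45 of 100 successes in Sample 1, 30 of 100 in Sample 2
z = two_prop_z(45, 100, 30, 100)
print(round(z, 3), round(two_sided_p_value(z), 4))
```

For these made-up counts the observed difference is 0.15 and the pooled standard error is about 0.0685, giving $ z \approx 2.19 $. A one-sided alternative would use a single tail of the standard normal distribution instead.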