Correlation

Carson West

Correlation is a statistical measure that quantifies the strength and direction of the linear relationship between two quantitative variables. It’s a key component in understanding how two variables move together. Before computing a correlation, it’s essential to visualize the relationship between the two variables with a scatterplot (see Representing the Relationship Between Two Quantitative Variables).

The Correlation Coefficient ( $ r $ )

The most common measure of correlation is the Pearson product-moment correlation coefficient, denoted by $ r $ .

Formula for $ r $

The correlation coefficient $ r $ for a set of $ n $ paired observations $ (x_i, y_i) $ is calculated as:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right) $$
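
Where $ \bar{x} $ and $ \bar{y} $ are the sample means of the $ x $ and $ y $ values, $ s_x $ and $ s_y $ are their sample standard deviations, and $ n $ is the number of paired observations. Each factor in the sum is a standardized score (z-score), so $ r $ is the sum of the products of paired z-scores divided by $ n - 1 $.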
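As a minimal sketch of the formula, the function below standardizes each value, multiplies the paired z-scores, and divides the total by $ n - 1 $. The function name `pearson_r` and the study-hours data are assumptions made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: sum of paired z-score products divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    # r is undefined if either variable is constant (standard deviation 0).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 75, 88]

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

Because the points rise almost along a straight line, the result should be close to $ 1 $.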

Properties of $ r $

Understanding the properties of the correlation coefficient is crucial for correct interpretation:
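Several standard properties of Pearson’s $ r $ can be checked numerically. The sketch below, on made-up height and weight data, checks that $ r $ lies between $ -1 $ and $ 1 $, that swapping $ x $ and $ y $ leaves it unchanged, that a change of units (inches to centimeters) leaves it unchanged, and that negating one variable flips its sign. The data and the helper name `pearson_r` are assumptions for illustration.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's r: sum of paired z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data
height_in = [60, 62, 65, 68, 71, 74]
weight_lb = [110, 125, 140, 155, 170, 200]

r = pearson_r(height_in, weight_lb)
r_swapped = pearson_r(weight_lb, height_in)             # x and y exchanged
height_cm = [h * 2.54 for h in height_in]
r_rescaled = pearson_r(height_cm, weight_lb)            # inches -> centimeters
r_negated = pearson_r([-h for h in height_in], weight_lb)

print("between -1 and 1:       ", -1 <= r <= 1)
print("same after swapping:    ", abs(r - r_swapped) < 1e-9)
print("same after unit change: ", abs(r - r_rescaled) < 1e-9)
print("sign flips on negation: ", abs(r + r_negated) < 1e-9)
```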

Interpreting $ r $ Values

Here’s a general guide for interpreting the strength of the linear relationship based on $ r $ :

Value of $ r $ (absolute) Strength of Linear Relationship
$ 0.90 \le |r| \le 1.00 $
$ 0.70 \le |r| < 0.90 $
$ 0.50 \le |r| < 0.70 $
$ 0.30 \le |r| < 0.50 $
$ 0.00 \le |r| < 0.30 $
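
A guide like this can be applied mechanically by comparing $ |r| $ with each lower cutoff in turn. In the sketch below the cutoffs come from the guide above, but the label wording ("very strong", "strong", and so on) is an assumption chosen for illustration, as are the names `BANDS` and `describe`.

```python
# Lower cutoffs follow the guide above.
# The label wording is ASSUMED for illustration; it is not taken from the guide.
BANDS = [
    (0.90, "very strong"),
    (0.70, "strong"),
    (0.50, "moderate"),
    (0.30, "weak"),
    (0.00, "very weak or none"),
]

def describe(r):
    """Return (direction, strength) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    magnitude = abs(r)  # strength depends only on the absolute value
    for cutoff, label in BANDS:
        if magnitude >= cutoff:
            return direction, label

print(describe(-0.95))
print(describe(0.42))
```

Cutoffs like these are conventions rather than hard rules, so a label should always be read alongside the scatterplot.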

Correlation Does Not Imply Causation

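One of the most critical concepts in statistics, often misunderstood, is that correlation does not imply causation. Just because two variables are strongly correlated does not mean that one causes the other. There might be other explanations, such as a lurking variable that influences both variables, or a causal direction opposite to the one assumed.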

For example, the number of ice cream sales and the number of drownings might be strongly positively correlated. This doesn’t mean eating ice cream causes drowning, nor that drowning causes ice cream sales. Both are influenced by a lurking variable: hot weather.
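
The ice cream example can be imitated with a small simulation. In the sketch below, both quantities are generated from temperature plus independent noise, and neither appears in the other’s formula. Every number here (the coefficients, noise levels, temperature range, and seed) is made up for illustration, and `pearson_r` is a helper defined only for this sketch.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: sum of paired z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

random.seed(0)  # fixed seed so the run is repeatable
n_days = 5000

# SIMULATED data: all coefficients and noise levels are assumptions.
temps = [random.uniform(10, 35) for _ in range(n_days)]          # daily high, deg C
ice_cream = [20 * t + random.gauss(0, 50) for t in temps]        # depends on temperature only
drownings = [5 + 0.3 * t + random.gauss(0, 1.5) for t in temps]  # depends on temperature only

r_all = pearson_r(ice_cream, drownings)

# Hold the lurking variable roughly fixed: keep only days between 24 and 26 deg C.
band = [i for i, t in enumerate(temps) if 24 <= t <= 26]
r_band = pearson_r([ice_cream[i] for i in band],
                   [drownings[i] for i in band])

print(f"all days:          r = {r_all:.2f}")
print(f"24-26 deg C only:  r = {r_band:.2f}")
```

With these made-up settings, the correlation across all days should come out clearly positive, while the correlation within a narrow temperature band should be much closer to zero.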

Always be cautious when drawing conclusions about cause-and-effect from correlational studies. Experimental studies are generally needed to establish causation, as discussed in Introduction to Experimental Design.

When to Use Correlation

Correlation is appropriate for describing the relationship between two quantitative variables. It’s most meaningful when the scatterplot shows a roughly linear pattern. If the pattern is clearly curved, $ r $ can understate or entirely miss a strong relationship, so correlation is not an appropriate measure, and other methods, such as those covered in Analyzing Departures from Linearity, may be needed.
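
A small sketch shows why a curved pattern is a problem. Here $ y $ is determined exactly by $ x $ (as $ y = x^2 $), yet the linear correlation is essentially zero because the rising and falling halves of the curve cancel. The data and the helper name `pearson_r` are assumptions for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: sum of paired z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: a perfect, but curved, relationship
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.6f}")  # zero up to floating-point rounding
```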