### CORRELATION

Correlation coefficients are statistics that help describe data sets containing variables measured at the interval or ratio level. They are measures of association between two (or more) variables.

Correlation is a measure of association that tests whether a relationship exists between two variables. It indicates both the strength of the association and its direction (direct or inverse). The Pearson product-moment correlation coefficient, written as r, can describe a linear relationship between two variables.

For example, is there a relationship between:

- the budget of the police department and the crime rate?
- the hours of batting practice and a player's batting average?

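The value of r ranges from -1.0 to +1.0. A value of 0.0 indicates no linear relationship between the two variables, and a value of +1.0 or -1.0 indicates a perfect linear relationship. The closer r is to either extreme, the stronger the linear relationship; the sign gives its direction.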

| Value of r | Indication |
| --- | --- |
| 0.0 | No linear relationship between the two variables. |
| +1.0 | Perfect positive linear relationship; as X increases in value, Y increases in value also, and as X decreases in value, Y decreases also. |
| -1.0 | Perfect inverse linear relationship; as X increases in value, Y decreases in value, and as X decreases in value, Y increases in value. |
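To make the ends of this scale concrete, the following minimal Python sketch computes r from the usual deviation-score definition (the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations). The function name `pearson_r` and the data values are illustrative assumptions, not taken from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the two means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: X paired with a Y that rises, and a Y that falls.
x_values = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

print(pearson_r(x_values, rising))   # perfectly linear and increasing: +1.0
print(pearson_r(x_values, falling))  # perfectly linear and decreasing: -1.0
```

Real data rarely fall exactly on a line, so values strictly between these extremes are the norm.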

#### SCATTERPLOTS

It is useful to obtain a plot of the joint distribution of the values of the two variables, X and Y. Such a plot is called a scatterplot. The values of X are displayed on the horizontal axis (called the X-axis) and the values of Y are displayed on the vertical axis (called the Y-axis).

If small values of X are associated with small values of Y, and large values of X are associated with large values of Y, then the data will stretch from the lower left-hand corner of the plot to the upper right-hand corner. This indicates a positive relationship.

If small values of X are associated with large values of Y, and large values of X are associated with small values of Y, then the data will stretch from the upper left-hand corner of the plot to the lower right-hand corner. This indicates an inverse relationship.

If there is no discernible pattern to the distribution, then the two variables probably are not related in a linear fashion. There may be a strong, non-linear relationship between the two variables (for example, think of the normal curve) but it cannot be detected by r.
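As a rough illustration of that last point, the sketch below builds a small data set in which Y is completely determined by X through a symmetric curve, yet r comes out as zero. The helper `pearson_r` and the data are assumptions made for this example only.

```python
import math

def pearson_r(xs, ys):
    """Pearson r via sums of deviation cross-products and squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: Y = X squared, a strong but non-linear relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0: the positive and negative cross-products cancel exactly
```

The left half of the curve slopes downward and the right half slopes upward, so the two halves cancel in the cross-product sum. A scatterplot of these points would show the pattern immediately, which is one reason to plot the data before relying on r.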

When there are only a few data points, it is fairly easy to estimate the strength of the relationship by eyeballing the scatterplot. With many data points, however, a statistic is needed to summarize the strength and direction of the relationship.

The Pearson r assumes that the variables are measured at the interval or ratio level. If the variables are measured at the ordinal level (for example, a Likert-type scale), then the Spearman rank correlation can be used instead. Neither Pearson nor Spearman is designed for use with variables measured at the nominal level; instead, use the point-biserial correlation (when one variable is a two-category nominal variable and the other is interval or ratio) or phi (when both variables are two-category nominal variables).
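One common way to obtain the Spearman rank correlation is to replace each value with its rank (tied values share the average of their ranks) and then compute the Pearson r on the ranks. The sketch below follows that approach; the function names and the Likert-style ratings are hypothetical and serve only as an illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson r via sums of deviation cross-products and squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical 5-point Likert ratings from seven respondents.
satisfaction = [1, 2, 2, 3, 4, 5, 5]
loyalty = [1, 1, 2, 3, 3, 4, 5]
print(spearman_rho(satisfaction, loyalty))
```

Because only the ordering of the values matters, any relationship that is consistently increasing (or consistently decreasing) yields a coefficient of +1.0 (or -1.0), even when the relationship is not a straight line.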

The formula for r is as follows: