| Type of Government | Number of Cities |
Note, however, that a variable may have more than one modal
category. For example, if the type of government had looked like this:
| Type of Government | Number of Cities |
then the variable would have two modal categories,
"Strong Mayor" and "Council/Manager" (a bimodal distribution). This means that there is no single central
tendency in the data for this variable. (More is said about
this under "Skew" in the "Measures of Dispersion" section below.)
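The modal categories can be read straight off a frequency table by keeping every category whose count equals the largest count, so a tie produces more than one mode. The sketch below assumes the table is stored as a Python dict mapping each category to its number of cities; the function name `modal_categories` and the counts are hypothetical, not the figures from the tables above.

```python
def modal_categories(counts: dict[str, int]) -> list[str]:
    """Return every category whose frequency equals the highest frequency.

    Several categories are returned when there is a tie (e.g. bimodal data).
    """
    if not counts:
        return []
    top = max(counts.values())
    return [category for category, n in counts.items() if n == top]


# Hypothetical frequency table (illustrative counts only).
cities_by_government = {
    "Strong Mayor": 12,
    "Weak Mayor": 5,
    "Council/Manager": 12,
    "Commission": 3,
}

print(modal_categories(cities_by_government))  # ['Strong Mayor', 'Council/Manager']
```

For raw observations rather than a table of counts, Python's `statistics.multimode` (Python 3.8+) returns the same kind of list.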
For example, if there are 7 categories of employee
pay, the median category is category number 4, since (7+1)/2 = 4. In the
example below, the median pay category is $24,000, and the value of this category
can also be interpreted as the median pay value.
However, if there are 8 categories of employee pay,
the median pay value falls between two categories. The median position
is category 4.5, since (8+1)/2 = 9/2 = 4.5, so add the values of the fourth and fifth
categories and divide by two: ($24,000 + $25,000)/2 = $49,000/2 = $24,500.
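The (n+1)/2 position rule above can be written as a short function. This is a sketch that assumes the pay values are available as a plain list of numbers; the function name `median_by_position` and both salary lists are hypothetical (the eight-value list is only chosen so that its fourth and fifth values match the text). Python's built-in `statistics.median` gives the same result and is used as a cross-check.

```python
import statistics


def median_by_position(values: list[float]) -> float:
    """Median via the (n + 1) / 2 position rule described in the text."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    position = (n + 1) / 2                  # 1-based: 4.0 for n=7, 4.5 for n=8
    lower = ordered[int(position) - 1]      # value at the whole-number position
    if position.is_integer():
        return lower
    upper = ordered[int(position)]          # next value when the position ends in .5
    return (lower + upper) / 2


# Hypothetical salary lists (illustrative only).
seven = [18000, 20000, 22000, 24000, 26000, 28000, 30000]
eight = [18000, 20000, 22000, 24000, 25000, 27000, 29000, 31000]

print(median_by_position(seven))   # 24000
print(median_by_position(eight))   # 24500.0
assert median_by_position(eight) == statistics.median(eight)
```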
If you have grouped data, that is, ranges of values
together with the number of people found in each range, there is
a more precise way to calculate the median. For example, assume that the
pay categories are ranges of pay and that employees are distributed among them as follows:
| Pay Range | Number of Employees |
For grouped data, the position of the median is found with the formula N/2. In this case there are 60 employees, so the median is the 60/2 = 30th observation.
We can see from the cumulative distribution that the 30th observation falls in the $40,000-$49,000 category. We can estimate the median as the midpoint of this category by adding the lower boundary value to the upper boundary value and dividing by two: ($40,000 + $49,000)/2 = $89,000/2 = $44,500.
To calculate the median more exactly, we can look at how far into the $40,000-$49,000 category we must go to reach the 30th observation.
There are 16 observations in this category, and the 30th observation of the sample is the 7th observation within it (so 23 observations lie in the lower categories). We therefore go 7/16 of the way through this category.
The category spans 10 values in thousands of dollars (40, 41, 42, 43, 44, 45, 46, 47, 48, 49), that is, $10,000, so 7/16 × $10,000 = $4,375.
Adding this to the lower boundary of the category gives the median salary: $40,000 + $4,375 = $44,375.
Note that we must assume that the observations are
evenly distributed within the categories; if the sample size is large enough
in relation to the number of categories, this is usually not a problem.
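The interpolation steps above follow the usual grouped-data median formula, median = L + ((N/2 - F) / f) × w, where L is the lower boundary of the median category, F the number of observations in the lower categories, f the number inside the median category, and w its width. The sketch below plugs in the figures from the text (L = $40,000, F = 23 because the 30th observation is the 7th in the category, f = 16, w = $10,000, N = 60). The function name is hypothetical and, like the text, it assumes observations are spread evenly within the category.

```python
def grouped_median(lower: float, width: float, below: int, inside: int, n: int) -> float:
    """Interpolated median for grouped data.

    lower:  lower boundary of the category containing the median
    width:  width of that category (e.g. 10,000 for $40,000-$49,000)
    below:  number of observations in all lower categories
    inside: number of observations in the median category
    n:      total number of observations
    Assumes observations are evenly spread within the median category.
    """
    target = n / 2                          # position of the median (N/2)
    fraction = (target - below) / inside    # how far into the category to go
    return lower + fraction * width


print(grouped_median(lower=40_000, width=10_000, below=23, inside=16, n=60))  # 44375.0
```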
The average of the following employee salaries is
equal to $21,857.14
However, the average of the following salaries is
equal to $26,375.
This demonstrates that the value of the
mean is sensitive to very high or very low values (outliers). In such cases, it may
be better to use the median.
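The effect of a single extreme value can be seen with Python's standard `statistics` module. The salaries below are hypothetical, not the lists from the text: adding one very high salary pulls the mean up sharply while the median barely moves.

```python
import statistics

# Hypothetical salaries (illustrative only).
salaries = [20_000, 21_000, 22_000, 23_000, 24_000]
with_outlier = salaries + [100_000]   # add one very high salary

# Without the outlier: mean 22,000 and median 22,000.
print(statistics.mean(salaries), statistics.median(salaries))
# With the outlier: mean 35,000 but median only 22,500.
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```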
| Pay Range | Number of Employees |
In this case, dividing the total of $2,560,000 by the 60 employees gives a mean of $2,560,000/60 = $42,666.67.
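For grouped data, a common approach is to treat every employee in a range as earning the range's midpoint (lower plus upper boundary, divided by two, as in the median example), add up midpoint × count across the ranges, and divide by N. Whether the $2,560,000 total above was obtained exactly this way is an assumption, since the table is not reproduced here; the function name and the ranges and counts in the sketch are hypothetical.

```python
def grouped_mean(groups: list[tuple[float, float, int]]) -> float:
    """Estimate the mean from (lower, upper, count) ranges using midpoints."""
    total_count = sum(count for _, _, count in groups)
    if total_count == 0:
        raise ValueError("no observations")
    # Each employee in a range is treated as earning the range's midpoint.
    estimated_total = sum((lower + upper) / 2 * count for lower, upper, count in groups)
    return estimated_total / total_count


# Hypothetical pay ranges: (lower boundary, upper boundary, number of employees).
pay_ranges = [
    (30_000, 39_000, 10),
    (40_000, 49_000, 20),
]
print(round(grouped_mean(pay_ranges), 2))  # 41166.67
```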
The following chart shows which measures of variation and dispersion
can be used with variables measured at the nominal, ordinal, or interval/ratio level.
However, if the highest-paid employee makes $29,000
per year and the lowest-paid employee makes $22,500 per year, the salary
range is $29,000 - $22,500 = $6,500. (Note that the average is $26,000.)
Although these two organizations have very similar
averages, they have very different ranges. For which organization would
you rather be working?
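The comparison above can be reproduced with two hypothetical organizations that share the same average salary but differ widely in range (highest value minus lowest value). The function name `salary_range` and both salary lists are illustrative assumptions, not data from the text.

```python
import statistics


def salary_range(salaries: list[float]) -> float:
    """Range = highest value minus lowest value."""
    return max(salaries) - min(salaries)


# Two hypothetical organizations with the same average salary (26,000).
org_a = [24_000, 25_000, 26_000, 27_000, 28_000]
org_b = [10_000, 15_000, 26_000, 37_000, 42_000]

for name, pay in [("A", org_a), ("B", org_b)]:
    # Same mean, very different spread: range 4,000 for A, 32,000 for B.
    print(name, statistics.mean(pay), salary_range(pay))
```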
For example, the 50th percentile is the same as the median; at the 50th percentile, half of the observations have higher values and half of the observations have lower values.
Percentiles are often used on standardized tests, such as the SAT or GRE. If you scored at the 75th percentile, that means that 75% of the other people scored below your score and 25% scored at or above your score.
Sometimes on tests for civil service, applicants are advised that they must score at a certain percentile or above to be considered for an interview, a promotion, etc.
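The test-score reading of a percentile can be sketched as a percentile rank: the percentage of scores that fall strictly below a given score. This is only one common convention (others count ties as half, or exclude the test taker from the count), so the function name and the convention are assumptions, and the scores are hypothetical.

```python
def percentile_rank(scores: list[float], score: float) -> float:
    """Percentage of scores strictly below `score` (one common convention)."""
    if not scores:
        raise ValueError("no scores")
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Hypothetical scores for 8 test takers.
scores = [410, 450, 480, 500, 520, 560, 600, 640]
print(percentile_rank(scores, 600))  # 75.0: six of the eight scores are lower
```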
When two organizations have very different ranges
but similar averages, you may want to use the interquartile range, or the
range between the 25th and 75th percentiles. The interquartile range contains
the middle 50% of the observations.
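The shortcut used in the example that follows, dropping the bottom quarter and the top quarter of the ordered values and taking the spread of what remains, can be sketched as below. It is a simplification that works cleanly when the number of values is a multiple of four; library routines such as Python's `statistics.quantiles` interpolate the 25th and 75th percentiles and can give a somewhat different answer, as the second computation shows. The function name and the salaries are hypothetical.

```python
import statistics


def trimmed_iqr(values: list[float]) -> float:
    """Spread of the middle 50%: drop the lowest and highest quarter, then take the range.

    Simplified method; it lines up with the quartiles only when len(values) is a multiple of 4.
    """
    ordered = sorted(values)
    quarter = len(ordered) // 4
    middle = ordered[quarter:len(ordered) - quarter]
    return middle[-1] - middle[0]


# Hypothetical salaries for 8 employees.
pay = [21_000, 22_000, 23_000, 24_000, 25_000, 26_000, 27_000, 40_000]
print(trimmed_iqr(pay))                      # 3000: middle four run from 23,000 to 26,000

q1, _, q3 = statistics.quantiles(pay, n=4)   # interpolated quartiles ('exclusive' method)
print(q3 - q1)                               # 4500.0
```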
For example, in this case, there are 8 categories,
so one-quarter of 8 categories=2 categories. Ignoring the top two and the
bottom two categories, the interquartile range for this organization is
The interquartile range for this organization is
If most of the observations are near the mean in
value, the standard deviation will be small. But if most of the observations
are far from the mean in value, the standard deviation will be large.
A variable with a large variance has a great deal
of difference in the values of the various observations, while a variable
with a small variance has less difference in the values of the various observations.
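The formulas are not spelled out here, so the sketch below uses the standard definitions: the variance is the average squared deviation from the mean, and the standard deviation is its square root. Dividing by N gives the population versions; dividing by N - 1 gives the sample versions used when estimating from a sample. The function name and data are hypothetical, and the results are cross-checked against Python's `statistics` module.

```python
import math
import statistics


def variance(values: list[float], sample: bool = True) -> float:
    """Average squared deviation from the mean (divide by N-1 if sample=True, else by N)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough values")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical observations, mean 5
pop_var = variance(data, sample=False)     # 32 / 8 = 4.0
pop_sd = math.sqrt(pop_var)                # 2.0
sample_var = variance(data)                # 32 / 7, about 4.571

print(pop_var, pop_sd, round(sample_var, 3))
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_var, statistics.variance(data))
```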
If the values of the observations are distributed symmetrically around the mean of a variable, the distribution is called symmetrical; the normal distribution is the best-known example. In a symmetrical distribution with a single peak, the mean, median, and mode all coincide.
If the values of most of the observations are lower than the value of the mean, then the distribution is called a positively skewed or right-skewed distribution: a long tail of high values pulls the mean upward. In this case, the mode will typically have a lower value than the median, and the mean a higher value than the median.
If the values of most of the observations are higher than the value of the mean, then the distribution is called a negatively skewed or left-skewed distribution: a long tail of low values pulls the mean downward. In this case, the mode will typically have a higher value than the median, and the mean a lower value than the median.
An inspection of the skewness of a variable will
help the researcher to decide which of the three measures of central tendency
to use--mean, median, or mode--as the best indicator of the central tendency
of the distribution of values for that variable.
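One standard way to put a number on the mean-versus-median comparison is Pearson's second skewness coefficient, 3 × (mean - median) / standard deviation; it is not part of the original text. A positive value suggests right (positive) skew and a negative value left (negative) skew. The function name and salaries below are hypothetical.

```python
import statistics


def pearson_median_skewness(values: list[float]) -> float:
    """Pearson's second skewness coefficient: 3 * (mean - median) / sample standard deviation.

    Positive suggests right (positive) skew; negative suggests left (negative) skew.
    """
    sd = statistics.stdev(values)
    if sd == 0:
        return 0.0
    return 3 * (statistics.mean(values) - statistics.median(values)) / sd


# Hypothetical salaries with a few very high earners (long right tail).
salaries = [22_000, 23_000, 24_000, 24_000, 25_000, 26_000, 60_000, 90_000]
print(round(pearson_median_skewness(salaries), 2))  # positive: mean 36,750 > median 24,500
```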
The normal curve has the following characteristics:
1) it has a bell-shaped, symmetrical curve
2) the mean, median, and mode all have the same value
3) the properties of the curve are known
4) it is useful in calculating estimates in inferential statistics
If the distribution of the values of a variable approaches a normal curve, we know that approximately 68% of the values will lie within plus or minus one standard deviation of the mean, about 95% within plus or minus two standard deviations, and about 99.7% within plus or minus three standard deviations.
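These percentages come from the standard normal curve: the share of values within k standard deviations of the mean is erf(k / √2). The sketch below uses Python's `math.erf`; the function name `share_within` is hypothetical.

```python
import math


def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean.

    For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Roughly 68.3%, 95.4%, and 99.7%.
    print(k, round(100 * share_within(k), 1))
```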
This is useful because the value of any one observation
can be converted to a standardized score, or z-score. A z-score
expresses an observation in standard deviation units,
z = (value - mean) / standard deviation, so that on this scale the mean equals zero and one standard
deviation equals one.
A z-score of +1.5 means that the value of the observation lies 1.5 standard deviation units above the mean. A z-score of -2.0 means that the value of the observation lies 2.0 standard deviation units below the mean.
If a student scores 40 out of 50 on a test of mathematics (with a mean of 41 and a standard deviation of 5), and 60 out of 75 on a test of language (with a mean of 55 and a standard deviation of 10), the scores are not directly comparable. Converting each of them to their respective z-scores allows them to be compared directly.
z-score for the math score of 40 = (40 - 41)/5 = -1/5 = -0.2
z-score for the language score of 60 = (60 - 55)/10 = 5/10 = +0.5
Although the student scored 80% of the total points available on each test, the student did slightly better than average (+0.5 standard deviations) on the language test, and slightly worse than average (-0.2 standard deviations) on the math test.
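The two z-scores in the example can be reproduced with a one-line formula, z = (value - mean) / standard deviation. The function name `z_score` is hypothetical; the scores, means, and standard deviations are the ones given in the example.

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Distance of `value` from `mean`, measured in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


math_z = z_score(40, mean=41, sd=5)        # math test: -0.2
language_z = z_score(60, mean=55, sd=10)   # language test: +0.5
print(math_z, language_z)
```

Both raw scores are 80% of the available points, yet the z-scores place the student on opposite sides of the average.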