Geography 200: INTRODUCTION TO RESEARCH METHODS FOR GEOGRAPHERS
Dr. Rodrigue
Graded Lab 10: A Medley of Ordinal Methods
LAB EXERCISE A: Spearman's Rank Order Correlation
For all calculations, show your answer rounded to three decimal places of accuracy (and several will be .000).
- Under which conditions would you favor Spearman's correlation over Pearson's, even if you had data measured at the interval or ratio level?
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
- What is the downside of using the Spearman's instead of the Pearson's with such scalar data?
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
- From somewhere in your dim memories of introductory physical geography, you may recall that the velocity of a stream affects its capacity to move its sediment load. All things being equal (which, of course, they rarely are), you would expect streams that undergo a drop in velocity to start depositing their load, beginning with the larger particles. A chief governor of stream velocity is slope angle, so you would expect a positive correlation between slope angle and the size of the particles dominating the soil along an alluvial fan. So, you've gone off into the field and collected soil samples from 15 randomly chosen points on an alluvial fan stretching out from the Funeral Mountains in Death Valley, CA. Back in the lab, you've derived the median particle size for each of your fifteen samples. Here they are, and you get to do a Spearman's rank order correlation on the two variables.
Slope angle (°)    Median particle size (mm)
30.0               6.7
27.0               7.5
25.0               6.9
18.0               3.7
20.0               1.6
19.0               5.4
13.0               2.5
17.0               4.9
16.0               3.9
12.0               2.7
7.0                1.4
11.0               1.3
11.0               1.1
8.0                1.9
2.0                0.6
Aw, heck, here are the data in a spreadsheet for your convenience.
- First, as Standard Operating Procedure, please state your working (alternate) hypothesis and your null hypothesis.
Working hypothesis: _____________________________________________________
_________________________________________________________________________
Null hypothesis: ________________________________________________________
_________________________________________________________________________
- Now, is your alternate hypothesis directional (one-tailed) or non-directional (two-tailed)?
_____ one-tailed     _____ two-tailed
- Pick an alpha level and explain your choice.
alpha = __________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
- Why did "you" decide to do a Spearman's instead of a Pearson's on these scalar data? (hint: graphing each variable on a number line might help the answer pop out at you)
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
- Rank each variable separately, in the same direction for both (for example, lowest value = rank 1), giving tied values the average of the ranks they span. Then subtract the rank on particle size from the rank on slope angle for each sample. Square these differences in rank. Sum the squared differences.
$\sum_{i=1}^{15} D_i^2 = $ __________
- Multiply this sum of squared differences by six to complete the numerator of the second term of the $r_s$ equation:
$6\sum D_i^2 = $ __________
- Do the denominator for the second term by cubing the sample size of 15 and then subtracting the sample size from the cube. You can check your work by squaring N, subtracting 1 from it, and then multiplying the answer by N. They should come out the same.
$N^3 - N = $ __________
- Now do the entire second term of the $r_s$ equation by dividing its numerator by its denominator:
$\frac{6\sum D_i^2}{N^3 - N} = $ __________
- Calculate $r_s$ by subtracting this second term from 1:
$r_s = 1 - \frac{6\sum D_i^2}{N^3 - N} = $ __________
- Now, let's see if this puppy is significant. Please use the following t test:
$t_{calc} = r_s \sqrt{\frac{n - 2}{1 - r_s^2}}$
- What would be your critical t, factoring in your chosen alpha, degrees of freedom (n - 2), and the directionality of your alternate hypothesis?
$t_{crit} = $ __________
- Based on the relationship between your $t_{calc}$ and your $t_{crit}$, would you reject the null hypothesis or not?
_____ reject     _____ not reject
- Figure out the probability of getting results that extreme if the null hypothesis were true (the courtesy step):
prob-value = __________
- So, at long last, express your findings in regular prose:
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
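To check the hand calculations in Exercise A, here is a minimal Python sketch (standard library only). It ranks each variable with tied values sharing their average rank, computes the sum of squared rank differences and r_s from the shortcut formula 1 - 6ΣD²/(N³ - N), and then a t statistic. The function names are hypothetical, and the t statistic uses the commonly cited approximation t = r_s·√((n - 2)/(1 - r_s²)) with n - 2 degrees of freedom, which is an assumption of this sketch. Treat its output as a cross-check on your worksheet, not as an answer key.

```python
import math


def average_ranks(values):
    """Rank values from lowest (rank 1) to highest; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one (a tie).
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end hold ranks start+1..end+1; give each the average.
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Return (sum of squared rank differences, r_s) using r_s = 1 - 6*sum(D^2)/(N^3 - N)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    r_s = 1 - 6 * sum_d2 / (n ** 3 - n)
    return sum_d2, r_s


def spearman_t(r_s, n):
    """Assumed t approximation for r_s with n - 2 degrees of freedom (undefined when |r_s| = 1)."""
    return r_s * math.sqrt((n - 2) / (1 - r_s ** 2))
```

To use it, enter the two columns of the table as Python lists (say, `slope` and `size`), call `spearman(slope, size)`, and pass the resulting r_s and n = 15 to `spearman_t`. Strictly, the shortcut formula is exact only when there are no ties; with a single tied pair, as in the slope angles, the effect is very small.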
LAB EXERCISE B: Kruskal-Wallis Test
Let's say that you're an urban planner caught up in the debate about whether to calm traffic to create more livable neighborhoods or whether, instead, to enable speedy traffic in order to raise efficiency of transit. Traffic calming methods include more stop signs, windier and narrower streets, and perhaps speed bumps. The idea is to make a neighborhood more pedestrian-friendly than automobile-friendly for safety and for community-building purposes. Your town was all set to try the traffic-calming idea in certain key neighborhoods, but a local auto club has organized resistance to the idea and has cited greater pollution by stopped and slowed cars. The chore has fallen to you to collect data on carbon monoxide (CO) levels at residential intersections throughout town, in order to see if there is a significant difference among those intersections with stop signs, those with yield signs, and those with no control. The intersections from which you have built your three samples have approximately equal average daily traffic volumes, according to your colleagues who maintain traffic count databases. Here are your data, CO concentrations being given in parts per million:
stop signs    yield signs    no control
15            7              19
30            13             17
27            15             14
28            16             6
29            22             4
21            31             24
8             10             15
41            11             25
12            18             5
20            23             9
Here they are in a spreadsheet.
- Why mightn't you use ANOVA on these obviously ratio data? Drawing a number line for each sample or, better, a histogram dividing the range from 4 to 41 into 9 categories 5 parts per million wide should make the reasons apparent. You don't need to show these graphs (they're just a suggestion to make the answer more apparent to you).
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
State your alternative (working) hypothesis and your null hypothesis about the rank means (the average of each group's overall rank):
Working hypothesis: _____________________________________________________
_________________________________________________________________________
Null hypothesis: ________________________________________________________
_________________________________________________________________________
Now, planning can be a very politically charged occupation. You know that the auto club that raised the issue would love a more exploratory study, since they're fishing for anything that would help them quash the traffic-calming idea. They would see graver consequences, then, in a Type II error (as in losing the political battle). Your bosses, on the other hand, would love to minimize the chance of a Type I error, since they do not want to knock the planning process back to square one, which too ready a finding of significant difference would enable. So, they would like a more rigorous study done. Given their druthers, label the following alpha levels with the constituency that might favor each (assuming they understood the concept of alpha <G>):
____________     ____________
high alpha       low alpha
0.1 or more      0.01 or less
So, where do you think you'll set your alpha, to minimize the hazards of the auto club debunking your study and of your bosses being unhappy about giving their opponents ammo? In other words, what would look like a pretty reasonable and hard-to-criticize alpha in this situation?
alpha = ____________
Go on and build a simple spreadsheet to keep track of these intersections and their CO readings. The spreadsheet should have three columns:
- The CO reading for each intersection
- Which type of intersection it came from (S, Y, or N)
- The overall ranking of each intersection among all 30 intersections, which you create by sorting the entire spreadsheet on the first column, saving, and then entering the rank in the third column. If there are any ties (hint), you are to figure out the average rank of the scores involved and then assign that average to all of them. So, you could have something like 1, 2, 3, 4, 5.5, 5.5, 7, 8, 9, 10 or maybe 1, 2, 3, 5, 5, 5, 7, 8, 9, 10.
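The ranking rule for ties described above can be sketched in Python as follows. This is a minimal illustration, assuming each spreadsheet row is entered as a (CO reading, intersection type) pair; the function name `rank_rows` is hypothetical. It sorts on the reading, just as you would sort on the first column of the spreadsheet, and gives every tied reading the average of the ranks that the tied group occupies.

```python
def rank_rows(rows):
    """Sort (reading, type) rows on the reading and attach an overall rank, averaging ties."""
    ordered = sorted(rows, key=lambda row: row[0])
    ranked = []
    i = 0
    while i < len(ordered):
        j = i
        # Move j forward over every row whose reading ties with row i.
        while j + 1 < len(ordered) and ordered[j + 1][0] == ordered[i][0]:
            j += 1
        # Rows i..j would get ranks i+1..j+1; each gets their average instead.
        shared_rank = (i + 1 + j + 1) / 2
        for reading, kind in ordered[i:j + 1]:
            ranked.append((reading, kind, shared_rank))
        i = j + 1
    return ranked


# Hypothetical readings reproducing the pattern 1, 2, 3, 5, 5, 5, 7:
# three readings tied for ranks 4, 5, and 6 each receive rank 5.
example = [(3, "S"), (1, "Y"), (10, "N"), (2, "S"), (10, "Y"), (10, "S"), (20, "N")]
print([rank for _, _, rank in rank_rows(example)])  # prints [1.0, 2.0, 3.0, 5.0, 5.0, 5.0, 7.0]
```

Summing the third element of each tuple by intersection type gives the sums of ranks the next steps ask for.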
You may want to sort and print this version of the spreadsheet now.
Now, resort the entire spreadsheet, this time on the second column, the type of intersection.
For each intersection type, sum up its overall ranks.
____________     ____________     ____________
stop sign        yield sign       no control
sum of ranks     sum of ranks     sum of ranks
Since you're going to need these, square each of these sums of ranks.
____________     ____________     ____________
stop sign        yield sign       no control
squared sum      squared sum      squared sum
Divide each sum of ranks (from the first step above, not the squared sums) by the number of cases in that type to get the mean rank for that sample, just because it's nice to have.
____________     ____________     ____________
stop sign        yield sign       no control
mean rank        mean rank        mean rank
Now, just "eyeballing" the mean ranks for your three intersection types, do they "look" different to you? __________
Divide each squared sum of ranks (from the second step above) by the number of cases in that type, $R_i^2/n_i$, to get the mean squared sum of ranks for that sample, which will come in handy.
____________     ____________     ____________
stop sign        yield sign       no control
mean sq sum      mean sq sum      mean sq sum
of rank          of rank          of rank
So, now calculate H:
$$H = \frac{\dfrac{12}{N(N+1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(N+1)}{1 - \dfrac{\sum T}{N^3 - N}}$$
where N is the total number of cases, k is the number of samples, $R_i$ and $n_i$ are the sum of ranks and the number of cases in sample i, and T is a correction for each set of ties, $T = t_i^3 - t_i$, with $t_i$ the number of scores tied for a single rank.
Numerator for H = __________
How many scores were tied for a single rank? __________ = $t_i$
So, $t_i^3 = $ __________
T = __________
Denominator (correction for ties) for H = __________
H = __________
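The H formula, including the correction for ties, can be cross-checked with a short Python sketch. It assumes you already have each sample's sum of ranks and size, plus the size t_i of each set of tied scores; the function name and argument layout are hypothetical, chosen to mirror the worksheet steps rather than any library's API.

```python
def kruskal_wallis_h(rank_sums, sizes, tie_sizes=()):
    """H = [12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)] / [1 - sum(T)/(N^3 - N)], with T = t^3 - t."""
    if len(rank_sums) != len(sizes):
        raise ValueError("need one sum of ranks per sample")
    n_total = sum(sizes)
    # Numerator: uncorrected H built from each sample's squared sum of ranks over its size.
    numerator = 12 / (n_total * (n_total + 1)) * sum(
        r ** 2 / n for r, n in zip(rank_sums, sizes)
    ) - 3 * (n_total + 1)
    # Denominator: correction for ties; each tied group of size t contributes T = t^3 - t.
    sum_t = sum(t ** 3 - t for t in tie_sizes)
    denominator = 1 - sum_t / (n_total ** 3 - n_total)
    return numerator / denominator
```

Pass your three sums of ranks, the three sample sizes (10 each here), and a list with one entry per tied group. With no ties the denominator is 1, so H equals the numerator.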
Compare your H with a critical statistic, in order to decide what to do about the null hypothesis. As long as the data table has at least three columns (two degrees of freedom) and the smallest sample has at least five cases in it (both of which apply to our data), the Chi-squared distribution is used as the critical value table. Using a Chi-squared critical value table (and your textbook has one, by the way), determine the critical statistic for two degrees of freedom (3 samples minus 1):
$\chi^2_{crit} = $ __________
So, what about that null hypothesis? Do you reject it or not reject it?
_____ reject Ho     _____ not-reject Ho
In this charged atmosphere, you had better do the final courtesy step, calculating the prob-value, and let 'em all duke it out for themselves! Consulting the "prob values for X2" table, which you can download from https://home.csulb.edu/~rodrigue/geog200/chisquareprobvalues.ods, find the probability of getting results as extreme as yours if the null hypothesis were, in fact, true (as your supervisors are hoping). To do that, look up your $\chi^2_{calc}$ (your H) on the left axis and find the column for your degrees of freedom (here 2); the probability of exceeding your observed $\chi^2$ through random sampling error is at the intersection of that row and column. It is okay to leave this answer at 4 decimal places of accuracy.
prob-value = __________
So, as delicately as possible, state the results of your intersection sampling study in simple English.
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
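If you want to double-check the Chi-squared table lookups in Exercise B, the Chi-squared distribution with exactly two degrees of freedom has a closed form: the probability of exceeding a value x is e^(-x/2), so the critical value for a given alpha is -2·ln(alpha). The Python sketch below relies on that property. It is valid only for 2 degrees of freedom (which is what this three-sample problem has), and the function names are hypothetical.

```python
import math


def chi2_df2_prob(x):
    """Upper-tail probability P(chi-squared > x) for 2 degrees of freedom, which equals exp(-x/2)."""
    return math.exp(-x / 2)


def chi2_df2_critical(alpha):
    """Critical chi-squared value for 2 degrees of freedom: the x at which exp(-x/2) equals alpha."""
    return -2 * math.log(alpha)


# Critical values for a few common alpha levels (2 degrees of freedom only).
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(chi2_df2_critical(alpha), 3))
```

The printed critical values agree with the familiar table entries for 2 degrees of freedom (4.605, 5.991, and 9.210), and `chi2_df2_prob(H)` gives the prob-value for your H directly.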
LAB EXERCISE C: Wilcoxon Rank Sum W Test
Moving down to the next level of (at least, conceptual) simplicity, let's have a go at the Wilcoxon W test, which is the ordinal answer to the two-sample difference of means test. Our data set this time has to do with gun murder rates and gun control. Our cases are the 46 states plus the District of Columbia with data available, and our two variables are the rate of murders involving guns (FBI 1997) and whether people in the state are allowed to carry concealed guns (NRA 1999).
Your spreadsheet can be downloaded at https://home.csulb.edu/~rodrigue/geog200/gunmurderCCWilcoxonW.ods. It contains the names of the 46 states and DC, the gun murder rate in 1997, and the degree of permissiveness towards concealed carry. A "1" means the state has little to no restrictions on concealed carry: You either can carry a gun hidden on your person without any kind of permit, or you have to have a permit but you nearly automatically get one if you ask for one. A "3" means the state either completely forbids concealed carry, or else it makes it very difficult to receive a permit (you might need to demonstrate a particular need for such a permit or you have to take a lot of classes and demonstrate that you can shoot very accurately with either hand using the particular gun you plan to carry concealed).
- What is your alternate hypothesis concerning gun murder rates in states with little or no restriction on people carrying concealed guns and states which don't allow concealed carry or strongly restrict it?
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
- So, what would the null hypothesis be?
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
- Is this a directional (one-tailed) or non-directional (two-tailed) test?
One-tailed _____     Two-tailed _____
- Let's say that we were doing an exploratory study and didn't want to miss a possible difference between gun-restricting states and gun-permissive states in their gun murder rates. So, let's be generous with our alpha. Let's have a confidence level, then, of just 90 percent. What's the corresponding alpha level?
alpha = __________
- Highlight the whole spreadsheet and then click on Data and then on Sort and then pick "concealed carry 99" and ask it to sort in ascending order. Now, count the number of "1" (permissive) states and the number of "3" (restrictive) states. Which of the two lists is the smaller?
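_____ States allowing concealed carry (coded as 1 in your spreadsheet)
_____ States not allowing or strongly restricting concealed carry (coded as 3 in your spreadsheet)
- What is $W_i$? That is, after ranking all the states in your two samples together on gun murder rate (averaging the ranks of any ties), what is the sum of ranks for just the smaller sample?
$W_i = $ __________
- What is the expected sum of ranks?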
$\bar{W}_i = n_i \frac{n_1 + n_2 + 1}{2}$
$\bar{W}_i = $ __________
- What's the standard error of W?
$S_W = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
$S_W = $ __________
- Now, calculate the Wilcoxon W test statistic, $Z_W$:
$Z_W = \frac{W_i - \bar{W}_i}{S_W}$
$Z_W = $ __________
- Eyeballing that Z and comparing it to the infinity row on a t-table (under the appropriate column for the right alpha with the right number of tails), what do we do with the null hypothesis?
_____ reject Ho     _____ not reject Ho
- That last bit of professional courtesy: What's the probability of getting results this "extreme" through random sampling? On a normal table (Z table), look up your $Z_W$ calc and give the probability. If your $Z_W$ calc is bigger than the largest Z in the table, give the answer as less than (<) the probability for that largest Z value. Alternatively and more easily, you can use Calc by typing =1-normsdist(#) for a one-tailed test and =2*(1-normsdist(#)) for a two-tailed test, where # is the absolute value of your calculated Z.
Prob-value = __________
- In regular language, please state your conclusions:
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
Interestingly enough, this is another situation in which the Modifiable Areal Unit Problem may be relevant. For an interesting analysis, which uses county-level data (gun permits are generally approved at the level of the county sheriff, and different counties in the same state can be wildly different in terms of permissiveness and restrictiveness on this issue), take a look at Lott, John R., Jr. 2000. More Guns, Less Crime, 2nd ed. Chicago and London: The University of Chicago Press. This author uses a multiple regression methodology to hold several confounding variables constant and then analyzes gun crime rates through time, focusing on whether crime goes up or down significantly after a change in government policy towards guns and concealed carry. He concludes that trends in crime at the county level deteriorate when restrictions are implemented and improve when the law is made more permissive. His methodology and results have been contested, but the MAUP dimensions make this an interesting controversy.
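The Wilcoxon W arithmetic in Exercise C can likewise be checked with a small Python sketch. It assumes you already have W_i (the sum of ranks for the smaller sample) and both sample sizes, and it uses the standard library's `statistics.NormalDist` for the normal-curve probability in place of a Z table or Calc's NORMSDIST. The function names are hypothetical, and the one-tailed probability assumes the difference lies in the direction your alternate hypothesis predicts.

```python
import math
from statistics import NormalDist


def wilcoxon_w_z(w_small, n_small, n_other):
    """Expected W, its standard error S_W, and Z_W = (W - expected W) / S_W."""
    expected = n_small * (n_small + n_other + 1) / 2
    std_err = math.sqrt(n_small * n_other * (n_small + n_other + 1) / 12)
    return expected, std_err, (w_small - expected) / std_err


def normal_prob_value(z, tails):
    """Probability of a Z at least this extreme: one tail is 1 - Phi(|z|); two tails doubles it."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return tails * (1 - NormalDist().cdf(abs(z)))


# Hypothetical toy case, only to show the arithmetic (the normal approximation
# is rough for samples this small): the 3-case sample holds ranks 1, 2, 3 of 7.
expected, std_err, z = wilcoxon_w_z(1 + 2 + 3, 3, 4)
print(expected, round(std_err, 3), round(z, 3), round(normal_prob_value(z, 2), 4))
```

For the gun murder data, substitute your own W_i and the counts of "1" and "3" states, listing the smaller sample's size first.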
first placed on the web: 11/27/99
last revised: 05/01/17
© Dr. Christine M. Rodrigue