Dr. Rodrigue, Intro. to Research Methods, Lab 10

Geography 200: INTRODUCTION TO RESEARCH METHODS FOR GEOGRAPHERS

Dr. Rodrigue

Graded Lab 10: A Medley of Ordinal Methods

==========

LAB EXERCISE A: Spearman's Rank Order Correlation

For all calculations, show your answer rounded to three decimal places of accuracy (and several will be .000).

Under which conditions would you favor Spearman's correlation over Pearson's, even if you had data measured at the interval or ratio level?


     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

What is the downside of using the Spearman's instead of the Pearson's with such scalar data?

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

From somewhere in your dim memories of introductory physical geography, you may recall that the velocity of a stream affects its capacity to move its sediment load. All things being equal (which, of course, they rarely are), you would expect streams that undergo a drop in velocity to start depositing their load, starting with the larger particles. A chief governor of stream velocity is slope angle, so you would expect a positive correlation between slope angle and the size of the particles dominating the soil along an alluvial fan. So, you've gone off into the field and collected soil samples from 15 randomly chosen points on an alluvial fan stretching out from the Funeral Mountains in Death Valley, CA. Once back in the lab, you derived the median particle size for each of your 15 samples. Here they are, and you get to do a Spearman's rank order correlation on them.
```
     Slope angle (°)          Median particle size (mm)

          30.0                      6.7
          27.0                      7.5
          25.0                      6.9
          18.0                      3.7
          20.0                      1.6                      
          19.0                      5.4
          13.0                      2.5
          17.0                      4.9
          16.0                      3.9
          12.0                      2.7
           7.0                      1.4
          11.0                      1.3
          11.0                      1.1
           8.0                      1.9
           2.0                      0.6
```
Aw, heck, here are the data in a spreadsheet for your convenience.

First, Standard Operating Procedure, please state your working (alternate) hypothesis and your null hypothesis.

     Working hypothesis: _____________________________________________________

     _________________________________________________________________________

     Null hypothesis: ________________________________________________________

     _________________________________________________________________________

Now, is your alternate hypothesis directional (one-tailed) or non-directional (two-tailed)?
```
     _____ one-tailed          _____ two-tailed
```

Pick an alpha level and explain your choice.

     alpha = __________ 

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

Why did "you" decide to do a Spearman's instead of a Pearson's on these scalar data? (hint: graphing each variable on a number line might help the answer pop out at you)

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

After determining the rank for each measurement taken for each sample, subtract the rank on particle size from the rank on slope angle for each sample. Square these differences in rank. Sum the squared differences.
```
     Σ D_i²  (summed over i = 1 to 15) = __________
```
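If you would like to check your hand ranking, the steps above can be sketched in Python. This is only a minimal sketch: the function names are made up for illustration, ranks are assumed to run from smallest (rank 1) to largest, tied values share their average rank, and the toy numbers are hypothetical, not the lab data.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the first and last 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def sum_squared_rank_differences(x, y):
    """Sum of D squared, where D = rank on x minus rank on y for each sample."""
    rx, ry = average_ranks(x), average_ranks(y)
    return sum((a - b) ** 2 for a, b in zip(rx, ry))


# Hypothetical toy data (NOT the lab data): five paired measurements, one tie in x.
toy_x = [10.0, 20.0, 20.0, 40.0, 50.0]
toy_y = [1.2, 3.4, 2.8, 5.0, 4.1]
print(average_ranks(toy_x))
print(sum_squared_rank_differences(toy_x, toy_y))
```

Ranking from largest to smallest instead would give the same sum, as long as both variables are ranked in the same direction.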
Multiply this sum of squared differences by six to complete the numerator of the second term of the r_s equation:
```
     6 Σ D² = __________
```
Do the denominator for the second term by cubing the sample size of 15 and then subtracting the sample size from the cube. You can check your work by squaring n, subtracting 1 from it, and then multiplying the answer by n. They should come out the same.
```
     N³ - N = __________
```

Now do the entire second term of the r_s by dividing its numerator by its denominator:

     6 Σ D² / (N³ - N) = __________

Calculate r_s by subtracting this second term from 1:

     r_s = 1 - [6 Σ D² / (N³ - N)] = __________
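For a quick arithmetic check, the r_s formula can be written as a short Python function. This is a sketch only: `spearman_rs` is a made-up name, and the inputs shown are hypothetical values rather than anything from the alluvial fan data.

```python
def spearman_rs(sum_d_squared, n):
    """r_s = 1 - 6 * (sum of D squared) / (n cubed - n)."""
    return 1 - (6 * sum_d_squared) / (n ** 3 - n)


# Hypothetical inputs: sum of squared rank differences = 2.5, sample size = 5.
print(round(spearman_rs(2.5, 5), 3))
```

A sum of squared differences of zero (identical rankings) returns 1, which is a handy sanity check on your own numbers.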

Now, let's see if this puppy is significant. Please use the following t test:
```
      t_calc = r_s * √(n - 1)
```
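The test statistic as this lab defines it is a one-line calculation. The sketch below follows the formula given here (other textbooks state the significance test for r_s differently); the function name and the input values are hypothetical.

```python
import math


def t_calc(r_s, n):
    """Test statistic as defined in this lab: r_s times the square root of (n - 1)."""
    return r_s * math.sqrt(n - 1)


# Hypothetical inputs: r_s = 0.875 from a sample of 5.
print(round(t_calc(0.875, 5), 3))
```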
What would be your critical t, factoring in your chosen alpha, degrees of freedom (n-2), and the directionality of your alternate hypothesis?
```
     t_crit = __________
```
Based on the relationship between your t_calc and the t_crit, would you reject the null hypothesis or not?
```
     _____ reject          _____ not reject
```
Figure out the probability of getting results that extreme if the null hypothesis were true (the courtesy step):
```
     prob-value = __________
```

So, at long last, express your findings in regular prose:

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

LAB EXERCISE B: Kruskal-Wallis Test

Let's say that you're an urban planner caught up in the debate about whether to calm traffic to create more livable neighborhoods or whether, instead, to enable speedy traffic in order to raise efficiency of transit. Traffic calming methods include more stop signs, windier and narrower streets, and perhaps speed bumps. The idea is to make a neighborhood more pedestrian-friendly than automobile-friendly for safety and for community-building purposes. Your town was all set to try the traffic-calming idea in certain key neighborhoods, but a local auto club has organized resistance to the idea and has cited greater pollution by stopped and slowed cars. The chore has fallen to you to collect data on carbon monoxide (CO) levels at residential intersections throughout town, in order to see if there is a significant difference among those intersections with stop signs, those with yield signs, and those with no control. The intersections from which you have built your three samples have approximately equal average daily traffic volumes, according to your colleagues who maintain traffic count databases. Here are your data, CO concentrations being given in parts per million:

     stop signs      yield signs      no control
   
           15              7            19  
           30             13            17
           27             15            14
           28             16             6
           29             22             4
           21             31            24
            8             10            15
           41             11            25
           12             18             5
           20             23             9

Here they are in a spreadsheet.

Why mightn't you use ANOVA on these obviously ratio data? Drawing a number line for each sample or, better, a histogram dividing the range from 4 to 41 into 9 categories 5 parts per million wide should make the reasons apparent. You don't need to show these graphs (they're just a suggestion to make the answer more apparent to you).

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________
 
     _________________________________________________________________________

State your alternative (working) hypothesis and your null hypothesis about the rank means (the average of each group's overall rank):

     Working hypothesis: _____________________________________________________

     _________________________________________________________________________

     Null hypothesis: ________________________________________________________

     _________________________________________________________________________

Now, planning can be a very politically-charged occupation. You know that the auto club that raised the issue would love a more exploratory study, since they're fishing for anything that would help them quash the traffic-calming idea. They would see graver consequences, then, in a Type II error (as in losing the political battle). Your bosses, on the other hand, would love to minimize the chance of a Type I error, since they do not want to knock the planning process back to square one, which too ready a finding of significant difference would enable. So, they would like a more rigorous study done. Given their "'druthers," label the following alpha levels with the constituency that might favor each (assuming they understood the concept of alpha <G>):
```
     ____________        ____________
     high alpha          low alpha
     0.1 or more         0.01 or less
```
So, where do you think you'll set your alpha, to minimize the hazards of the auto club debunking your study and of your bosses being unhappy about giving their opponents ammo? In other words, what would look like a pretty reasonable and hard to criticize alpha in this situation?
```
     alpha = ____________
```
Go on and build a simple spreadsheet to keep track of these intersections and their CO readings. The spreadsheet should have three columns:
- The CO reading for each intersection
- Which type of intersection it came from (S, Y, or N)
- The overall ranking of each intersection among all 30 intersections, which you create by sorting the entire spreadsheet on the first column, saving, and then entering the rank in the third column. If there are any ties (hint), you are to figure out the average rank of the scores involved and then assign that average to all of them. So, you could have something like 1, 2, 3, 4, 5.5, 5.5, 7, 8, 9, 10 or maybe 1, 2, 3, 5, 5, 5, 7, 8, 9, 10.
You may want to sort and print this version of the spreadsheet now.
Now, re-sort the entire spreadsheet, this time on the second column, the type of intersection.
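If you prefer code to a spreadsheet, the same three-column table can be sketched in Python. Treat this as one possible approach under stated assumptions: `rank_table` is a made-up helper, rank 1 goes to the lowest reading, tied readings share their average rank, and the six toy readings are hypothetical, not the CO data.

```python
def rank_table(readings):
    """readings: list of (value, group) pairs.
    Returns rows of (value, group, overall rank), sorted by value,
    with tied values given their average rank."""
    rows = sorted(readings, key=lambda pair: pair[0])
    table = []
    pos = 0
    while pos < len(rows):
        end = pos
        # Find the end of the run of rows tied with the row at pos.
        while end + 1 < len(rows) and rows[end + 1][0] == rows[pos][0]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # average of the tied 1-based positions
        for value, group in rows[pos:end + 1]:
            table.append((value, group, shared))
        pos = end + 1
    return table


# Hypothetical toy readings (NOT the lab data), coded S, Y, or N.
toy = [(12, "S"), (7, "Y"), (9, "N"), (9, "S"), (15, "Y"), (4, "N")]
by_value = rank_table(toy)                            # first sort: on the reading
by_group = sorted(by_value, key=lambda row: row[1])   # second sort: on intersection type
for row in by_group:
    print(row)
```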

For each intersection type, sum up its overall rank.

     ____________        ____________        ____________
     stop sign           yield sign          no control
     sum of ranks        sum of ranks        sum of ranks

Since you're going to need these, square each of these sums of ranks.

     ____________        ____________        ____________
     stop sign           yield sign          no control
     squared sum         squared sum         squared sum

Divide each sum of ranks (2 answers back) by the number of cases in that type to get the mean rank for that sample, just because it's nice to have.

     __________          __________          __________
     stop sign           yield sign          no control
     mean rank           mean rank           mean rank

Now, just "eyeballing" the mean ranks for your three intersection types, do they "look" different to you?
```
     __________
```

Divide each squared sum of ranks (3 answers back) by the number of cases in that type to get the mean squared sum of ranks for that sample, which will come in handy.

     __________          __________          __________
     stop sign           yield sign          no control
     mean sq sum         mean sq sum         mean sq sum
     of rank             of rank             of rank
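The four sets of per-sample figures above (sum of ranks, squared sum, mean rank, and squared sum divided by the number of cases) can be tallied in one pass. The sketch below is illustrative only: `rank_summaries` and its dictionary keys are made-up names, and the toy ranks are hypothetical.

```python
from collections import defaultdict


def rank_summaries(ranked_rows):
    """ranked_rows: list of (group, overall rank) pairs.
    Returns a dict of per-group summaries."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, rank in ranked_rows:
        sums[group] += rank
        counts[group] += 1
    return {
        g: {
            "n": counts[g],
            "rank_sum": sums[g],
            "squared_sum": sums[g] ** 2,
            "mean_rank": sums[g] / counts[g],
            "squared_sum_over_n": sums[g] ** 2 / counts[g],
        }
        for g in sums
    }


# Hypothetical toy ranks (NOT the lab data): two intersections of each type.
toy_ranks = [("N", 1.0), ("N", 3.5), ("S", 3.5), ("S", 5.0), ("Y", 2.0), ("Y", 6.0)]
summary = rank_summaries(toy_ranks)
for group, stats in summary.items():
    print(group, stats)
```

One useful check on any ranking: the rank sums across all groups should total N(N+1)/2.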

So, now calculate H:
```
      H = { [12 / (N(N+1))] * Σ(R_i² / n_i)  -  3(N+1) }  /  { 1 - ΣT / (N³ - N) }

          (the first Σ runs over the samples, i = 1 to k)
```
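Where R_i is the sum of ranks for sample i, n_i is the number of cases in sample i, N is the total number of cases, k is the number of samples, and T is a correction for any ties: T = t_i³ - t_i for each set of t_i scores tied at a single rank (ΣT adds these up if there is more than one set of ties).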
How many scores were tied for a single rank? __________ = t_i
So, t_i³ = __________
T = __________
Numerator for H = __________
Denominator (correction for ties) for H = __________
H = __________
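To see how the numerator, the tie correction, and H fit together, here is a minimal Python sketch of the formula above. The function name and argument layout are assumptions made for illustration, and the toy inputs are hypothetical (with only two cases per sample, they are far too small for a real test and serve only to show the arithmetic).

```python
def kruskal_wallis_h(rank_sums, sizes, tie_counts=()):
    """Return (numerator, denominator, H) for the tie-corrected Kruskal-Wallis H.

    rank_sums  -- sum of overall ranks for each sample (R_i)
    sizes      -- number of cases in each sample (n_i)
    tie_counts -- number of scores in each set of tied scores (t_i)
    """
    n_total = sum(sizes)
    core = sum(r ** 2 / n for r, n in zip(rank_sums, sizes))
    numerator = 12 / (n_total * (n_total + 1)) * core - 3 * (n_total + 1)
    sum_t = sum(t ** 3 - t for t in tie_counts)
    denominator = 1 - sum_t / (n_total ** 3 - n_total)
    return numerator, denominator, numerator / denominator


# Hypothetical toy inputs (NOT the lab data): three samples of two cases each,
# with one pair of tied scores.
num, den, h = kruskal_wallis_h([4.5, 8.5, 8.0], [2, 2, 2], tie_counts=(2,))
print(round(num, 3), round(den, 3), round(h, 3))
```

With no ties, the denominator is exactly 1 and H equals the numerator.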
Compare your H with a critical statistic, in order to decide what to do about the null hypothesis. As long as the data table has at least three columns (two degrees of freedom) and as long as the smallest sample has at least five cases in it (both of which apply to our data), the Chi-squared distribution is used as the critical value table. Using a Chi-squared critical value table (and your textbook has one, by the way), determine the critical statistic for two degrees of freedom (3 samples minus 1):
```
     X²_crit = __________
```
So, what about that null hypothesis? Do you reject it or not reject it?
```
     _____ reject H_o     _____ not-reject H_o
```
In this charged atmosphere, you had better do the final courtesy step, calculating the prob-value, and let 'em all duke it out for themselves! Consulting the "prob values for X²" table, which you can download from https://home.csulb.edu/~rodrigue/geog200/chisquareprobvalues.ods find the probability of getting results as extreme as yours if the null hypothesis were, in fact, true (as your supervisors are hoping). To do that, look up your X²_calc on the left axis and your degrees of freedom (here 2) and read the probability of exceeding your observed X² through random sampling error at the intersection of that row and column. It is okay to leave this answer at 4 decimal places of accuracy.
```
     prob-value = __________
```
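With exactly two degrees of freedom, the chi-squared upper-tail probability has a simple closed form, exp(-X²/2), so you can check your table lookup without the table. The sketch below applies only to the 2-degrees-of-freedom case; the function name is made up, and 5.991 is used simply because it is the familiar tabled critical value for alpha = 0.05.

```python
import math


def chi2_upper_tail_df2(x):
    """Probability of exceeding x under a chi-squared distribution with exactly 2 degrees of freedom."""
    if x < 0:
        raise ValueError("a chi-squared statistic cannot be negative")
    return math.exp(-x / 2)


# Sanity check against the familiar critical value for alpha = 0.05, df = 2.
print(round(chi2_upper_tail_df2(5.991), 4))
```

For any other number of degrees of freedom this shortcut does not hold, and you would need the table or a statistics package.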

So, as delicately as possible, state the results of your intersection sampling study in simple English.

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

LAB EXERCISE C: Wilcoxon Rank Sum W Test

Moving down to the next level of (at least, conceptual) simplicity, let's have a go at the Wilcoxon W test, which is the ordinal answer to the two-sample difference of means test. Our data set this time has to do with gun murder rates and gun control. Our cases are the 46 states/District of Columbia with data available, and our two variables are the rate of murders involving guns (FBI 1997) and whether people in the state are allowed to carry concealed guns (NRA 1999).

Your spreadsheet can be downloaded at https://home.csulb.edu/~rodrigue/geog200/gunmurderCCWilcoxonW.ods. It contains the names of the 46 states and DC, the gun murder rate in 1997, and the degree of permissiveness towards concealed carry. A "1" means the state has little to no restrictions on concealed carry: You either can carry a gun hidden on your person without any kind of permit, or you have to have a permit but you nearly automatically get one if you ask for one. A "3" means the state either completely forbids concealed carry, or else it makes it very difficult to receive a permit (you might need to demonstrate a particular need for such a permit or you have to take a lot of classes and demonstrate that you can shoot very accurately with either hand using the particular gun you plan to carry concealed).

What is your alternate hypothesis concerning gun murder rates in states with little or no restriction on people carrying concealed guns and states which don't allow concealed carry or strongly restrict it?


     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

So, what would the null hypothesis be?


     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

Is this a directional (one-tailed) or non-directional (two-tailed) test?
```
     One-tailed _____          Two-tailed _____
```
Let's say that we were doing an exploratory study and didn't want to miss a possible difference between gun-restricting states and gun-permissive states in their gun murder rates. So, let's be generous with our alpha. Let's have a confidence level, then, of just 90 percent. What's the corresponding alpha level?
```
     alpha = __________
```
Highlight the whole spreadsheet and then click on Data and then on Sort and then pick "concealed carry 99" and ask it to sort in ascending order. Now, count the number of "1" (permissive) states and the number of "3" (restrictive) states. Which of the two lists is the smaller?
```
     _____ States allowing concealed carry (coded as 1 in your spreadsheet)  

     _____ States not allowing concealed carry (coded as 3)
```
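What is W_i? That is, after ranking all of the states on their gun murder rates (averaging the ranks of any ties, as in Exercise B), what is the sum of ranks for just the smaller sample?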
```
     W_i = __________ 
```

What is the expected sum of ranks?

     
     W̄_i = n_i (n₁ + n₂ + 1) / 2

     W̄_i = __________

What's the standard error of W?

     S_w = √[ n₁ n₂ (n₁ + n₂ + 1) / 12 ]

     S_w = __________

Now, calculate the Wilcoxon W test statistic, Z_W:

     Z_W = (W_i - W̄_i) / S_w

     Z_W = __________
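The three calculations above (expected sum of ranks, standard error, and Z_W) chain together naturally, so a short Python sketch may help you check your arithmetic. The function name is made up, the standard error is the version given here with no adjustment for ties, and the inputs are hypothetical, not the gun murder data.

```python
import math


def wilcoxon_z(w_i, n_i, n_other):
    """Return (expected sum of ranks, standard error, Z_W).

    w_i     -- observed sum of ranks for the smaller sample
    n_i     -- number of cases in the smaller sample
    n_other -- number of cases in the other sample
    """
    total = n_i + n_other
    expected_w = n_i * (total + 1) / 2
    s_w = math.sqrt(n_i * n_other * (total + 1) / 12)
    z_w = (w_i - expected_w) / s_w
    return expected_w, s_w, z_w


# Hypothetical inputs: smaller sample of 4, larger sample of 6, observed W_i = 30.
expected_w, s_w, z_w = wilcoxon_z(30, 4, 6)
print(expected_w, round(s_w, 3), round(z_w, 3))
```

A negative Z_W simply means the smaller sample's ranks summed to less than expected under the null hypothesis.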

Eyeballing that Z and comparing it to the infinity row on a t-table, such as this one (under the appropriate column for the right alpha with the right number of tails), what do we do with the null hypothesis?
```
     _____ reject H_o          _____ not reject H_o
```
That last bit of professional courtesy: What's the probability of getting results this "extreme" through random sampling? On a normal table (Z table), look up your Z_{w calc} and give the probability. If the Z_{w calc} value is bigger than the largest Z in the table, give the answer as less than (<) the probability for that largest Z value. Alternatively and more easily, you can use Calc by typing =1-normsdist(#) for a 1 tailed test or =2*(1-normsdist(#)) for a 2 tailed test, where # is the absolute value of your calculated Z.
```
     Prob-value = __________
```
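The same prob-value lookup can be done in Python with the standard library's `statistics.NormalDist` (available in Python 3.8 and later). This is a minimal sketch: `z_prob_value` is a made-up name, and the one-tailed branch assumes your Z fell in the direction your alternate hypothesis predicted.

```python
from statistics import NormalDist


def z_prob_value(z, tails=2):
    """Probability of a Z at least this extreme under the standard normal curve."""
    upper = 1 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
    if tails == 1:
        return upper
    if tails == 2:
        return 2 * upper
    raise ValueError("tails must be 1 or 2")


# Sanity checks against familiar critical values rather than lab results.
print(round(z_prob_value(1.96, tails=2), 4))
print(round(z_prob_value(1.645, tails=1), 4))
```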
In regular language, please state your conclusions:
```
     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________
```
Interestingly enough, this is another situation in which the Modifiable Areal Unit Problem may be relevant. For an interesting analysis, which uses county level data (gun permits are generally approved at the level of the county sheriff, and different counties in the same state can be wildly different in terms of permissiveness and restrictiveness on this issue), take a look at Lott, John R., Jr. 2000. More Guns, Less Crime, 2nd ed. Chicago and London: The University of Chicago Press. This author uses a multiple regression methodology to hold several confounding variables constant and then analyzes gun crime rates through time, focussing on whether crime goes up or down significantly after a change in government policy towards guns and concealed carry. He concludes that trends in crime at the county level deteriorate when restrictions are implemented and improve when the law is made more permissive. His methodology and results have been contested, but the MAUP dimensions make this an interesting controversy.

==========

first placed on the web: 11/27/99 last revised: 05/01/17 © Dr. Christine M. Rodrigue
