GEOG 442

Biogeography

Area Pattern Analysis for Biogeography

==========

Quadrat-based techniques of area pattern analysis are commonly used in biogeography and ecology. They involve the division of a study area into equal-sized plots, usually through a grid of squares. This permits the use of statistical techniques to analyze quantitative data with no more measurement sophistication than mere frequencies by category (nominal data). The purpose of this lab is to introduce you to Chi-squared analysis, which is a popular approach to discerning relationships among plant species in a quadrat-based analysis.

For your reference pleasure, the definitional formula for Chi-squared is:

     \[ X^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

You'll be comforted to know that I'll walk you through a fairly simple step-by-step approach to doing Chi-square.
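
For the curious, the step-by-step approach in this lab does not use the definitional formula directly. It uses a computational shortcut instead, sum(O2/E) - n. Here is a brief sketch of why the two give the same answer, assuming (as holds for any contingency table whose expected frequencies are built from its own row and column totals) that the observed frequencies and the expected frequencies each sum to n:

```latex
\begin{align*}
X^2 &= \sum_{i=1}^{r}\sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \\
    &= \sum_{i=1}^{r}\sum_{j=1}^{k} \frac{O_{ij}^2 - 2\,O_{ij}E_{ij} + E_{ij}^2}{E_{ij}} \\
    &= \sum_{i=1}^{r}\sum_{j=1}^{k} \frac{O_{ij}^2}{E_{ij}}
       \;-\; 2\sum_{i=1}^{r}\sum_{j=1}^{k} O_{ij}
       \;+\; \sum_{i=1}^{r}\sum_{j=1}^{k} E_{ij} \\
    &= \sum_{i=1}^{r}\sum_{j=1}^{k} \frac{O_{ij}^2}{E_{ij}} \;-\; 2n \;+\; n \\
    &= \sum_{i=1}^{r}\sum_{j=1}^{k} \frac{O_{ij}^2}{E_{ij}} \;-\; n
\end{align*}
```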

==========

Formulating Hypotheses

In statistical evaluation, we set up working and null hypotheses for testing purposes. So, eyeballing the map in Figure 1 below, formulate your hunch about the relationship between the distributions of the two plant species described below.

     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

The problem for scientific reasoning is that such a hunch cannot be tested directly. To create a testable hypothesis, you need to set up a null version of your hunch, or working hypothesis. That is, you need to express the reverse of your expectation. That way, if you reject this testable null hypothesis, the logical conclusion is that your original hypothesis is the only viable one left. If this mystifies you, please review your statistics course notes or take a stats course. Mystified or not, please state the null version of your hypothesis:
     _________________________________________________________________________

     _________________________________________________________________________

     _________________________________________________________________________

We will reject the null hypothesis if our results are so extreme that there is no more than a five percent chance that we could have gotten them by pure random luck-of-the-draw in developing our sample in the map below. That is (stats refresher), we will use the 0.05 alpha standard. Another way of looking at it is that, by using such an extreme standard, we can have 95 percent confidence in our conclusion, should we wind up rejecting the null hypothesis and deciding that the association between these plants is not random.

==========

The Data

Figure 1 shows the distribution of two plant species, Salvia apiana (white sage) and Avena barbata (slender oat). We can characterize each of the larger quadrats (the ones labeled A1 or F9 or J5, for example) as belonging to one of the four quadrat types listed below.

     (a) Avena present, Salvia present
     (b) Avena present, Salvia absent
     (c) Avena absent, Salvia present
     (d) Avena absent, Salvia absent

All 100 quadrats must be accounted for, each in no more than one category. Because I am such a nice person (and because so many of you may have done the grunt work on this lab in Geography 200 or 140 or a similar kind of lab in Biology 260), I'll present the data already conveniently preclassified for your statistical pleasure. These are your observed or real-world frequencies:


                      |                  SALVIA                 |
                      |                    |                    |
                      |      present       |       absent       |   row totals
     _________________________________________________________________________
                      |(a)                 |(b)                 |-e-
           present    |       33           |        15          |
                      |                    |                    |
     AVENA   _________________________________________________________________
                      |(c)                 |(d)                 |-f-
           absent     |       49           |         3          |
                      |                    |                    |
     _________________________________________________________________________
                      |-g-                 |-h-                 |-i-
     column totals    |                    |                    |   
                      |                    |                    | n = 
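
If you ever have to do the grunt work yourself, the classification amounts to a simple tally. The following is a minimal sketch in Python, not the procedure used to build the table above. The function name `tally_quadrats` and the five sample records are made up for illustration (they are not the Figure 1 data). It assumes each quadrat is recorded as a pair (Avena present?, Salvia present?) and uses the cell letters a through d from the table.

```python
def tally_quadrats(records):
    """Tally (avena_present, salvia_present) pairs into data cells a-d.

    Each quadrat lands in exactly one cell:
      a = both present, b = Avena only, c = Salvia only, d = neither.
    """
    cell_for = {
        (True, True): "a",
        (True, False): "b",
        (False, True): "c",
        (False, False): "d",
    }
    counts = {"a": 0, "b": 0, "c": 0, "d": 0}
    for avena_present, salvia_present in records:
        counts[cell_for[(bool(avena_present), bool(salvia_present))]] += 1
    return counts


# Hypothetical records for five quadrats (NOT the Figure 1 data).
example_records = [
    (True, True),
    (True, False),
    (False, True),
    (True, True),
    (False, False),
]
example_counts = tally_quadrats(example_records)

# Every quadrat is counted once, so the cells must sum to the quadrat count.
assert sum(example_counts.values()) == len(example_records)
print(example_counts)
```

The closing check mirrors the rule that all quadrats must be accounted for, each in no more than one category.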


==========

Doing the Analysis

  1. Compute the "marginal totals." That is, sum the observed frequencies in each row and put those sums in the appropriate row total (e or f). Do the same for the frequencies in each column and put those sums in the appropriate column total (g or h). The sum of row totals should equal the sum of column totals. If so, put the total number or n (which had better equal 100) in cell i.

  2. Then, create the "expected frequencies" for each data cell (a through d). This is the distribution of cell counts you would expect from your data if there were no association between the two plant species (i.e., random processes were allocating them among the cells willy-nilly). To do this for each data cell, a through d, multiply the row total to its right by the column total below it and then divide the answer by n. Put the answer, rounded to three decimal places of accuracy, in its cell below the actual observed frequency.

    Still lost? Okay, okay. In other words, multiply cells e and g and divide the answer by cell i. Put the answer, properly rounded, in the lower part of cell a. Similarly, multiply cells e and h and divide by i, and put that answer in cell b. Multiply cells f and g and divide by i, and plop that answer in cell c. Lastly, multiply cell f by cell h, divide by i again, and put the result in cell d.

  3. That done, examine the expected frequencies to make sure you can properly proceed. Chi-square should not be used if any expected frequencies are below 2 (or, irrelevantly in this case, if more than 20 percent of the data cells have expected frequencies below 5). You will note that there are no such problems with your contingency table, so you can safely proceed through Chi-square.

  4. Now, move on to the worksheet below for calculating Chi-squared. In the first column, enter the observed frequencies for each data cell (the number in the upper part of cells a through d).

  5. In the second column, square those frequencies.

  6. In the third column, divide each squared frequency by the corresponding expected frequency that you worked out in the bottom of the appropriate data cell (a through d).

  7. Now, sum the third column and put the answer near the bottom of the worksheet (sum(O2/E)). Show your work here to three decimal places of accuracy.

  8. Finally, subtract n (from cell i) from that sum. This answer is your calculated Chi-squared (X2). Put it at the bottom of the whole worksheet, also rounded to three decimal places of accuracy.
         ________________________________________________________________________
    
          DATA CELL |     O     |       O2       |               O2/E
         ________________________________________________________________________
            (a)    |           |                |
         ________________________________________________________________________     
            (b)    |           |                |
         ________________________________________________________________________
            (c)    |           |                |
         ________________________________________________________________________
            (d)    |           |                |
         ________________________________________________________________________
                                                | sum(O2/E) = 
         ________________________________________________________________________
                                                | sum(O2/E) - n = X2 =
         ________________________________________________________________________
    
    
  9. Now, to interpret this hard-gained number, your X2calc, you need to compare it with a critical X2. To do this, you will need the Chi-squared table in Figure 2. You need your pre-selected alpha level to pick the right column and the degrees of freedom for your 2 x 2 contingency table to choose the right row to enter the table. Degrees of freedom in Chi-squared can be defined as:
         DF = (r - 1)(k - 1)
         where r = number of rows and k = number of columns
    
    
    So, you will enter the table at the intersection of:
         the column headed ________ 
    
         and the row corresponding to ________ degrees of freedom.
    
    What, then, is your critical Chi-squared value?
         X2crit =  ________
    
    
  10. Is your X2calc ________ greater than or ________ less than the X2crit?

  11. If your calculated Chi-square value is greater than the critical Chi-square, you may conclude that your pattern is unlikely to be just a random one. In other words, there is a statistically significant association of some sort between your variables (in this case, between the two plant species). If the calculated Chi-square value is less than the critical value, you cannot rule out that the apparent relationship is just random. Can the null hypothesis of random association between these two plant species in this study area be rejected in this case?
         _____ reject Ho          _____ do not reject Ho
    
    
  12. It's always good etiquette, whenever possible, to calculate the prob-value of a Type I error, which expresses your faith in the null hypothesis, on the off chance that a reader may have compelling reasons to use a different standard of alpha than you chose. I have provided the needed data in Figure 3, which tells you the probability that you could have gotten results as extreme as yours if there is but a random association between the two plant species.
         ________ prob-value of Ho
    
    
  13. Plot complication. Chi-squared is notoriously sensitive to sample size. That is, the same percentages in each cell can appear significant in a big sample (large n) but insignificant in a small sample. It might help to assess the strength of a significant relationship, should the Chi-squared test find one. For that, you can use Yule's Q. Yule's Q, however, can only be calculated for contingency tables with no more than two rows and two columns (bigger tables can sometimes be collapsed into a 2 x 2 format, by combining rows and columns in some sort of logical way). Conveniently, this lab just happens to feature a 2 x 2 table.

    To calculate Yule's Q, multiply cells a and d and also cells b and c. Then, enter these multiplications into the following formula:

              ad - bc
         Q =  _______
              ad + bc
    
    
    So, what is the Q value for this lab? ________

  14. Now, what does it all MEAN? Basically, Yule's Q can vary from -1 to +1. The closer it is to 0, the weaker the relationship is. The closer it is to -1 or +1, the stronger the relationship is, whether inverse (negative) or direct (positive).

    Please interpret the results of Lab B, taking into consideration both Chi-squared and Yule's Q. What sort of ecological relationship, if any, exists between Salvia apiana and Avena barbata at this scale of analysis?

         _________________________________________________________________________
    
         _________________________________________________________________________
    
         _________________________________________________________________________
    
         _________________________________________________________________________
    
         _________________________________________________________________________
    
         _________________________________________________________________________
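
Once you have worked the numbers by hand, you may want a way to check your arithmetic. The worksheet steps and the Yule's Q formula above can be sketched in Python as follows. This is only an illustrative sketch: the function names `chi_squared_2x2` and `yules_q` are my own inventions, and the code assumes a 2 x 2 table with no zero row or column totals and with ad + bc not equal to zero.

```python
def chi_squared_2x2(a, b, c, d):
    """Follow the worksheet steps for a 2 x 2 table of observed counts."""
    # Step 1: marginal totals (cells e through i in the contingency table).
    e, f = a + b, c + d      # row totals
    g, h = a + c, b + d      # column totals
    n = e + f                # grand total; also equals g + h

    # Step 2: expected frequency = row total * column total / n.
    observed = {"a": a, "b": b, "c": c, "d": d}
    expected = {
        "a": e * g / n,
        "b": e * h / n,
        "c": f * g / n,
        "d": f * h / n,
    }

    # Steps 4-7: square each observed count, divide by its expected count, sum.
    sum_o2_over_e = sum(observed[cell] ** 2 / expected[cell] for cell in observed)

    # Step 8: subtract n to get the calculated Chi-squared.
    x2 = sum_o2_over_e - n
    return {"n": n, "expected": expected, "sum_o2_over_e": sum_o2_over_e, "x2": x2}


def yules_q(a, b, c, d):
    """Yule's Q = (ad - bc) / (ad + bc); ranges from -1 to +1."""
    return (a * d - b * c) / (a * d + b * c)


# The observed frequencies from this lab's contingency table.
result = chi_squared_2x2(33, 15, 49, 3)
for cell, value in result["expected"].items():
    print(f"expected ({cell}) = {value:.3f}")
print(f"sum(O2/E) = {result['sum_o2_over_e']:.3f}")
print(f"X2calc    = {result['x2']:.3f}")
print(f"Yule's Q  = {yules_q(33, 15, 49, 3):.3f}")
```

If your hand-calculated values differ from the printed ones by more than rounding error, recheck your marginal totals first, since every expected frequency depends on them.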
    
    
    ==========

    Figure 1 Map of Oats and Sage

    [ map of oats and sage ]

    ==========

    Figure 2 Critical Values for Chi-Square (X2crit)

        
                          alpha                
    
     df     0.100     0.050     0.025     0.010     0.005
                                                 
      1     2.706     3.841     5.024     6.635     7.879
      2     4.605     5.991     7.378     9.210    10.597
      3     6.251     7.815     9.348    11.345    12.838
      4     7.779     9.488    11.143    13.277    14.860
      5     9.236    11.070    12.832    15.086    16.750
      6    10.645    12.592    14.449    16.812    18.548
      7    12.017    14.067    16.013    18.475    20.278
      8    13.362    15.507    17.535    20.090    21.955
      9    14.684    16.919    19.023    21.666    23.589
     10    15.987    18.307    20.483    23.209    25.188
     11    17.275    19.675    21.920    24.725    26.757
     12    18.549    21.026    23.337    26.217    28.300
     13    19.812    22.362    24.736    27.688    29.819
     14    21.064    23.685    26.119    29.141    31.319
     15    22.307    24.996    27.488    30.578    32.801
     16    23.542    26.296    28.845    32.000    34.267
     17    24.769    27.587    30.191    33.409    35.718
     18    25.989    28.869    31.526    34.805    37.156
     19    27.204    30.144    32.852    36.191    38.582
     20    28.412    31.410    34.170    37.566    39.997
     21    29.615    32.671    35.479    38.932    41.401
     22    30.813    33.924    36.781    40.289    42.796
     23    32.007    35.172    38.076    41.638    44.181
     24    33.196    36.415    39.364    42.980    45.558
     25    34.382    37.652    40.646    44.314    46.928
     26    35.563    38.885    41.923    45.642    48.290
     27    36.741    40.113    43.195    46.963    49.645
     28    37.916    41.337    44.461    48.278    50.994
     29    39.087    42.557    45.722    49.588    52.335
     30    40.256    43.773    46.979    50.892    53.672
     40    51.805    55.758    59.342    63.691    66.766
     50    63.167    67.505    71.420    76.154    79.490
     60    74.397    79.082    83.298    88.379    91.952
     70    85.527    90.531    95.023   100.425   104.215
     80    96.578   101.879   106.629   112.329   116.321
     90   107.565   113.145   118.136   124.116   128.299
    100   118.498   124.342   129.561   135.807   140.170
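
Entering this table is a two-key lookup: degrees of freedom pick the row and alpha picks the column. Here is a minimal Python sketch of that lookup, using only the first five rows of Figure 2. The names `CRITICAL_VALUES`, `degrees_of_freedom`, `critical_value`, and `reject_null` are hypothetical, chosen for illustration, and the example uses a made-up 3 x 3 table and a made-up calculated value rather than this lab's data.

```python
# First five rows of Figure 2, keyed by degrees of freedom, then by alpha.
CRITICAL_VALUES = {
    1: {0.100: 2.706, 0.050: 3.841, 0.025: 5.024, 0.010: 6.635, 0.005: 7.879},
    2: {0.100: 4.605, 0.050: 5.991, 0.025: 7.378, 0.010: 9.210, 0.005: 10.597},
    3: {0.100: 6.251, 0.050: 7.815, 0.025: 9.348, 0.010: 11.345, 0.005: 12.838},
    4: {0.100: 7.779, 0.050: 9.488, 0.025: 11.143, 0.010: 13.277, 0.005: 14.860},
    5: {0.100: 9.236, 0.050: 11.070, 0.025: 12.832, 0.010: 15.086, 0.005: 16.750},
}


def degrees_of_freedom(rows, cols):
    """DF = (r - 1)(k - 1) for a contingency table with r rows and k columns."""
    return (rows - 1) * (cols - 1)


def critical_value(rows, cols, alpha=0.05):
    """Look up the critical Chi-squared value for a table of the given shape."""
    return CRITICAL_VALUES[degrees_of_freedom(rows, cols)][alpha]


def reject_null(x2_calc, rows, cols, alpha=0.05):
    """Reject the null hypothesis when the calculated value exceeds the critical one."""
    return x2_calc > critical_value(rows, cols, alpha)


# Hypothetical example: a 3 x 3 table has (3 - 1)(3 - 1) = 4 degrees of freedom.
print(degrees_of_freedom(3, 3))      # 4
print(critical_value(3, 3, 0.05))    # 9.488
print(reject_null(12.0, 3, 3))       # True, since 12.0 > 9.488
```

The lookup raises a KeyError for any degrees of freedom or alpha level not included in the dictionary, which is the sketch's way of saying "go read the full table."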
                                                                                    
    
    ==========

    Figure 3 p-Values for X2calc

         X2    1 DF       X2    1 DF       X2    1 DF        X2    1 DF
    
        3.2   .0736      4.4   .0359      5.6   .0180      6.8   .0091
        3.3   .0692      4.5   .0339      5.7   .0170      6.9   .0086
        3.4   .0652      4.6   .0320      5.8   .0160      7.0   .0082
        3.5   .0614      4.7   .0302      5.9   .0151      7.1   .0077
        3.6   .0578      4.8   .0285      6.0   .0143      7.2   .0073
        3.7   .0544      4.9   .0268      6.1   .0135      7.3   .0069
        3.8   .0513      5.0   .0254      6.2   .0128      7.4   .0065
        3.9   .0483      5.1   .0239      6.3   .0121      7.5   .0062
        4.0   .0455      5.2   .0226      6.4   .0114      7.6   .0058
        4.1   .0429      5.3   .0213      6.5   .0108      7.7   .0055
        4.2   .0404      5.4   .0201      6.6   .0102      7.8   .0052
        4.3   .0381      5.5   .0190      6.7   .0096     >7.8  <.0050
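
For values that fall between the table entries, the 1 DF prob-value can be computed directly. With one degree of freedom, Chi-squared is the square of a standard normal variable, so the upper-tail probability equals erfc(sqrt(X2 / 2)). The following Python sketch uses only the standard library; the function name `p_value_1df` is an illustrative invention, and the approach applies only to 1 degree of freedom.

```python
import math


def p_value_1df(x2):
    """Upper-tail probability of a Chi-squared value with 1 degree of freedom.

    Uses the identity P(X2 > x) = erfc(sqrt(x / 2)), valid for 1 DF only.
    Raises ValueError for negative input, since Chi-squared cannot be negative.
    """
    return math.erfc(math.sqrt(x2 / 2.0))


# Spot check against Figure 3: the entry for X2 = 4.0 is .0455.
print(round(p_value_1df(4.0), 4))    # 0.0455
```

As a further sanity check, the 1 DF critical value for alpha = 0.050 in Figure 2 (3.841) should come out very close to 0.05.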
    
    
    
    

    ==========

    first placed on the web: 11/26/98
    last revised: 03/06/07
    © Dr. Christine M. Rodrigue

    ==========