Lab 4: Chi-Squared Survey Analysis, Locational Analysis, Dr. Rodrigue, F/98

GEOG 216-01

Locational Analysis

Lab 4: Chi-Squared Survey Analysis

Purpose of the Lab:

This lab introduces you to a more rigorous analysis of the survey results you summarized more subjectively in Lab 3. It introduces you to the Chi-squared method for determining the probability of a relationship between two variables. Chi-squared is a non-parametric technique, that is, it does not make any assumptions about the underlying probability distribution governing the population from which your sample was taken (e.g., you don't have to assume it's normal). It also is designed to work with data at the crudest level of measurement, namely, nominal (frequency counts within categories) or higher-level data reduced to nominal scale. As a result, this technique has wide applications and has the virtue of easy calculation and interpretation, to boot.

Type I and Type II Errors, Confidence Levels, Alpha, and Beta, Again

In class, we went over the concepts of Type I errors (thinking there's a real relationship between variables when pure chance could have created your results) and Type II errors (dismissing as random a real relationship). The probability of a Type I error is termed "alpha" and that of a Type II error "beta." Lowering alpha raises beta, and vice versa. To make decisions in conditions of uncertainty about the underlying population, then, requires setting a standard, a "confidence level."

The confidence level is expressed either as a percentage (95% confidence level) or as a proportion (.95 confidence level). The confidence level equals 1 minus alpha (expressed as a proportion) or 100 minus alpha (expressed as a percentage). Alpha is also sometimes referred to as the "significance level." Setting a confidence level means that you are so confident of your sampling method that, in, say, 95% of all possible samples you COULD have taken from your target population, you would not have gotten results as extreme as yours through sheer chance. That is, random processes would have produced results as extreme as yours in only 5% of possible samples: Your results are significant at the 0.05 level.

To pick this confidence level requires you to think about the worst-case consequences of Type I and Type II errors. You set your confidence level to minimize the chances of the error that would have the graver consequences. In some cases, an alpha of 0.0001 would not be small enough; in others, it would be silly to set it any lower than 0.2000.

Location analytic and market analytic problems are likelier to need a fairly small confidence level (or, conversely, a relatively large alpha). This is because the consequences of missing an exploitable relationship are greater than the consequences of ascribing excessive importance to a random quirk in your data. For the purposes of this lab, then, we will set our confidence level quite a bit below the norms seen in most social science and natural science research. Let us opt for 85% confidence, meaning we will accept alpha levels up through 0.1500.
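The conversion between a confidence level and alpha can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the function name is made up for the example.

```python
def alpha_from_confidence(confidence_percent):
    """Convert a confidence level given as a percentage into alpha as a proportion."""
    return (100 - confidence_percent) / 100


# The lab's chosen standard: 85% confidence.
lab_alpha = alpha_from_confidence(85)
# The standard more common in social science work: 95% confidence.
common_alpha = alpha_from_confidence(95)

print(f"85% confidence -> alpha = {lab_alpha:.4f}")
print(f"95% confidence -> alpha = {common_alpha:.4f}")
```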

Set Up Your Null Hypotheses

In class, we discussed how you state as precisely as you can your expected relationships between variables. You then state the null versions of these expectations (e.g., "there is no significant relationship between A and B at the 0.05 level"). The null hypothesis is your default assumption unless your results are so extreme that you are confident you can reject it safely (consistent with your pre-chosen alpha level). Go through the ten tables below and state the null hypothesis pertaining to each.

Back to the Data

To use Chi-squared, you set up contingency tables (rows and columns, like a spreadsheet). The rows represent the categories of one variable and the columns represent the categories of the other variable. Each case has to fit into one and only one category of each variable: There has to be no question as to which cell a given case fits into. This is why I'm going to rearrange a few of the tables you had in Lab 3: They were set up such that one respondent could fit into more than one category of one of the variables. For example, you could have a News-X reader in Table 1 who reads three of the other papers. The reformatted tables are at the end of this lab.

For each table of observed data, you need to create a table of expected values. Usually, the expected values are put in the same cell as the observed counts, below them. That's why the tables are larger -- to give you room to enter the expected values below the observed counts.

Examine your table before proceeding. Make sure fewer than 20% of your observed cell counts are less than 5. Chi-squared produces squirrely results if too many cells have tiny observed counts. In that case, you simply collapse rows or columns in some sensible manner to reduce the number of cells and to increase the cell counts in them.
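If you would rather let a script do the checking, here is a minimal Python sketch of the 20% rule. The function name and the threshold parameter are assumptions made for illustration; the counts are the observed values from Table 9.

```python
def share_of_small_cells(table, threshold=5):
    """Return the fraction of cells whose observed count is below `threshold`."""
    cells = [count for row in table for count in row]
    small = sum(1 for count in cells if count < threshold)
    return small / len(cells)


# Observed counts from Table 9 (home owners vs. renters), row by row.
table_9 = [[96, 116],
           [6, 16]]

share = share_of_small_cells(table_9)
# The lab asks for FEWER than 20% small cells, so 20% or more means collapse.
needs_collapsing = share >= 0.20
print(f"{share:.0%} of cells are below 5; collapse rows or columns: {needs_collapsing}")
```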

To calculate the expected value for a cell, multiply that cell's row total (on the far right) by its column total (at the bottom), then divide the product by the total number of respondents (the number in the bottom rightmost corner). For the purposes of this lab, it will be sufficient to calculate the expected values to two decimal places of accuracy. Put the answer in the appropriate cell, below the observed count. Again, check the resulting table. If you have any cells with expected values below 2, you should collapse rows or columns and recalculate.
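The expected-value calculation can be sketched in Python as follows, using the observed counts from Table 1a. This is an illustration under the assumption that the table is stored as a list of rows; the function name is invented for the example.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from Table 1a: cells a, b on the first row and c, d on the second.
table_1a = [[67, 70],
            [36, 68]]

expected = expected_counts(table_1a)
for row in expected:
    print([round(value, 2) for value in row])

# The lab's second check: no expected value should fall below 2.
smallest = min(value for row in expected for value in row)
print(f"smallest expected value: {smallest:.2f}")
```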

That done, here's the formula for Chi-squared:

X² = sum over all cells i of [ (O_i - E_i)² / E_i ]

To calculate this as given, set up a spreadsheet model, with column A labeled "Cell"; column B labeled "O"; column C, "E"; column D, "O-E"; column E, "(O-E)²"; and column F, "(O-E)²/E". For each table, it helps to letter each of the cells with observed counts in them, to keep track of it all. In the spreadsheet, enter the observed counts in column B in the row corresponding with the cell name (e.g., a, b, c, d, e, f, ...). In column C, enter the appropriate expected value for that cell. In column D, subtract the value in column C from that in column B. Column E squares the value in Column D. Column F divides the number in column E by the value in column C. Got that (I hope)?

Now, sum the values in column F for as many lettered cells as you have. That number is your calculated X². Well, now you have it, what do you do with it? For this lab, we will use the prob-value approach to doing X².
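The same six-column layout can be mimicked outside a spreadsheet. Here is a minimal Python sketch for Table 1a, assuming the expected values are computed from that table's row and column totals; the variable names are made up for illustration.

```python
# Observed counts from Table 1a, keyed by the cell letters used in the lab.
observed = {"a": 67, "b": 70, "c": 36, "d": 68}

# Expected counts: (row total * column total) / n, using Table 1a's totals.
n = 241
expected = {
    "a": 137 * 103 / n,
    "b": 137 * 138 / n,
    "c": 104 * 103 / n,
    "d": 104 * 138 / n,
}

print(f"{'Cell':<5}{'O':>6}{'E':>9}{'O-E':>9}{'(O-E)^2':>10}{'(O-E)^2/E':>11}")
chi_squared = 0.0
for cell in observed:
    o = observed[cell]            # column B
    e = expected[cell]            # column C
    diff = o - e                  # column D
    diff_sq = diff ** 2           # column E
    contribution = diff_sq / e    # column F
    chi_squared += contribution   # running sum of column F
    print(f"{cell:<5}{o:>6}{e:>9.2f}{diff:>9.2f}{diff_sq:>10.2f}{contribution:>11.2f}")

print(f"Calculated X^2 = {chi_squared:.1f}")
```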

You need to figure out the probability that you could have gotten results as extreme as your actual calculated X² by sheer random processes. In other words, what is the chance of committing a Type I error? To do this, consult the handout entitled "p-values for X²." The X² column along the left axis is clear enough: It's your calculated X². The top axis, however, is labeled "degrees of freedom," and it's clear you need these to get into the body of the table.

Degrees of freedom is simply (the number of data rows in your table minus one) times (the number of data columns minus one). So, in a contingency table with two data rows and two data columns, you'd have (2-1)*(2-1) or (1)*(1) or 1 degree of freedom. If your table had three rows and two columns, you'd have two degrees of freedom; if it had six rows and six columns, you'd be up to 25 degrees of freedom.
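As a quick check on the arithmetic, the degrees-of-freedom rule fits in a one-line Python function (the function name is an assumption made for illustration):

```python
def degrees_of_freedom(data_rows, data_columns):
    """(rows - 1) * (columns - 1), counting data rows and columns only, not totals."""
    return (data_rows - 1) * (data_columns - 1)


# The three cases worked in the text.
for rows, cols in [(2, 2), (3, 2), (6, 6)]:
    print(f"{rows} x {cols} table: {degrees_of_freedom(rows, cols)} degree(s) of freedom")
```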

Armed now with both your calculated X² AND your degrees of freedom, read down and across into the body of the table and note the probability value there. As with the normal table, this is given in ten-thousandths. Divide the number by 10,000 to get alpha expressed as a proportion or by 100 to express alpha as a percentage. For this lab, I want you to show the prob-values as proportions, rounded to two decimal places.
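If you want to double-check a number you read off the handout, the prob-value for a table with exactly 1 degree of freedom (which covers every 2 x 2 table) can be computed directly. The sketch below is not the handout lookup the lab asks for; it is a swapped-in check that relies on the fact that, for 1 degree of freedom, the upper-tail chi-squared probability equals erfc(sqrt(X²/2)). The function name and the sample X² values are made up for illustration.

```python
import math


def p_value_one_df(chi_squared):
    """Upper-tail probability of the chi-squared distribution, 1 degree of freedom ONLY."""
    return math.erfc(math.sqrt(chi_squared / 2))


# Illustrative X^2 values, not results from the lab's tables.
for x2 in (1.0, 2.07, 3.84):
    print(f"X^2 = {x2:<5} prob-value = {p_value_one_df(x2):.2f}")
```

For tables with more than one degree of freedom this shortcut does not apply, and you would still need the handout or a statistics package.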

Create a table showing the calculated X² value (to one decimal place) for each of the ten relationships and the associated prob-value (to two decimal places). Place one asterisk near each relationship significant at the 0.15 level. Place two asterisks by each relationship meeting the 0.05 level commonly used in many social science projects.
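The asterisk rule can be sketched in Python as below. Two assumptions are baked in: a relationship meeting the 0.05 level gets two asterisks rather than three, and a prob-value exactly equal to a cutoff counts as meeting it. The function name and the three example rows are placeholders for illustration only, NOT results from the lab's tables.

```python
def significance_marker(p_value, lab_alpha=0.15, common_alpha=0.05):
    """Return '**' at the 0.05 level, '*' at the lab's 0.15 level, '' otherwise."""
    if p_value <= common_alpha:
        return "**"
    if p_value <= lab_alpha:
        return "*"
    return ""


# Placeholder rows: (label, calculated X^2, prob-value). Not actual lab results.
results = [("Example A", 6.2, 0.01),
           ("Example B", 2.5, 0.11),
           ("Example C", 0.4, 0.53)]

print(f"{'Relationship':<14}{'X^2':>6}{'p':>7}")
for name, chi_sq, p in results:
    # X^2 to one decimal place, prob-value to two, as the lab requests.
    print(f"{name:<14}{chi_sq:>6.1f}{p:>7.2f} {significance_marker(p)}")
```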

Back to Your Original Analyses

So, re-evaluate your findings in Lab 3 with your statistical tests in Lab 4. Would you revise any of your findings in light of your statistically rigorous new analysis? If so, which ones? How do you feel about your ability to "eyeball" significant relationships?


Table 1a             |    NEWS-X   | NON-NEWS-X    |   TOTAL       
                     |         #   |          #    |       #
------------------------------------------------------------      
Mercury-Register     |a       67   |b        70    |     137   
readers              |             |               |
                     |             |               |
------------------------------------------------------------      
Non Mercury-Register |c       36   |d        68    |     104
readers              |             |               |
                     |             |               |
------------------------------------------------------------      
                     |       103   |        138    |   n=241


Table 1b             |    NEWS-X   | NON-NEWS-X    |   TOTAL       
                     |         #   |          #    |       #
------------------------------------------------------------      
Senior Lifestyle     |a       22   |b        17    |      39 
readers              |             |               |
                     |             |               | 
------------------------------------------------------------      
Non Senior-Lifestyle |c       81   |d       121    |     202
readers              |             |               |
                     |             |               |
------------------------------------------------------------      
                     |       103   |        138    |   n=241


Table 2              |    NEWS-X   | NON-NEWS-X    |   TOTAL       
                     |         #   |          #    |       #
------------------------------------------------------------      
Mercury-Register     |a       53   |b        54    |     107   
subscribers          |             |               |
                     |             |               |
------------------------------------------------------------      
Non Mercury-Register |c       52   |d        78    |     130
subscribers          |             |               |
                     |             |               |
------------------------------------------------------------      
                     |       105   |        132    |   n=237


Table 3              |    NEWS-X   | NON-NEWS-X    |   TOTAL       
                     |         #   |          #    |       #
------------------------------------------------------------      
gets cable TV        |a       47   |b        81    |     128   
                     |             |               |
                     |             |               |
------------------------------------------------------------      
does not get         |c       51   |d        53    |     104
cable TV             |             |               |
                     |             |               |
------------------------------------------------------------      
                     |        98   |        134    |   n=232


Table 6a             |    NEWS-X   | NON-NEWS-X    |   TOTAL       
                     |         #   |          #    |       #
------------------------------------------------------------      
shop at Raley's      |a       35   |b        60    |      95   
                     |             |               |
                     |             |               |
------------------------------------------------------------      
do not shop at       |c      101   |d       105    |     206
Raley's              |             |               |
                     |             |               |
------------------------------------------------------------      
                     |       136   |        165    |   n=301



Table 6b             |    NEWS-X   | NON-NEWS-X    |   TOTAL       
                     |         #   |          #    |       #
------------------------------------------------------------      
shop at Albertson's  |a       46   |b        42    |      88   
                     |             |               |
                     |             |               |
------------------------------------------------------------      
do not shop at       |c       90   |d       123    |     213
Albertson's          |             |               |
                     |             |               |
------------------------------------------------------------      
                     |       136   |        165    |   n=301



Table 7              |    NEWS-X   | NON-NEWS-X    |   TOTAL       
                     |         #   |          #    |       #
------------------------------------------------------------      
households with      |a       18   |b        35    |      53   
children < 18        |             |               |
                     |             |               |
------------------------------------------------------------      
households without   |c       85   |d       103    |     188
children < 18        |             |               |
                     |             |               |
------------------------------------------------------------      
                     |       103   |        138    |   n=241


Table 8              |    NEWS-X   | NON-NEWS-X    |   TOTAL       
                     |         #   |          #    |       #
------------------------------------------------------------      
plans to buy a       |a       14   |b        20    |      34   
new or used vehicle  |             |               |
                     |             |               |
------------------------------------------------------------      
does not plan to buy |c       91   |d       116    |     207
a new or used        |             |               |
vehicle              |             |               |
------------------------------------------------------------      
                     |       105   |        136    |   n=241


Table 9              |    NEWS-X   | NON-NEWS-X    |   TOTAL       
                     |         #   |          #    |       #
------------------------------------------------------------      
home owners          |a       96   |b       116    |     212   
                     |             |               |
                     |             |               |
------------------------------------------------------------      
renters              |c        6   |d        16    |      22
                     |             |               |
                     |             |               |
------------------------------------------------------------      
                     |       102   |        132    |   n=234


Table 10             |    NEWS-X   | NON-NEWS-X    |   TOTAL       
                     |         #   |          #    |       #
------------------------------------------------------------      
respondents aged     |a       19   |b        50    |      69   
<45                  |             |               |
                     |             |               |
------------------------------------------------------------      
respondents aged     |c       84   |d        83    |     167
45 or older          |             |               |
                     |             |               |
------------------------------------------------------------      
                     |       103   |        133    |   n=236

first placed on the web: 10/17/98

last revised: 10/17/98
