GEOG 400

Geographical Analysis

Project: Multivariate Statistical Research Design

==========

Introduction

==========

The previous labs have each introduced you to a different advanced statistical method, from the mathematical transformation of variables, through multiple regression, principal components analysis, K-means clustering, and detrended correspondence analysis. Rather than just introducing you to yet another technique, of which there are boodles, I decided instead to have you design a study from the ground up on your own.

Up until now, the labs have used a cookbook approach and kind of spoonfed you the step-by-step process of applying one of these methods. At this point, you have enough information and practice to figure out how to set up a study on your own: determining the kinds of variables you have, their distributions, whether there is a dependent variable you're trying to explain, and how to cope with an embarrassment of variable riches. You have a sense of when a formal research hypothesis is called for and of how to decide whether it should be rejected or retained.

Goals of this lab, then:

  • explore four databases
  • design a study using one of these databases
  • get even more practice using SPSS and/or PAST
  • interpret the outcome of your study in a logical manner
  • write up your results and analysis professionally

Project deliverables are:

  • SPSS output and/or PAST output in a spreadsheet, signed
  • a signed essay, in which you introduce the data and problem, formulate hypotheses if appropriate, describe the method(s) you decided to apply and why you chose it (them), note any possible problems with applying the method(s) to these data and how you dealt with them, present your results, and discuss their meaning. These essays can be done in 2-3 pages, single-spaced.
  • any appropriate graphs or tables needed to support your analysis, signed.

==============================

Getting the Data

==========

Download the following four databases:

You'll design a study based on these data from the ground up. To do that, become familiar with them by opening up each one and looking it over. Archive the one or two you are most interested in and then convert them into an Excel 2000/97/XP file (.xls). Use the Excel file to get them into SPSS or PAST. Results of analyses in the statistical packages can be brought back into either spreadsheet program, though OpenOffice Calc is easier for any graphs you'd like to do.

Start playing around with them in SPSS and/or PAST, trying to see which of the methods you've learned about might help you make sense of them. With at least one of the databases, you could come up with sensible results using two different methods. At least one of the others would allow you to combine methods coherently. Futzing around with them really is the way to make multivariate statistics fun and get you in the right frame of mind to take on your own database and make those numbers sing!
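If you like, you can do this kind of futzing around outside SPSS/PAST, too. Here is a minimal Python/pandas sketch of the first look you'd take at any of these databases; the variables and values below are invented stand-ins, not data from the actual course files:

```python
# Exploratory first look at a database: summary statistics, pairwise
# correlations, and skewness. The toy variables below are invented
# stand-ins for whichever course database you choose.
import pandas as pd

df = pd.DataFrame({
    "elevation_m": [120, 340, 560, 210, 480, 95],
    "precip_mm":   [610, 820, 990, 700, 905, 560],
    "pop_density": [850, 120, 40, 640, 75, 1100],
})

print(df.describe())   # summary statistics for each variable
print(df.corr())       # pairwise correlations: a first hint at structure
print(df.skew())       # strong skew may call for a transformation
```

Even this quick a look tells you something: strongly correlated variables hint that PCA could compress the database, and strong skew hints that a transformation may be in order before any method that assumes normality.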

Once you have familiarized yourself with these databases and poked around in them in SPSS or PAST, pick one of particular topical or methodological interest to you or which you feel most comfortable analyzing.

Think through the issues that have come up again and again this semester:

  • What is the question you're interested in using the database to answer? Can you formulate hypotheses/null hypotheses?
  • If you can formulate hypotheses, what is the alpha standard you will use to decide whether your results are significant or not? Think that Type I and Type II error thing through. Think it through even if you are not in a position to test a formal hypothesis but are only interested in simplifying or classifying records.
  • What kind of data are most or all of your variables (scalar? ordinal? categorical?)? Some methods require scalar data (e.g., PCA); others are less fussy (e.g., DCA), so make sure the data conform with the requirements of the test.
  • Are the variables roughly normally distributed? You've seen how an outlier can make for nonsense. You can try trimming out any eccentric records and see if that moves the variable toward normality. If you do, make sure to explain what you did and why.
  • Can you clearly see a dependent variable (Y) in there or is this more of a data-mining situation? Is the Y variable scalar and normal? Regression is fairly robust against non-scalar X variables but not against non-scalar Y variables.
  • If you're trying to group or classify records and you're overwhelmed by the number of variables, you might try a PCA or DCA to reduce the database.
  • Are there any problems or shortcomings with the database that you should let the reader know may compromise your results? Not every database is perfect in the real world; we use them anyhow, but honestly tell the reader that there may be a problem.
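Several steps in this checklist can be rehearsed in code as well as in SPSS/PAST. The sketch below (Python with NumPy/SciPy, which are not part of the assignment's required tools; all data and variable names are invented) walks through a normality check, outlier trimming, and a bare-bones PCA-style reduction on a correlation matrix:

```python
# A rehearsal of three checklist steps: normality check, outlier
# trimming, and PCA-style data reduction. Data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(50, 10, size=60)
x = np.append(x, 400.0)          # one wildly eccentric record

# 1. Is the variable roughly normal? Shapiro-Wilk gives a formal check.
_, p_raw = stats.shapiro(x)

# 2. Trim the eccentric record (and say so in your write-up!) and re-check.
trimmed = x[np.abs(stats.zscore(x)) < 3]
_, p_trim = stats.shapiro(trimmed)
print(f"Shapiro-Wilk p-value: raw={p_raw:.4g}, trimmed={p_trim:.4g}")

# 3. Too many variables? PCA on the correlation matrix shows how much
#    of the variance a few components could carry.
data = rng.normal(size=(60, 5))
data[:, 3] = data[:, 0] + rng.normal(scale=0.1, size=60)  # redundant variable
corr = np.corrcoef(data, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]                  # largest first
explained = eigvals / eigvals.sum()
print("Proportion of variance by component:", np.round(explained, 2))
```

The point of the sketch is the reasoning, not the tool: the outlier alone wrecks the normality test, trimming it restores sanity, and the one redundant variable shows up as a leading component that soaks up more than its share of the variance.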

==============================

This document is maintained by Dr. Rodrigue
First placed on Web: 04/17/06
Last Updated: 11/15/16

==============================