CALIFORNIA STATE UNIVERSITY, LONG BEACH

GEOG 400
Geographical Analysis

Final Project: Student-Designed Project

==========

This lab has the following purposes:

  • to give you even more practice in using and interpreting multivariate statistical methods;
  • to make statistical analysis more meaningful to you as you work with your own data; and
  • to have you go through the process of figuring out which method(s) to apply and designing your own analysis.

Project deliverables:
  • a well-written (and autographed) formal report describing your data sources, any shortcomings in them, which methods you considered applying, which ones you chose and why, and your findings;
  • a paper copy of your computations, autographed; and
  • any graphics that you feel help your argument, autographed.

==============================

Background

==============================

This semester, you have learned about and practiced a few multivariable/multivariate statistical techniques, and several others, all supported by SPSS, are available for your independent learning in the Grimm and Yarnold textbook. The early lab projects each focused on one of these methods, and I provided quite a bit of detailed procedural support to get you through them. The immediately preceding lab asked you to become more independent: You were given a database and asked to select and defend a multivariate method to analyze it and then conduct that analysis.

This time, you are going to be even more on your own. You are to track down and choose a database with at least five variables (not counting the record identifier!), aiming for at least five times as many records as you have variables. Ideally, there should be at least 100 records or so. I will allow you to use a database of US state-scale data, but you should note in your report that the small number of records could make your results a little shaky.

Also ideally, the data set should be about something of interest to you, whether it's human geography, criminal justice, remote sensing, physical geography, geology, or archaeology (the mix of students in S/08!). You can find suitable data sets online or by asking your professors if they have a database that might work. If, after racking your brains in search engines, you still can't find or create a data set that actually interests you, then work with any data set you can lay hands on. Those of you in the physical sciences might, therefore, find yourself working with election and Census data, while someone in the social sciences might find that a geochemical data pot is the one they dive into. Remember: Statistics are statistics, no matter which discipline you're in. It's the decision-making process and the techniques that define the statistical mindset. I am hoping you find a data trove that really engages your interest (maybe you could turn your project into a conference presentation?), but, if you don't find something up your alley, get yourself a Plan B! Please consult with me if you are coming up with nothing but cold leads.

==============================

Preprocessing Your Data

==============================

You may have found your data in some inconvenient format. At this point, do any preprocessing needed to get the data into a format that can be moved into SPSS. Aim for conversion into an Excel file. Once there, strip out any metadata and shorten all variable names/column titles to eight characters or fewer (a limit in older versions of SPSS). Now, import it into SPSS, and it will hopefully fire up uneventfully.
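The course tool here is SPSS, but the renaming step is easy to botch by hand (truncating two long titles to the same eight characters). As a sketch of the logic, here is a hypothetical pure-Python helper (the function name and the example header row are mine, not part of the assignment) that truncates column titles and keeps the results unique:

```python
def spss_safe_names(names, max_len=8):
    """Truncate column titles to max_len characters, stripping spaces and
    punctuation, and make duplicates unique by replacing the tail with a
    counter (e.g. Populat1)."""
    seen = set()
    result = []
    for name in names:
        # Keep only letters and digits; older SPSS rejects spaces/punctuation.
        base = "".join(ch for ch in name if ch.isalnum())[:max_len]
        candidate = base
        counter = 1
        # SPSS treats names case-insensitively, so compare uppercased.
        while candidate.upper() in seen:
            suffix = str(counter)
            candidate = base[: max_len - len(suffix)] + suffix
            counter += 1
        seen.add(candidate.upper())
        result.append(candidate)
    return result

# A header row as it might come out of a downloaded table:
header = ["State Name", "Population 2000", "Population 1990", "Median Income"]
print(spss_safe_names(header))
# ['StateNam', 'Populati', 'Populat1', 'MedianIn']
```

The same effect can, of course, be achieved by editing the header row in Excel before the import; the point is only that truncated names must stay distinct.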

==============================

Visualizing Your Data

==============================

At this point, create graphs of your data. Histograms of each variable would be useful to check whether each is roughly normal. Scatterplots of pairs of variables might also be useful, so that you can see whether they form linear associations. If they appear curvilinear, you might want to do a mathematical transform on them (semi-log, log-log, whatever) and then redo the scatterplots. If that linearizes them, you might want to use the transformed variables in further analysis.
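To see why a transform can help, here is a small illustrative simulation (the power-law relationship and the noise level are invented for the demonstration, not drawn from any course data set): a curvilinear association of the form y = a·x^b shows a weaker straight-line correlation in raw units than it does after a log-log transform.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(42)
# Hypothetical power-law association y = 2 * x^1.7 with mild noise --
# the kind of curvilinear pattern a raw scatterplot would reveal.
xs = [random.uniform(1, 100) for _ in range(200)]
ys = [2.0 * x ** 1.7 * random.uniform(0.9, 1.1) for x in xs]

r_raw = pearson_r(xs, ys)
r_log = pearson_r([math.log(x) for x in xs], [math.log(y) for y in ys])
print(f"raw r = {r_raw:.3f}, log-log r = {r_log:.3f}")
```

In SPSS you would accomplish the same thing by computing new log variables (Transform > Compute) and redoing the scatterplots, as the paragraph above describes.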

==============================

Hypotheses and Standards

==============================

You collected a particular data set presumably because you have an interest in it or have questions it might answer for you. At this point, state your research question or expectation and, if relevant, state it as a testable hypothesis (e.g., A, B, and C together create a model that accounts for a significant amount of the variation in D). Certain methods are more exploratory in character: You are simply trying to see the structure in a database. Here you would use a research question rather than a formal hypothesis.

You need to think through what your standard for "significance" is. You need to avoid the human tendency to see patterns in everything, including randomness (a Type I error), yet you don't want to guard so stringently against this delusion that you actually miss something interesting (a Type II error). This is what the selection of alpha is all about. The lower the alpha, the less chance there is of making a Type I error, but the higher the chance of a Type II error becomes (all things being equal, which they rarely are).

Sometimes two or more tests will do the job. If so, you try to select the most powerful test. The greater the power, the less likely a Type II error is, even with tight alphas. There are power calculators online, and most folks aim for a power (1 − beta) of about 0.80 or higher.
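These definitions of alpha and power can be checked directly by simulation. The sketch below (my own illustration, not part of the lab procedure) repeatedly draws samples and runs a simple two-sided z-test of H0: mean = 0 at alpha = 0.05 (critical value 1.96). When H0 is true, the rejection rate comes out near alpha; when there is a real effect, the rejection rate is the power.

```python
import math
import random

def rejection_rate(true_mean, n=25, sigma=1.0, crit_z=1.96, trials=4000):
    """Share of simulated samples whose two-sided z-test rejects H0: mean = 0."""
    random.seed(1)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(random.gauss(true_mean, sigma) for _ in range(n)) / n
        z = sample_mean * math.sqrt(n) / sigma
        if abs(z) > crit_z:
            rejections += 1
    return rejections / trials

# Under H0 (the true mean really is 0), the rejection rate is the
# Type I error rate and should hover near alpha = 0.05.
print(rejection_rate(0.0))
# Under a real effect (true mean = 0.6), the rejection rate is the
# power (1 - beta) of the test at this sample size.
print(rejection_rate(0.6))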

==============================

Selecting an Appropriate Approach

==============================

You need to figure out which multivariate method will help you make sense of your data. Perhaps you want to model the relationship among several variables and a dependent variable you want to understand. Maybe you just want to reduce a pile of variables into just a few so that you can explore associations in your database. Or??? You need to define your problem, then select and justify your method, and execute it.

Some techniques are very picky about their assumptions. That's why you should graph your data and the associations among your variables to ensure the minimum needs of the test are met. If they don't quite measure up and you decide to do the test anyway, make sure you explain the problem in your write-up.
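Alongside the graphs, a quick numerical check can flag distributional problems. As one illustrative sketch (the two example data sets are invented), sample skewness is roughly 0 for symmetric data and clearly positive for a long right tail of the kind that often calls for a log transform:

```python
def skewness(values):
    """Sample skewness m3 / m2^1.5: near 0 for symmetric data,
    positive for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_tailed = [1, 1, 2, 2, 3, 3, 4, 10, 40]   # e.g., income-like data

print(skewness(symmetric))     # ~0: little cause for concern
print(skewness(right_tailed))  # clearly positive: consider a log transform
```

SPSS reports skewness (with its standard error) under Analyze > Descriptive Statistics, so in practice you would read it off there rather than compute it yourself; the point is to look at such diagnostics before committing to an assumption-heavy test.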

==============================

Interpretation and Write-up

==============================

Briefly summarize in plain English how your results panned out. Your lab project report should open with an introduction to the topic your database addresses. This introduction should let us know what the subject of your project is and why it's important.

Go on to discuss the data themselves: what the variables are. Here you would also indicate any problems with them, such as how validly they address the topic you're interested in, any reliability issues (remember the Uniform Crime Report a few labs back?), and any shortcomings (such as non-normality of one or more variables, outliers, missing data for one or more records, whatever). Then, discuss the various methods you considered for crunching the data and why you settled on one rather than another.

Then cover your results, followed by a discussion of what they mean for your hypotheses or expectations. You could end on a conclusion describing where you or someone else could go to carry your project forward in a New and Improved edition.

Here's a condensed outline of a common structure for scientific papers:

  • Introduction (background, importance, prior work, hypotheses)
  • Data and Methods (sources, shortcomings, procedures, limitations)
  • Results/Findings
  • Discussion (relating results back to the introduction)
  • Conclusions (summary or directions for future work based on problems in this project)

==============================

This document is maintained by Dr. Rodrigue
First placed on Web: 02/05/08
Last Updated: 02/05/08

==============================