CALIFORNIA STATE UNIVERSITY, LONG BEACH

GEOG 400
Geographical Analysis

Project: Multiple Regression
Once More, with SPSS Vigor

==========

Introduction

==========

The purpose of this lab is to have you revisit multiple regression, but this time in SPSS, a commercial dedicated statistics package. You will be able to tackle more ambitious model-building this time, since SPSS will take care of the nuts 'n' bolts for you.

Goals of this lab, then:

  • have you do simple linear regressions in SPSS to get the hang of how SPSS works
  • (re)acquaint you with multiple regression
  • give you practice using SPSS
  • have you try building multiple regression models using both backwards elimination and forward inclusion approaches
  • give you practice in dealing with politically loaded and complex socio-spatial problems (which may affect your future careers, no matter which fields you pursue)!

Project deliverables are:

  • lab answer sheet, printed, filled out, and, oh, yeah, autographed
  • SPSS graphs for your many simple regression models, autographed
  • SPSS correlation matrix, autographed

==============================

Getting the Data

==========

Your data set this time consists of 50 records (the 50 states, not including Washington, DC) and seven numerical variables. They are, again, available as a Calc spreadsheet: https://home.csulb.edu/~rodrigue/geog400/sat.ods. As usual, click to download the file to your flash drive or wherever you've decided to park the file. Open it in OpenOffice to have a look at it and then immediately save it (you can't do anything with the file until it's been saved somewhere). Because SPSS can be fussy about variable names, I have given the variables very short names (8 characters or fewer) with no non-alphanumeric characters in them. So, they may not be the most obvious things in the world. Accordingly, you can find a sheet of "metadata" (data about the data) in a separate tab, which serves as a data dictionary translating the very short variable names, but I'll explain the variables and the situation here, too.

==============================

About the Data and the Problem They Address

==========

The data set was compiled and made available by Deborah Lynn Guber for an article entitled, "Getting what you pay for: The debate over equity in public school expenditures," which appeared in the Journal of Statistics Education 7, 2, back in 1999. She assembled the data into one database from the 1997 Digest of Education Statistics, a public-domain collection of information published by the National Center for Education Statistics. These particular data are from the 1994-95 academic year. I had to convert variable names to make them acceptable to SPSS for you, but I've made no other changes.

The data are presented in eight columns.

  1. Column A is "State" -- 'nuff said.
  2. Column B is "expstd," which means the average 1994-95 expenditure per student per state in public elementary and secondary schools, given in thousands of dollars.
  3. Column C is "stdfac" for average student to faculty ratios in each state, as of Fall 1994.
  4. Column D is "teachpay," which is the estimated average annual salary of teachers in 1994-95, given in thousands of dollars.
  5. Column E is "takesat," for the percentage of all eligible students in each state who took the SAT exams in 1994-95.
  6. Column F is "verbsat," for average verbal score on the SATs in 1994-95 in each state.
  7. Column G is "mathsat," for average math score on the SATs in the various states that year.
  8. Column H is "totalsat," for, you guessed it, average total SAT scores for the states that year.

SPSS can import data from a variety of sources, but Calc .ods files are not among them. So, let's use Calc to save the file as an Excel file. Select File -- Save As -- and then open the Save as type box with the down arrow on its right side and then scroll down to Microsoft Excel 97/2000/XP (.xls). Microsoft Excel 95 works, too. When you hit Save, it'll keep the same file name but change the format and you should see sat.xls on the blue title bar at the top. OpenOffice can happily continue working on the file, but we no longer require its services, as we're moving into SPSS. So, save and close the file for now (SPSS won't open a file that is open in another program). At least you now know how to convert file formats from one program to another, and that is a good skill to have.

Now, about the data in that spreadsheet: These data concern a long-standing debate over educational reform to remedy the spatial inequality of school expenditures. The more money a school district or a state has to spend on textbooks, libraries, computer facilities, and teacher salaries, the better its children should do when it comes to the competition to get into college. Affluent communities, then, have the capacity and the willingness to pay higher taxes to finance schools to give their kids an edge in college admission. Poorer communities, not unreasonably, feel that this is inequitable and some of the surplus money in richer communities should be used to improve the quality of their schools to give their children a chance to escape poverty through higher education.

This is, obviously, a hot ideological debate. Might some data throw water on the fire? Does money really matter? Do data really matter to politicians?

Performance on SATs makes an obvious Y variable. We have an embarrassment of riches here: Three different measures of performance on the SATs (verbal, math, and overall).

We also have a bunch of possible X variables: Average expenditures per student; more specifically, average teacher pay as a big component of expenditures; and average student to teacher ratios (another measure of resources being invested in students). Any or all of these might be used to explain variation in SATs. There are bound to be some relationships among these X variables (multicollinearity).

We also have another variable, which could be an X variable or a Y variable: Percentage of students in a state who actually take the SATs, which are important prerequisites for application to many colleges. Does the percentage of students taking the SAT somehow relate to the state's average performance on the SATs? Does expenditure on students affect the percentage of kids who get motivated enough to attend college that they'll take the SATs?

==============================

Stating Your Expectations

==========

I'm not going to have you do all three potential Y variables. In class, we'll decide on just one of them for you to do. Which one did you get?


______________________________________________________________________________
So, what do you suppose is the relationship of each of the possible X variables on your Y? Please state your expectations about the effect of each X on your Y (direct or inverse association) and a brief common-sense rationale for your expectation:

X1 (expstd) on Y:_____________________________________________________________

______________________________________________________________________________

X2 (stdfac) on Y:_____________________________________________________________

______________________________________________________________________________

X3 (teachpay) on Y: __________________________________________________________

______________________________________________________________________________

X4 (takesat) on Y:____________________________________________________________

______________________________________________________________________________


And, since we're on a roll with hypotheses, what do you suppose the relationship is between each of the first three X variables and takesat as a Y?

X1 (expstd) on Y2:____________________________________________________________

______________________________________________________________________________

X2 (stdfac) on Y2:____________________________________________________________

______________________________________________________________________________

X3 (teachpay) on Y2:__________________________________________________________

______________________________________________________________________________


To be consistent with SPSS' default settings and because it is so common, we'll use the alpha=0.05 standard in evaluating these seven hypotheses. It guards against the generally more serious Type I error of thinking we see something that really is random chatter, while yet retaining enough power to avoid the Type II error of not seeing something that IS there. I'm assigning it now, but you should always think about this. Most people use 0.05, but there are times when a "kinder and gentler" standard is more appropriate, as when you are doing a more exploratory or a pilot study and it's more important to uncover promising lines of inquiry. Other times, 0.05 is way too slack, as in decisions that could hurt people seriously (e.g., safety of the blood supply, a guilty verdict in a capital trial). You need to think your way through it in every study you do.

Another issue to think through is whether you really have enough theoretical understanding of an association to be sure of the direction of the difference or, here, association that you hypothesize. If you have solid theoretical justification for picking a direction, you can use a one-tailed test, putting all the alpha on one tail of the normal distribution. If there's any possibility that you don't really understand the system, it's generally a better idea not to specify a directional test and to select a two-tailed test instead, where half the alpha is put on each end of the distribution to define your region of rejection. Since each of you is bound to have different ideas about the directions of these associations and be able to argue for them on some kind of common-sense or ideological framework, let's opt for two-tailed tests and split the difference.
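
To see what splitting the alpha does to the region of rejection, here is a minimal Python sketch using the standard normal distribution. It is an illustration only: SPSS reports probabilities based on the t distribution, so its cutoffs will differ somewhat from these z values.

```python
from statistics import NormalDist

ALPHA = 0.05
z = NormalDist()  # standard normal: mean 0, standard deviation 1

# One-tailed test: all of alpha sits in a single tail.
one_tailed_cutoff = z.inv_cdf(1 - ALPHA)

# Two-tailed test: alpha is split, half in each tail.
two_tailed_cutoff = z.inv_cdf(1 - ALPHA / 2)

print(f"one-tailed critical z: {one_tailed_cutoff:.3f}")
print(f"two-tailed critical z: +/-{two_tailed_cutoff:.3f}")
```

The two-tailed cutoff sits farther from zero, which is the price of not committing to a direction in advance.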

==============================

Doing the First Simple Linear Regression in SPSS

==========

Fire up SPSS. When the opening box comes up, click on the "Open an existing data source" button and hit "Okay."

Now, "Open File." Browse to the drive you saved sat.xls onto. Once you're in the right drive, pick .xls for the type, which should bring up all the .xls files you may have on that drive. Double-click on sat.xls.

On the dialogue box that comes up, make sure that "Read variable names from the first row of data" is checked. Leave "Worksheet" alone. For "Range," enter A1:H51 (or just leave that blank, since there's nothing else in the sat sheet but the SAT data). Hit "OK," and your spreadsheet should plop up on the SPSS data editor box and there should also be a second file opened for model Output. Take a moment and save both files in native SPSS format. The data editor/spreadsheet/database part will be saved as sat.sav, and the output part should be saved as sat.spv.

Now, click on the "Analyze" menu and choose "Regression." On the regression menu, choose "Linear." Up comes a large dialogue box. The directory on the left lists all variables in the data editor spreadsheet. On the right are boxes where you put your Y (dependent) and your X (independent) variables.

First, highlight your Y variable (for SAT scores, whichever one you were assigned). Now, click on the right arrow and, wham!, it's plopped in the Y variable box.

Second, highlight your first independent variable: expstd. Click on the right arrow and move it into the X variable box (the predictor or independent variable).

Now, make sure "Enter" is the "Method" visible just below. Hit "OK" at this point. Poof! Up comes the "Output" box. On the left frame will be "Regression" and the list of tables of output pertaining to this regression model. On the right frame will be the actual output tables themselves. I do not expect you to print these (unless you wish to treasure them forever). What I want you to do at this point is fill in the model you built, its associated correlation and determination coëfficients, and assorted measures of significance.

It's a good idea at this point to label your output by double-clicking the word, Regression, in the left pane and then adding something like "expstd." You might want to do the same to the Regression word on the right pane. It is really easy to get lost in piles of output and not know which section pertains to which analysis (advice from someone who has suffered just this fate).

From the box, "Model Summary," what is the correlation coëfficient for your model? R = __________

What's the raw coëfficient of determination? R² = __________

What about the adjusted coëfficient of determination? R²adj = __________

To get the constants (a and b) for the model you just built, look to the output box farther down labeled "Coëfficients." In that box, look for "B" under "Unstandardized coëfficients." The first one, beside "(Constant)" is a (remember, some people call a, b0, which is why SPSS puts it under B). The one below, beside "expstd," is b, or the regression coëfficient for your X. So, fill in your model (and be sure to include the sign for b, if it's negative):

Y = __________ + ( __________)X

What is the t value for expstd? t = __________

What is the significance for that value of t for expstd? Prob= __________

Is that significance value smaller than the critical alpha of 0.05? __________
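
If you are curious what SPSS is computing behind those output boxes, the following is a minimal Python sketch of a simple linear regression. The function name and the four data points are made up for illustration (they are NOT the SAT data), and the sketch stops at t because turning t into a prob-value needs the t distribution, which Python's standard library does not provide.

```python
from math import sqrt

def simple_regression(x, y):
    """Least-squares fit of Y = a + bX, plus the summary figures asked for above."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b = sxy / sxx                  # regression coefficient (slope)
    a = mean_y - b * mean_x        # the "(Constant)" row
    r = sxy / sqrt(sxx * syy)      # signed Pearson r; Model Summary shows |r| as R
    r2 = r * r
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - 2)   # adjustment for one predictor
    sse = syy - b * sxy            # residual sum of squares
    se_b = sqrt(sse / (n - 2) / sxx)            # standard error of b
    t = b / se_b
    return {"a": a, "b": b, "r": r, "R2": r2, "R2adj": r2_adj, "t": t}

# Made-up numbers for illustration only.
x = [1, 2, 3, 4]
y = [3, 5, 7, 10]
fit = simple_regression(x, y)
print(f"Y = {fit['a']:.3f} + ({fit['b']:.3f})X")
print(f"R = {abs(fit['r']):.3f}, R2 = {fit['R2']:.3f}, "
      f"R2adj = {fit['R2adj']:.3f}, t = {fit['t']:.2f}")
```

Note that the sign of b and the sign of r always agree, so the direction of the association can be read from either one.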

Now for some real fun -- graphing your data and the model you just built. SPSS is a little bit clumsy here, so bear with me. Go back up to the menu at the top and click on "Graphs." On the menu that drops down, pick "Legacy Dialogs" and then "Scatter/Dot...." On the Scatterplot dialogue box that comes up, click on "Simple" and then hit "Define." Move your SAT variable to the Y Axis and expstd to the X axis. Click the Titles box and put in something like Simple Linear Regression: Expstd and Totalsat and hit Continue. Then, in the original dialogue box, pick OK. A scatterplot appears in the output box and "Graph" and some other stuff appears on the left frame of the output box. This is a raw scatterplot.

To graph the regression line on it, you need to double-click on the middle of the graph itself. Another, smaller box comes up, with a copy of your graph in it. This box has its own menu. Choose "Elements." On the menu that drops down, choose "Fit Line at Total." Then, on the "Properties" dialogue box, check "Linear" under the "Fit Line" tab and then hit "Apply." That should put your regression line of expected Y values right on your scatterplot. You can even mess around with color and line style and thickness in the Lines tab before finishing. Now, close the graph editor. Instantly, your newly fit line appears on the larger graph on the original output box. Purdy, ain't it?

Okay, this graph AND the b coëfficient and the t value should be starting to bug you about now. What is wrong with this picture? In other words, what was your original expectation, and what did you get? Who do you think would be ecstatic to see these results and why?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

==============================

Finishing the Seven Simple Linear Regressions

==========

Now that you have the hang of simple linear regression in SPSS, repeat the exact same process for all four independent variables as each influences performance on the SATs (whichever one you were assigned to do). Please enter your results in the table below (well, the one on the answer form, actually).

"Significant?" is answered by "yes" or "no" -- that is, is the sig or prob-value associated with t less than 0.05?

"Direction" is answered by "direct" for an association that turned out positive (trending to the upper right) and "inverse" for an association that turned out negative (trending to the lower right).

"Expected?" is answered by "yes" or "no" -- that is, did the association come out in the direction you predicted in your statement of expectation for that hypothesis (your working hypothesis)?

Y = sat (totalsat, verbsat, or mathsat)
Xi      | R  | R2adj|     model        | t   | sig:t | signif?| directn| expect?|
________|____|______|__________________|_____|_______|________|________|________|
        |    |      |                  |     |       |        |        |        | 
expstd__|____|______| Y=_____+(_____)X |_____|_______|________|________|________|
        |    |      |                  |     |       |        |        |        |
stdfac__|____|______| Y=_____+(_____)X |_____|_______|________|________|________|
        |    |      |                  |     |       |        |        |        |
teachpay|____|______| Y=_____+(_____)X |_____|_______|________|________|________|
        |    |      |                  |     |       |        |        |        |
takesat_|____|______| Y=_____+(_____)X |_____|_______|________|________|________|

Now, do the same thing for the remaining three regression models, after switching your Y variable to takesat. This time, you need to click on the current dependent or Y variable (your SAT variable), which makes the arrow point left back to the variable list. Click on the left arrow to return *sat to the general list, leaving you with three independent (Xi) variables. Highlight takesat now, which should create a right arrow, so you can move it to the status of dependent (Y) variable.

Y = takesat
Xi       |  R  | R2adj|         model         |  t  |  sig(t) | significant? | direction | expected? |
_________|_____|______|_______________________|_____|_________|______________|___________|___________|
         |     |      |                       |     |         |              |           |           | 
expstd___|_____|______| Y = _____ + ( _____)X |_____|_________|______________|___________|___________|
         |     |      |                       |     |         |              |           |           |
stdfac___|_____|______| Y = _____ + ( _____)X |_____|_________|______________|___________|___________|
         |     |      |                       |     |         |              |           |           |
teachpay_|_____|______| Y = _____ + ( _____)X |_____|_________|______________|___________|___________|


==============================

Multiple Regression Modeling with the Kitchen Sink Approach

==========

Let's first try the "kitchen sink" approach on our way to understanding our squirrely results. Go back to the Analyze function and choose Regression and then Linear. This brings back the list of variables in the spreadsheet and the opportunity to define Xs and Y. First, highlight any variable name still in the dependent (Y) and independent (X) boxes. Move them back to the general list on the left.

Highlight the sat variable you were assigned (totalsat, mathsat, or verbsat). Move it into the dependent (Y) variable box.

Now, highlight expstd, stdfac, teachpay, and takesat and then click the arrow that will move the whole kit and caboodle into the independent (X) variable box. Make sure that "Enter" is the "Method." Hit "Okay."

You'll get an output that resembles what you had for the simple linear regressions, except the top and bottom boxes will be deeper. It's a good idea to label the output on both the left and right panes (e.g., Regression: Kitchen sink), so you can find it later.

Fill out this table.

Y = sat (totalsat, mathsat, or verbsat)

|  R  | R2adj |  F  |sig(F)|         model                                                   |
|_____|_______|_____|______|_________________________________________________________________| 
|     |       |     |      |                                                                 |
|_____|_______|_____|______|  Y = _____ + ( _____)X1 + ( _____)X2 + ( _____)X3 +  ( _____)X4 |


                      
Xi         t  |  sig(t) | significant? |
______________|_________|______________|
              |         |              | 
expstd________|_________|______________|
              |         |              |
stdfac________|_________|______________|
              |         |              |
teachpay______|_________|______________|
              |         |              |
takesat_______|_________|______________| 


Is the overall model significant (the F significance in the ANOVA output block, which evaluates the whole model)? __________

Are any of the individual X variables insignificant contributors to the model and possible candidates for removal (prob-values or "Sig" higher than 0.05 in the Coëfficients output box)? In other words, are any of these candidates for removal either for being poor contributors or for "donating at the office" (conveying their signal through some other X variable they're correlated with, which preëmpts their signal)?

______________________________________________________________________________

______________________________________________________________________________
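
For the curious, here is a minimal Python sketch of what a "kitchen sink" fit involves: solving for all the coefficients at once and then computing R2, adjusted R2, and the overall F. The function names and the tiny data set are invented for illustration (they are NOT the SAT data), and the sketch leaves out the significance of F, which requires the F distribution.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Fit Y = a + b1*X1 + ... + bk*Xk; return ([a, b1, ..., bk], residual SS)."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[p] * r[q] for r in design) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(design, y)) for p in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in design]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coefs, sse

def model_summary(columns, y):
    """Coefficients, R2, adjusted R2, and overall F for an all-variables-in model."""
    n, k = len(y), len(columns)
    coefs, sse = ols(columns, y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - sse / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f = (r2 / k) / ((1 - r2) / (n - k - 1))
    return coefs, r2, r2_adj, f

# Made-up numbers for illustration only: Y tracks x1 closely and ignores x2.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 3, 1, 3, 1, 3, 1]
y = [2.2, 3.8, 5.8, 8.2, 10.2, 11.8, 13.8, 16.2]

coefs, r2, r2_adj, f = model_summary([x1, x2], y)
print(f"Y = {coefs[0]:.3f} + ({coefs[1]:.3f})X1 + ({coefs[2]:.3f})X2")
print(f"R2 = {r2:.4f}, R2adj = {r2_adj:.4f}, F = {f:.1f}")
```

In this toy example the whole model earns a large F even though one of its two X variables contributes essentially nothing, which is exactly the situation the question above asks you to look for.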


==============================

Multiple Regression Modeling with the Backwards Elimination Approach

==========

Let's see if we can find a better model. You have an idea about variables that could be painlessly dumped. Let's let SPSS try, using the criterion that it keeps dropping variables until further chopping would significantly hurt F. It will do this automatically for you!

Go back to Analyze, Regression, Linear. This time, leave the Y and the X variables as is, but change the "Method" to "Backward." SPSS will build models with fewer and fewer variables until it is in danger of significantly hurting F. This time, when you hit "OK," you're going to get a lot more output. SPSS will label each model it tries by a number. Model 1 is the kitchen sink model you already did. Model 2 has one fewer variable and so on.
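
The logic of dropping variables one at a time can be sketched in a few lines of Python. This is a rough illustration, not SPSS's own procedure: SPSS decides with probabilities, whereas this sketch substitutes a fixed F-to-remove cutoff (2.71, an assumed value), so it will not necessarily stop at the same model. The function names and data are invented for illustration and are NOT the SAT data.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def residual_ss(columns, y):
    """Residual sum of squares after regressing Y on the X columns plus a constant."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[p] * r[q] for r in design) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(design, y)) for p in range(k)]
    coefs = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coefs, r))) ** 2
               for r, yi in zip(design, y))

def backward_eliminate(predictors, y, f_to_remove=2.71):
    """Start with every X in the model; drop the weakest until none is droppable."""
    kept = dict(predictors)
    n = len(y)
    while kept:
        sse_full = residual_ss(list(kept.values()), y)  # assumes an imperfect fit
        df_resid = n - len(kept) - 1
        partial_f = {}
        for name in kept:
            others = [col for other, col in kept.items() if other != name]
            # How much worse does the fit get without this one variable?
            partial_f[name] = (residual_ss(others, y) - sse_full) / (sse_full / df_resid)
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] >= f_to_remove:
            break                  # every remaining variable is pulling its weight
        del kept[weakest]
    return list(kept)

# Made-up numbers for illustration only.
predictors = {
    "useful": [1, 2, 3, 4, 5, 6, 7, 8],
    "redundant": [3, 1, 3, 1, 3, 1, 3, 1],
}
y = [2.2, 3.8, 5.8, 8.2, 10.2, 11.8, 13.8, 16.2]
survivors = backward_eliminate(predictors, y)
print("variables kept:", survivors)
```

Each pass refits the model from scratch, which is why a variable that looked expendable in the kitchen sink model can become worth keeping once a correlated neighbor has been removed.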

How many models did SPSS build by going backwards? __________

Fill out this table for the model with the fewest variables.

Y = sat (totalsat, mathsat, or verbsat)

|  R  | R2adj |  F  |sig(F)|         model                                    |
|_____|_______|_____|______|__________________________________________________| 
|     |       |     |      |                                                  |
|_____|_______|_____|______| Y = _____ + ( _____)X1 + ( _____)X2 + ( _____)X3 |
|                                                                             |
|name of X variable:                     __________   __________   __________ |


Variables in final model (from Coëfficients and Excluded Variables)
                      
Xi         t  |  sig(t) | significant? | excluded     | included     |
______________|_________|______________|______________|______________|
              |         |              |              |              | 
expstd________|_________|______________|______________|______________|
              |         |              |              |              |
stdfac________|_________|______________|______________|______________|
              |         |              |              |              |
teachpay______|_________|______________|______________|______________|
              |         |              |              |              | 
takesat_______|_________|______________|______________|______________|

How many variables did you think SPSS would remove on the basis of the variables having prob-values in excess of 0.05 way back when you were studying the kitchen sink model? __________

How many DID it remove? __________

Speculate a bit on why SPSS didn't remove all the variables you thought were candidates for removal.


______________________________________________________________________________

______________________________________________________________________________

==============================

Multiple Regression Modeling with the Forward Approach

==========

Let's see what happens if we try model building going the other way. Let's use the forward approach, which SPSS offers.

Forward modeling involves not just testing the change in adjusted R squared and in the F statistic to see if the process should continue. It also looks at the impact of inclusion or exclusion of a variable on the variables already entered into the model. It then drops those that develop enlarged and insignificant prob-values as a result of the last change in the model. So variables entered at one point in the process may be removed later. It's possible a removed variable will come back in if removal of some OTHER X variable removes an important signal that the originally removed variable will now be allowed to carry. Variables in the model are provisional until the final model.
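
The enter-then-re-check process described above can be sketched in Python as follows. Treat it as a rough illustration under stated assumptions rather than SPSS's own algorithm: it uses fixed F cutoffs for entry and removal (3.84 and 2.71, both assumed values) in place of prob-values, and the function names and data are invented for illustration (NOT the SAT data).

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def residual_ss(columns, y):
    """Residual sum of squares after regressing Y on the X columns plus a constant."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[p] * r[q] for r in design) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(design, y)) for p in range(k)]
    coefs = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coefs, r))) ** 2
               for r, yi in zip(design, y))

def forward_with_recheck(predictors, y, f_to_enter=3.84, f_to_remove=2.71):
    """Add the strongest candidate each round, then re-test what is already in."""
    n = len(y)
    chosen = []
    for _ in range(2 * len(predictors)):   # hard cap so the sketch always stops
        cols = [predictors[name] for name in chosen]
        sse_now = residual_ss(cols, y)
        df_with = n - len(chosen) - 2      # residual df once one more X is added
        best_name, best_f = None, f_to_enter
        for name, col in predictors.items():
            if name in chosen:
                continue
            sse_with = residual_ss(cols + [col], y)  # assumes an imperfect fit
            f_enter = (sse_now - sse_with) / (sse_with / df_with)
            if f_enter >= best_f:
                best_name, best_f = name, f_enter
        if best_name is None:
            break                          # nothing left that earns entry
        chosen.append(best_name)
        # Re-check: an earlier entrant may have lost its signal to the newcomer.
        while len(chosen) > 1:
            sse_full = residual_ss([predictors[c] for c in chosen], y)
            df_full = n - len(chosen) - 1
            f_remove = {}
            for name in chosen:
                others = [predictors[c] for c in chosen if c != name]
                f_remove[name] = (residual_ss(others, y) - sse_full) / (sse_full / df_full)
            weakest = min(f_remove, key=f_remove.get)
            if f_remove[weakest] >= f_to_remove:
                break
            chosen.remove(weakest)
    return chosen

# Made-up numbers for illustration only.
predictors = {
    "useful": [1, 2, 3, 4, 5, 6, 7, 8],
    "redundant": [3, 1, 3, 1, 3, 1, 3, 1],
}
y = [2.2, 3.8, 5.8, 8.2, 10.2, 11.8, 13.8, 16.2]
final = forward_with_recheck(predictors, y)
print("variables in final model:", final)
```

Setting the entry cutoff higher than the removal cutoff keeps a variable from being admitted and thrown out again on the same pass.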

Back to Analyze, Regression, Linear, folks. Continue to leave the X and Y variables as is. This time, however, select "Forward" for "Method" and hit "OK." Again, a large output is delivered.

Now, how many models did SPSS wind up building? __________

How many X variables did SPSS leave in the final model? __________

Fill in the following table:

Y = sat (totalsat, mathsat, or verbsat)

|  R  | R2adj |  F  |sig(F)|         model                                    |
|_____|_______|_____|______|__________________________________________________|   
|     |       |     |      |                                                  |
|_____|_______|_____|______| Y = _____ + ( _____)X1 + ( _____)X2 + ( _____)X3 |
|                                                                             |
|Name of X variable:                    __________   __________   __________  |
                      

Variables in final model (from Coëfficients and Excluded Variables)

Xi       | t    |  sig(t) | significant? | included? | excluded? |
_________|______|_________|______________|___________|___________|
         |      |         |              |           |           |
expstd __|______|_________|______________|___________|___________|
         |      |         |              |           |           |
stdfac __|______|_________|______________|___________|___________|
         |      |         |              |           |           |
teachpay_|______|_________|______________|___________|___________|
         |      |         |              |           |           |
takesat _|______|_________|______________|___________|___________|


==============================

Kitchen Sink for Percentage of Students Taking SATs

==========

Let's redo this whole process for the other possible Y variable, takesat. Go back to the Analyze function and choose Regression and then Linear. This brings back the list of variables in the spreadsheet and the opportunity to define Xs and Y. First, highlight any variable name still in the dependent (Y) and independent (X) boxes. Move them back to the general list on the left.

Highlight the takesat variable and move it into the dependent (Y) variable box.

Now, highlight expstd, stdfac, and teachpay, and then click the arrow that will move the three of them into the independent (X) variable box. Leave *sat back in the left box as we won't use it here. Make sure that "Enter" is the "Method." Hit "Okay."

Fill out this table, using the first model, which is the kitchen sink.

Y = takesat

|  R  | R2adj |  F  |sig(F)|         model                                    |
|_____|_______|_____|______|__________________________________________________| 
|     |       |     |      |                                                  |
|_____|_______|_____|______| Y = _____ + ( _____)X1 + ( _____)X2 + ( _____)X3 |


                      
Xi         t  |  sig(t) | significant? |
______________|_________|______________|
              |         |              | 
expstd________|_________|______________|
              |         |              |
stdfac________|_________|______________|
              |         |              |
teachpay______|_________|______________|


Is the overall model significant? __________

Are any of the X variables insignificant, lousy contributors to the model and possible candidates for removal (prob-values higher than 0.05)?


______________________________________________________________________________

______________________________________________________________________________

==============================

Backwards Elimination and Kids Taking SATs

==========

Go back to Analyze, Regression, Linear. This time, leave the Y and the X variables as is, but change the "Method" to "Backward."

How many models did SPSS build by going backwards? __________

Fill out this table for that model with the fewest variables.

Y = takesat 

|  R  | R2adj |  F  |sig(F)|         model                       |
|_____|_______|_____|______|_____________________________________| 
|     |       |     |      |                                     |
|_____|_______|_____|______| Y = _____ + ( _____)X1 + ( _____)X2 |
|                                                                |
|name of X variable:                    __________   __________  |


Variables in final model (from Coëfficients and Excluded Variables)
                      
Xi         t  |  sig(t) | significant? | included? | excluded? |
______________|_________|______________|___________|___________|
              |         |              |           |           | 
expstd________|_________|______________|___________|___________|
              |         |              |           |           |
stdfac________|_________|______________|___________|___________|
              |         |              |           |           |
teachpay______|_________|______________|___________|___________|


==============================

Forward Regression and Kids Taking College Entrance Exams

==========

Let's see what happens if we try model building going the other way. Let's use the forward approach. Back to Analyze, Regression, Linear, folks. Continue to leave the X and Y variables as is. This time, however, select "Forward" for "Method" and hit "Okay." Again, a large output is delivered.

Now, how many models did SPSS wind up building? __________

How many X variables did SPSS leave in the final model? __________

Fill in the following table:

Y = takesat

|  R  | R2adj |  F  |sig(F)|         model                        |
|_____|_______|_____|______|_______________________|______________| 
|     |       |     |      |                       |              |
|_____|_______|_____|______| Y = _____ + ( _____)X1| + ( _____)X2 |
|                                                  |              |
|Name of X variable:                    __________ |   __________ |
                      

Variables in final model (from Coëfficients and Excluded Variables)

Xi         t  |  sig(t) | significant? | included? | excluded? |
______________|_________|______________|___________|___________|
              |         |              |           |           | 
expstd________|_________|______________|___________|___________|
              |         |              |           |           |
stdfac________|_________|______________|___________|___________|
              |         |              |           |           |
teachpay______|_________|______________|___________|___________|


==============================

Correlation Matrix

==========

Before getting to your interpretation, you might want to have a single reference showing how all the variables associate with one another. Let's build a correlation matrix of expstd, stdfac, teachpay, takesat, and your assigned measure of SAT performance. A correlation matrix is generally produced alongside multiple regression analyses, so that you can spot multicollinearity and unsuspected associations that might be affecting your results.

To do this, go to Analyze but this time select Correlate instead of Regression. Under Correlate, pick Bivariate. Up comes the familiar box that allows you to move variables into and out of the analysis. This time, though, there is only one destination box, because correlation doesn't care about dependence/independence or Y/X directions of causality. Highlight each of the variables above (expstd, stdfac, teachpay, takesat, and your assigned measure of SAT performance) from the menu on the left and move it to the analysis Variables box on the right. Make sure the Pearson Correlation Coëfficient is the one checked off. Also, be sure that Test of Significance reads Two-tailed. Let's accept the default for Flag Significant Associations, too, while we're at it. Hit OK. That's all there is to it.

Up comes a matrix showing each variable down the left axis and across the top axis. The cells of the matrix show you the Pearson Correlation, Significance, and sample size for each possible pair of variables. An association that is significant in a two-tailed test at the 0.05 level will be marked with one asterisk, and one that is significant at the 0.01 level will be marked with two. You want to print this to refer to later and to turn in as one of your deliverables. It will look best if you go to File, select Page Setup, and pick Landscape orientation. Go ahead and print it then.
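
If you are curious about what SPSS is doing behind that dialog box, here is a minimal Python sketch of the same idea: Pearson's r for every pair of variables, a two-tailed significance for each, and the one-asterisk/two-asterisk flags. It is only an illustration, not SPSS's own code. The names pearson_r, two_tailed_p, and correlation_table are made up for this sketch, and the three toy columns are invented numbers, not the lab's data.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value for a t statistic with df degrees of freedom.

    Uses Simpson's rule on Student's t density after the substitution
    u = sqrt(df) * tan(theta), which keeps the integration range bounded.
    """
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    top = math.atan(abs(t) / math.sqrt(df))
    h = top / steps
    total = k + k * math.cos(top) ** (df - 1)  # the two end points
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * k * math.cos(i * h) ** (df - 1)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def correlation_table(columns):
    """Return {(name1, name2): (r, p, flag)} for every pair of columns."""
    table = {}
    for (name1, x), (name2, y) in combinations(columns.items(), 2):
        n = len(x)
        r = pearson_r(x, y)
        if abs(r) < 1:
            t = r * math.sqrt((n - 2) / (1 - r * r))
            p = two_tailed_p(t, n - 2)
        else:
            p = 0.0  # perfect linear association
        flag = "**" if p < 0.01 else "*" if p < 0.05 else ""
        table[(name1, name2)] = (r, p, flag)
    return table


# Invented toy values for illustration only (NOT the lab's data).
toy = {
    "var_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "var_b": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1],
    "var_c": [5, 3, 6, 2, 7, 3, 6, 4],
}

table = correlation_table(toy)
for (name1, name2), (r, p, flag) in table.items():
    print(f"{name1} vs {name2}: r = {r:+.3f}{flag}, sig = {p:.3f}, N = {len(toy[name1])}")
```

Notice that the sketch only visits each pair once, since the correlation of A with B is the same as that of B with A. That is the same reason the SPSS dialog has a single destination box.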

==============================

Interpreting Student Performance on SATs

==========

Now, try to make sense of your results. You've built an entire raft of models, both simple linear ones and multiple regression models, some of which produced surprising results. You also produced a correlation matrix of every possible pair of variables. First, consider the many models you built to account for variation in students' SAT performance: four simple linear regressions and three multiple regressions (kitchen sink, backwards elimination, and forward inclusion).

Why would there be a significant negative relationship between the percentage of students taking the SAT and the average performance on the SAT?

______________________________________________________________________________

______________________________________________________________________________

In light of this relationship, why do you see significant negative relationships between expenditures per student and student performance on the SATs? Between teacher pay and student performance on the SAT?

______________________________________________________________________________

______________________________________________________________________________



Why do you suppose there is no significant relationship between student:faculty ratio and almost anything else? Hint: look at the strongest association that student:faculty ratio has. Why would there be a negative association between teacher pay and student:faculty ratio? How might that muddle other things?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

Of the three multiple regression models you built to explain variation in student performance on the SATs, which one do you feel makes the most sense theoretically? Things to consider in framing your answer include:
  • what you've said in interpreting student performance so far
  • adjusted R square values for each model
  • significance of all variables in the model
  • simplicity of model
  • ease of putting the model into English and trying to derive policy from it.
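
As a reminder of why adjusted R square (rather than plain R square) is on that list, here is a small Python sketch of the usual textbook formula, which penalizes a model for each extra predictor. The function name adjusted_r_square is made up for this sketch, and the R square of 0.80 is an invented value, not a result from your output; n = 50 matches the 50 states in the data set.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R square for a model with k predictors fitted to n records."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)


n = 50           # the 50 states
r_square = 0.80  # invented value for illustration only

# Same R square, but more predictors: the adjusted value drops.
for k in (1, 2, 3):
    print(f"k = {k}: adjusted R square = {adjusted_r_square(r_square, n, k):.4f}")
```

So a model that adds a variable without raising R square by much can end up with a lower adjusted R square than a simpler one.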

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

==============================

Interpreting Percentage of Students Taking SATs

==========

What is the association of expenditures per student, student:faculty ratio, and teacher pay with the percentage of students taking the SATs and thereby expressing their interest in going to college?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________


Why is it that expenditures per student drops out of the multiple regression models built in the forward stepwise manner and in the backwards elimination approach? Hint: Look at the correlation matrix.

______________________________________________________________________________

______________________________________________________________________________

Which of the three multiple regression models makes the most sense to you? Your choice is a little tougher this time, isn't it? It might help to look at the criteria SPSS uses to add variables or remove variables (first box, "Variables Entered/Removed").

Again, consider:

  • what you've said in interpreting percentage of students taking the SAT so far
  • adjusted R square values for each model
  • significance of all variables in the model
  • simplicity of model
  • ease of putting the model into English and trying to derive policy from it.

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________


==============================

Science and Public Policy

==========

As a geographical scientist, you've done the best you could with these data. You've reported everything fully. Now, an assortment of self-interested groups will review your results, sifting through them to find something useful in lobbying policy-makers directly or in propagandizing the public to generate citizen pressure on those policy-makers. Let's switch hats now and see your results from these other perspectives.

Let's say you are a conservative Republican influential in state politics. As such, you would like to see the rôle of government limited, in most cases, to law-and-order functions. You view government spending on social programs with a certain jaundice. So, wearing your Republican hat, which of these results would be most appealing to you in designing issue ads with soft money to reduce government spending? Design a sound-bite that encapsulates your cherry-picked finding.


______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

Now, let's have you put on a completely different hat: You are now working for a teachers' union as contract opening approaches. Which findings would you find most suitable for arming your lobbyists and the rank-and-file teachers? Come up with a slogan suitable for a picket sign that captures the essence of the findings most favorable to your interests.

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________


And, now, as a concerned parent and taxpayer about to be hammered with propaganda, what do you think might be in the best interest of the kids themselves, your kids? Which model(s) shed the most light on this? Now, don't you wish you had the time to look into election issues this thoroughly on your own rather than rely on ads, groups you kind of trust, and news coverage?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

==============================

This document is maintained by Dr. Rodrigue
First placed on Web: 04/14/01
Last Updated: 02/12/13

==============================