APXS Data from All Four Rovers (as of November 2016)
This lab has the following objectives:
-
To share with you a consolidated database I put together of all APXS data from
all four rovers as of November 2016, which I then geocoded
-
To have you process the data into intelligible form using K-means clustering
in the PAST multivariate statistical program (free!) and Open/LibreOffice Calc
(open-source and free!).
-
Moi, I took the "scenic route" through them statistically, probably
using much too elaborate a process (e.g., 15 clusters!). The data
settled down into about four to five "metaclusters," which seem to emphasize
the following processes:
- meteors
- Mars-typical unevolved basalts and basalt-derived soils
- materials that may represent fractionated magmas (more felsic than Mars-
typical basalts)
- rocks and soils that may have been altered by neutral or neutral-alkaline
water (elevated halogens)
- rocks and soils that may have been altered by acidic, sulfur-rich water
-
I would like you to do a simpler, four cluster version of this analysis to see
if the four "metaclusters" show up as four intelligible K-means clusters
Background
The data came from the Planetary
Data System (PDS) Geosciences Node in the form of several files in a
variety of formats. I put them into a common format in OpenOffice Calc and
geocoded them manually. The file includes targets' informal names, formal
names, the mission they were from, the mission-relative sol a target was
taken, latitude, longitude, elevation, target type (rock, soil, disturbance),
followed by the raw APXS data as weight percentages or parts per million for
each of sixteen common oxides and elements. They are: sodium oxide, magnesium
oxide, aluminum oxide, silica, phosphorous pentoxide, sulfur trioxide,
chlorine, potassium oxide, calcium oxide, titanium oxide, chromium (III)
oxide, manganese oxide, ferrous oxide, nickel, zinc, and bromium.
The Alpha Particle X-ray Spectrometer (or, on Pathfinder/Sojourner, the Alpha
Proton X-ray Spectrometer) is an instrument that an arm on the rover places
into contact with a target. It then exposes the target to alpha particles
(basically, just the nuclei of helium atoms, that is, a helium ion) and X-rays
from a radioactive source in the instrument (curium-244). Some of these are
reflected from atomic nuclei, backscattered pretty much unaltered into the
spectrometer. Others impact electrons from atoms' inner shells, evicting them
as beta particles and ionizing the atoms. Electrons in higher shells drop down
to replace them but they have to pay for the real-estate upgrade by emitting
characteristic X-ray photons that scatter into the APXS. Still other
electrons will absorb the energy from the APXS and then re-emit it as X-rays
with distinctively longer wavelenths than the outgoing X-rays. All this
reflection, displacement, absorption, and re-emission activity generates a spectrum of
X-rays back into the APXS and the peaks and lows along that spectrum can be
matched to the known spectral distributions of key mineral-forming oxides and
elements.
Here is an example of such a spectrum for the Jake Matijevic rock, the first
APXS target of the Curiosity rover, with the diagnostic peaks for the various
oxides and elements thoughtfully labelled!
Acquiring and pre-processing your data
First, fire up OpenOffice from the Desktop. Now, download the consolidated
database from https://home.csulb.edu/~rodrigue/geog441541/labs/K-means/4rovers441541.ods.
Save it (or note where your browser plopped it) and then navigate to it with
OpenOffice Calc.
There are the raw data described above. Before doing any analysis, you need
to get it into standardized format, so that no oxide or element is
disproportionately weighted simply because of the units in which it's measured
by APXS (ppm vs. wt%). The idea here is to focus on each target as it
departs from the all-Mars average conditions for each oxide and
element. So, at the bottom of the spreadsheet, let's say in row 904,
calculate the mean for each chemical -- =average(H2:H902), H being
whichever column letter you're processing. Right below that, calculate the
standard deviation for each oxide and element -- =stdev(H2:H902). Now,
highlight the two cells containing the mean and the standard deviation for
sodium oxide. Move the cursor to the lower right of the highlighted area
(it'll turn from an arrow to a crosshair). Right-click and drag the formulas
all the way to the rightmost column containing data (bromium, in Column W).
Set up as many column headers to the right of your raw data and label them "Na2O t" and so on (in a fit of generosity, I just put in the headers to save you
the tedium). These new columns will house the t-scores, tha is, the
standardized scores for each of the original raw scores, and it is these
t-scores that will go into the clustering phase of the analysis. Each target's
weight in the analysis will now represent how typical or how squirrely it is
in comparison with that chemical's norm across all Mars targets.
First, note the column letter for the column housing the original "Na2O wt%"
data: "H." Now, in the second row of the new column on the right headed "Na2O t" (that would be "X"), type =(H2-H$904)/H$905. That is, the reading
for Na2O for your first target minus the martian mean for that chemical
divided by the standard deviation for that chemical. That's the t-score. It
is important to put a dollar sign before the 904 and 905 in this
formula, in order to lock the spreadsheet's calculations on the rows
containing the means and standard deviations when you copy the formula down.
Now, right-click and drag that cell all the way down the column through the
902nd row -- voilà, the whole column turned into t-scores. Now,
highlight all the cells from X2 through X902 and then move the cursor to the
bottom right corner of the last entry, right-click and drag all the way to the
last column on the right (Br t in Column AM). All your chemical readings
standardized. Now, no one oxide or element has disproportionate weight due
either to the units it was measured in or due to differences in variability
among the chemicals.
Hit Format and then Cells and then format the columns of numbers to a
consistent number of decimal places, such as 4 or 5, or whatever.
To check that you got all this right, go back to the last mean and
standard deviation (Br ppm). Hightlight those two cells, right-click
and drag across to the Br t column. The means should all be 0 and the
standard deviations should all be 1 (they've been standardized).
Save the file!
Analytical Technique
We'll use PAST (PAlaeontological STatistics) to crunch the data. PAST is a
freely available multivariate statistics package originally developed for
palaeonological analysis that has become pretty popular in ever more
disciplines, such as archaeology, biogeography, and ecology. It offers many
of the multivariate techniques found in commercial statistics software, such
as SPSS, Minitab, SAS, and Statistica, but it also includes some techniques
not found in those (e.g., detrended correspondence analysis and
biodiversity measures), as well as a more dependable version of k-means
clustering, the technique we'll apply here. PAST was developed by
Øyvind Hammer at the Natural History Museum at the University of Oslo
David A.T. Harper at the Geological Museum in Copenhagen, and Paul D. Ryan at
the National Museum of Ireland and made available for others to use for free.
It's not open-source, but it is freely available, so be sure to cite the
authors if you make use of it.
The technique we'll use is K-means clustering. It is a multivariate
technique used in pattern recognition and data-mining, more than in formal
hypothesis testing. It is a common approach to unsupervised image
classification in remote sensing (it may ring bells if you've taken GEOG 473
and 475/575).
Imagine that each oxide and element is a "dimension" in a hyperdimensional
space that we can't visualize but software and computers can. Imagine a
hyperdimensional "scatterplot" or cloud of dots, each representing the
magnitude of each record's measurement on each one of these dimensions or axes
at the intersection of its measurement on all of the other axes (that we can't
see). It's like you putting a dot at the intersection of a measurement on X at
that record's measurement on Y, except there's a load of axes in this
hyperspace.
To start a K-means cluster, you arbitrarily choose how many clusters you'd
like to find in this hyperspace of measurements. What K-means clustering does
then is arbitrarily "seed" this hyperspace with that number of candidate
cluster centers. These are just random intersections on all those axes.
Then, it measures the distance between each seed and every single dot in that
hyperspace, assigning each data point to the "nearest" seed. This is the
first proposed clustering. Then, it measures the distances among each point
in a cluster with that cluster's seed and calculates the mean center to the
resulting cluster. This mean center is likely rather distant from the seed.
So, now it repeats the whole process, but this time it uses the mean centers
as the new seeds. It keeps on doing this, using mean centers from one
iteration as the seeds for the next iteration. Each time, particular records
will "jump" from one cluster to another as the shifting locations of the
centroids makes one closer or farther away from a given record. After a
while, though, the displacements of these centroids gets smaller and smaller
and there are fewer "jumping" records. Left to its own devices, the algorithm
would just keep on going forever, but at some point the changes are so
insignificant that the algorithm is truncated. And you have a stable
classification scheme. When you do this lab, you may find your neighbors have
slightly different numbers of APXS readings in your clusters and each record
may be in differently named clusters. The differences are trivial and mainly
involve the occasional eccentric target in the "space" between the clusters,
so they may jump into different clusters from one student's work to the next.
Classifying Your APXS Data with K-Means Clustering
Double-click on the Stats folder on the Desktop. In it, navigate to and
activate PAST 3.x (not PAST 2.x). It comes up as a kind of spreadsheet
interface.
The easiest way to get your data into PAST is the following:
- In your OpenOffice Calc spreadsheet, highlight all rows from 1 through
902 (but not rows 904 and 905, which contain the means and standard
deviations, which would mess up your multivariate analysis).
- Then, hit Control-C to copy the data rows into the
Clipboard (or hit Edit -- Copy)
- Move over to PAST and get it ready to accept the Clipboard full of data.
Do that by putting a check on Row attributes and also on Column
attributes. This changes the spreadsheet display so you can copy and
paste your data in.
- Move your cursor to the white cell in the Name column at its
intersection with the Name row. In there, hit Control-V to plop the
whole thing into PAST.
- Very importantly, when the data come up (may take a while, since this is
a pretty big file), unclick the Row attributes and the Column attributes
boxes.
- When you do so, the spreadsheet changes again and, this time, you see the
names of the targets running down the grey column to the left as your record
identifier and you'll see the column headers show up in the grey boxes running
across the top as your variable identifiers.
Doing the K-Means Cluster
- Scroll over and touch the header labelled "Na2O t." Holding down
the Shift key, scroll over to the far right and click the "Br t"
header, so that all the t-score columns are highlighted (and none of the
raw measurements or other variables are: You only want the t-scores lit up).
- Now, click the Multivariate tab up top and, in the drop-down menu,
select Clustering, and then K-means.
- A dialogue box comes up asking for your selection of a number of
clusters. Enter 4.
- A pop-up box will come up showing each of your record names and the final
cluster it's been assigned to. Hit the Copy button at the bottom to
put that into the Clipboard.
Moving back to Calc for graphs and tables:
- Now, back in your original Calc spreadsheet, scroll all the way to the
far right, just past the "Br t" column, maybe around the AO column. In Row 1
of that column, hit Control-V to move the clusters from PAST into your
spreadsheet. A dialogue box will come up to import CSV data. It will
probably ask you if Separated by Tab is okay, which it is. Hit OK, and two
columns will appear: the names of your records (which you don't need and you
can safely delete that column if you like, since it's the same as Column A)
AND your clusters. That's all there is to getting analyses out of PAST into
Calc for graphing.
- Note: The clusters are given snappy names, like 1, 2, 3, and 4. The
numbers have no inherent meaning or quantitative value. They are simply names
for the clusters and PAST uses numbers instead of letters or other names. You
may find that your neighbors's spreadsheets have different numbers for the
same targets. It's okay.
Analyzing Your Clusters
So, now you have your clusters sitting in your spreadsheet, what can you do
with them? Let's try to figure out their geochemical meaning, why certain
records wound up in particular clusters. To do this, let's get some basic
descriptive statistics out of the clusters and then graph them to see what's
going on. We'll need counts of targets in each cluster and then the mean
standardized abundances for each chemical in each cluster.
First, let's get the targets counted by cluster.
-
If your cluster names copied over from PAST are sitting in Column AP, then
scroll to the bottom and, in Column AO under the last record, type 4
Clusters. Under that, type 1, 2, 3, and 4 in the cells below this header.
-
Under the column containing the actual cluster "names," right beside the
header "4 Clusters," type "Count" or "Frequency." Now you can count how
many records fell in each of your four clusters.
-
Under the "Count"
header, type =countif(AP$2:AP$902;1), paying attention to the dollar signs.
Now, drag that formula down to form four cells containing counts (they'll all
be identical at this point, all reporting how many cases of Cluster 1 you
have). Go back in and change the 1 to 2, then to 3, and then to 4.
-
The numbers will come out very uneven. One of the clusters is extremely
common: It has something like 2/3 of all the targets! Another one is a
rarity, something really unique for Mars: It has fewer than 2 percent! The
other two are roughly evenly divided in the "kind of common" area.
Second, let's calculate the mean standardized abundances for each oxide
and element by cluster.
-
Remember calculating the mean and standard deviation
for each oxide and element and getting a string of 0's for the means and 1's
for the standard deviations? The "personality" of each cluster will show up
in the form of excursions above and below 0, meaning that a cluster differs
from the rest of Mars in some particular chemical(s), possibly pretty
dramatically. Or not.
-
Some rows below that, somewhere like row 917, just to the left of your t-score
columns, type "K=4 clusters" (Cell W917?) Label cells W918 through W921, 1,
2, 3, and 4.
- To avoid having to sort the database, let's use the AVERAGEIF function.
- In the 917th row under "Na2O t" (column X, I believe), type
=averageif($AP$2:$AP$902;1;X$2:X$902). I'm assuming your imported
clusters are Column AP. Be very careful about the colons and semi-colons and
about the placement of dollar signs. Assuming "Na2O t" is in column X, you're asking the spreadsheet to take the average of any
record in that column that is in cluster 1.
- Now, move the cursor back into that cell, its lower right corner, and
right-click and copy it down to cell X921. Change the 1 to 2, 3, and 4 and you'll see a
bunch of averages on
Na2O t for clusters 1, 2, 3, and 4.
- Highlight the four cells and right-click and drag them across to column
AM or wherever you have "Br t." Instant averages across the board. Instant
personalities of clusters!
Third, let's graph the mean abundances, which will make comparison and
interpretation easier.
- Make four graphs of the mean oxides and elements, one for each cluster.
To do that, move your cursor to cell X1 (or wherever you have "Na2O t").
Highlight all your t-score headings from X1 through AM1 (or wherever "Br t"
is).
- Holding the Control key down, scroll down and highlight cells X907
through AM907. You now have two partial rows highlighted: the name of the
oxides' and elements' t-scores and their mean values for Cluster 1.
- Click the graphics wizard icon up top (the little bar chart in
OpenOffice Calc and the little pie chart in LibreOffice Calc). When the graph
comes up, select Line chart and the default option of dots but no
lines. Up will come dots representing the average t-score for each oxide
and element.
- Hit Next and Next again until you arrive at the Choose
titles, legend, and grid settings dialogue box. Unclick Display legend
(useless in this kind of situation), give your chart a suitable title
(e.g., "Cluster 1: mean standard oxides and elements" or some such.
Label the X axis "Oxides and elements" and the Y axis "mean t-scores" or
similar and then hit Finish.
- The graph still needs to be prettied up. Format the Y axis to show
one decimal place of accuracy and its scale should run from -2 or so to +6 or
so. You need to get the X axis out of the middle of the chart
wall: The X axis Positioning should Cross other axis at Start and its Label's
Text orientation should be 90° to be able to fit them all in
æsthetically. You might want to Format the Chart area to show a neat
line border and you can experiment with colors.
- Repeat the process for clusters 2, 3, and 4. Be especially careful to
set the Y axes in the same range (-2 to +6 should work) so you can
directly compare one cluster's chart to the others in terms of relative
departures from Mars norms.
Discussion and Lab report
So, what do we see here?
-
What is the most common rock on Mars, which we've met over and over this
semester? Think ST1 (Surface Type 1) when we were discussing possible
explanations for the crustal dichotomy. What was the rock type Spirit was
deposited on, not what the team was hoping for? If one of the graphs shows
almost no departures from Mars norms, all the dots hugging the 0 line, it
probably is this kind of rock and associated material. Which one of your
clusters looks like that? What clinches the identification as the most common
rock on Mars (look at cluster counts).
-
Now, let's look at the odder clusters. One of them shows extreme enrichments
in certain elements and oxides and substantial depletions in a number of
oxides. Which cluster is the crazy one? Consulting the key below, what might
be causing these extreme enrichments? How common is this cluster?
-
Another, more common cluster shows extreme enrichment in one oxide and
noticeable enrichment in another, along with noticeable depletion in
felsic-associated oxides. Which cluster is this and which process from the
key below might account for its quirks?
-
Another cluster shows a very different pattern of excursions above and below
Mars norms. Looking at the key below, what unites the six elevated oxides?
What unites the two most depleted oxides? What kind of material is this?
Now that you have a sense of what these clusters are all about, go back to the
introduction and see if you can match each of your clusters with the
metaclusters that emerged from my perhaps too granular earlier study (the one
with the fifteen clusters).
Deliverables:
Write up your discussion in a 1-2 page lab report describing your four
clusters and how they relate to my own metaclusters. Please attach your four
graphs (maybe try to get two per page). Also deposit your spreadsheet in the
dropbox.
Next Lab
Our next lab will have you map these clusters in Google Earth Mars. You'll
turn your spreadsheet into a CSV formatted a very specific way, upload it into
Google Earth Pro, and have it create a KMZ file out of that and a map suitable
for wowing friends and relatives!
Key to Processes that Enrich or Deplete Certain Oxides and Elements
Magma evolution
- Magma develops from mafic and ultramafic minerals that melt for a variety
of reasons and then rise in the crust. It might erupt pretty quickly without
much evolution in the magma chamber, producing basaltic lavas, with a lot of
iron and/or magnesium content (as in olivines and pyroxenes) and calcium-rich
plagioclases.
- Magma may sit in a magma chamber for a long enough time that
fractionation takes place. This means that there's a slow process of cooling,
so that there's an orderly solidification and crystallization of minerals at
different temperatures. So, calcium-rich plagioclase solidifies at the
hottest temperatures and becomes more and more sodium-rich as temperatures
cool. In addition to the different "freezing" points of different minerals,
there may be chemical reactions among crystallized minerals and the still
molten magma surrounding them, triggering the formation of different minerals.
Olivine crystallization triggers the abrupt formation of pyroxene and then
this reacts with the magma to trigger crystallization of amphiboles, which then
trigger a reaction that crystallizes biotite. The purely temperative-driven
and the partly reaction-driven sequence of mineral formation then is replaced
by a purely temperature-driven sequence of potassium orthoclase, muscovite,
and eventually quartz at the "coolest" temperatures.
- These temperature and chemistry driven changes in a cooling magma body
are called fractionation and fractionation produces evolution in magma.
- Magma evolves rocks ranging from dominance by mafic and ultramafic
minerals, such as basalt or gabbro, into intermediate minerals, such as
diorite and andesite, into felsic minerals, such as granite and rhyolite, or more alkalic versions, such as trachyte.
- Fractionation is marked by shifts in dominance from iron, magnesium,
calcium, chromium, and magnesium oxides over to increasing dominance by silica
and the oxides of sodium, aluminum, potassium, phosphorous, and titanium.
- On Mars, magma rarely evolves all the way to granitic or trachytic character.
Fractionation tends to move in that direction, but not too far
along that trajectory, producing andesitic or trachyandesitic rocks and basalts.
Alteration
- Alteration refers to changes in a rock's or mineral's composition and
structure after emplacement due to chemical and mechanical processes. These
can include reduction/oxidation reactions, hydration/dehydration reactions,
and acid/base reactions. Of particular note on Mars is change in aqueous
geochemistry through time.
-
Mars had more water in Noachian times and that
early water was apparently neutral to alkaline at first. Neutral water
readily dissolves chlorine and bromine and allows them to migrate. The
dessication of Mars and the evaporation of water leads to concentrations in
these salt-forming halogens. Other elements and oxides can also dissolve in
fairly neutral water, migrate, and concentrate during evaporation, such as silica,
magnesium oxide, potassium oxide, manganese oxide, and zinc. Neutral water is
not only associated with concentrations of halogens but with the formation of
phyllosilicate clays.
-
At a later time, Late Noachian or Hesperian, aqueous chemistry becomes highly
acidic, probably due to massive volcanic activity and the huge amounts of
sulfur species that are shot into the atmosphere during major eruptions.
Sulfur species were so abundant as to acidify surface and groundwater and
block formation of carbonates and encourage formation of sulfates, including
calcium sulfate. The one sulfur molecule that APXS can pick up is sulfur
trioxide.
PAST
Hammer, Øyvind; Harper, David A.T.; and Ryan, Paul D. 2001. PAST:
Paleontological statistics software package for education and data analysis.
Palaeontologia Electronica 4, 1: 9 pp. Available at http://palaeo-
electronica.org/2001_1/past/issue1_01.htm
|