[ image of Mars ]       

The Geography of Mars

Lab

Comparing APXS Geochemistry at All Four Rover Study Areas

Christine M. Rodrigue, Ph.D.

Department of Geography
California State University
Long Beach, CA 90840-1101
1 (562) 985-4895
rodrigue@csulb.edu
https://home.csulb.edu/~rodrigue/

APXS Data from All Four Rovers (as of November 2016)

This lab has the following objectives:
  • To share with you a consolidated database I put together of all APXS data from all four rovers as of November 2016, which I then geocoded
  • To have you process the data into intelligible form using K-means clustering in the PAST multivariate statistical program (free!) and Open/LibreOffice Calc (open-source and free!).
  • Moi, I took the "scenic route" through them statistically, probably using much too elaborate a process (e.g., 15 clusters!). The data settled down into about four to five "metaclusters," which seem to emphasize the following processes:
    • meteors
    • Mars-typical unevolved basalts and basalt-derived soils
    • materials that may represent fractionated magmas (more felsic than Mars- typical basalts)
    • rocks and soils that may have been altered by neutral or neutral-alkaline water (elevated halogens)
    • rocks and soils that may have been altered by acidic, sulfur-rich water
  • I would like you to do a simpler, four cluster version of this analysis to see if the four "metaclusters" show up as four intelligible K-means clusters
Background

The data came from the Planetary Data System (PDS) Geosciences Node in the form of several files in a variety of formats. I put them into a common format in OpenOffice Calc and geocoded them manually. The file includes targets' informal names, formal names, the mission they were from, the mission-relative sol a target was taken, latitude, longitude, elevation, target type (rock, soil, disturbance), followed by the raw APXS data as weight percentages or parts per million for each of sixteen common oxides and elements. They are: sodium oxide, magnesium oxide, aluminum oxide, silica, phosphorous pentoxide, sulfur trioxide, chlorine, potassium oxide, calcium oxide, titanium oxide, chromium (III) oxide, manganese oxide, ferrous oxide, nickel, zinc, and bromium.

The Alpha Particle X-ray Spectrometer (or, on Pathfinder/Sojourner, the Alpha Proton X-ray Spectrometer) is an instrument that an arm on the rover places into contact with a target. It then exposes the target to alpha particles (basically, just the nuclei of helium atoms, that is, a helium ion) and X-rays from a radioactive source in the instrument (curium-244). Some of these are reflected from atomic nuclei, backscattered pretty much unaltered into the spectrometer. Others impact electrons from atoms' inner shells, evicting them as beta particles and ionizing the atoms. Electrons in higher shells drop down to replace them but they have to pay for the real-estate upgrade by emitting characteristic X-ray photons that scatter into the APXS. Still other electrons will absorb the energy from the APXS and then re-emit it as X-rays with distinctively longer wavelenths than the outgoing X-rays. All this reflection, displacement, absorption, and re-emission activity generates a spectrum of X-rays back into the APXS and the peaks and lows along that spectrum can be matched to the known spectral distributions of key mineral-forming oxides and elements.

Here is an example of such a spectrum for the Jake Matijevic rock, the first APXS target of the Curiosity rover, with the diagnostic peaks for the various oxides and elements thoughtfully labelled!

[ APXS spectrum for MSL APXS target Jake Matijevic, NASA ]

Acquiring and pre-processing your data

First, fire up OpenOffice from the Desktop. Now, download the consolidated database from https://home.csulb.edu/~rodrigue/geog441541/labs/K-means/4rovers441541.ods. Save it (or note where your browser plopped it) and then navigate to it with OpenOffice Calc.

There are the raw data described above. Before doing any analysis, you need to get it into standardized format, so that no oxide or element is disproportionately weighted simply because of the units in which it's measured by APXS (ppm vs. wt%). The idea here is to focus on each target as it departs from the all-Mars average conditions for each oxide and element. So, at the bottom of the spreadsheet, let's say in row 904, calculate the mean for each chemical -- =average(H2:H902), H being whichever column letter you're processing. Right below that, calculate the standard deviation for each oxide and element -- =stdev(H2:H902). Now, highlight the two cells containing the mean and the standard deviation for sodium oxide. Move the cursor to the lower right of the highlighted area (it'll turn from an arrow to a crosshair). Right-click and drag the formulas all the way to the rightmost column containing data (bromium, in Column W).

Set up as many column headers to the right of your raw data and label them "Na2O t" and so on (in a fit of generosity, I just put in the headers to save you the tedium). These new columns will house the t-scores, tha is, the standardized scores for each of the original raw scores, and it is these t-scores that will go into the clustering phase of the analysis. Each target's weight in the analysis will now represent how typical or how squirrely it is in comparison with that chemical's norm across all Mars targets.

First, note the column letter for the column housing the original "Na2O wt%" data: "H." Now, in the second row of the new column on the right headed "Na2O t" (that would be "X"), type =(H2-H$904)/H$905. That is, the reading for Na2O for your first target minus the martian mean for that chemical divided by the standard deviation for that chemical. That's the t-score. It is important to put a dollar sign before the 904 and 905 in this formula, in order to lock the spreadsheet's calculations on the rows containing the means and standard deviations when you copy the formula down.

Now, right-click and drag that cell all the way down the column through the 902nd row -- voilà, the whole column turned into t-scores. Now, highlight all the cells from X2 through X902 and then move the cursor to the bottom right corner of the last entry, right-click and drag all the way to the last column on the right (Br t in Column AM). All your chemical readings standardized. Now, no one oxide or element has disproportionate weight due either to the units it was measured in or due to differences in variability among the chemicals.

Hit Format and then Cells and then format the columns of numbers to a consistent number of decimal places, such as 4 or 5, or whatever.

To check that you got all this right, go back to the last mean and standard deviation (Br ppm). Hightlight those two cells, right-click and drag across to the Br t column. The means should all be 0 and the standard deviations should all be 1 (they've been standardized).

Save the file!

Analytical Technique

We'll use PAST (PAlaeontological STatistics) to crunch the data. PAST is a freely available multivariate statistics package originally developed for palaeonological analysis that has become pretty popular in ever more disciplines, such as archaeology, biogeography, and ecology. It offers many of the multivariate techniques found in commercial statistics software, such as SPSS, Minitab, SAS, and Statistica, but it also includes some techniques not found in those (e.g., detrended correspondence analysis and biodiversity measures), as well as a more dependable version of k-means clustering, the technique we'll apply here. PAST was developed by Øyvind Hammer at the Natural History Museum at the University of Oslo David A.T. Harper at the Geological Museum in Copenhagen, and Paul D. Ryan at the National Museum of Ireland and made available for others to use for free. It's not open-source, but it is freely available, so be sure to cite the authors if you make use of it.

The technique we'll use is K-means clustering. It is a multivariate technique used in pattern recognition and data-mining, more than in formal hypothesis testing. It is a common approach to unsupervised image classification in remote sensing (it may ring bells if you've taken GEOG 473 and 475/575).

Imagine that each oxide and element is a "dimension" in a hyperdimensional space that we can't visualize but software and computers can. Imagine a hyperdimensional "scatterplot" or cloud of dots, each representing the magnitude of each record's measurement on each one of these dimensions or axes at the intersection of its measurement on all of the other axes (that we can't see). It's like you putting a dot at the intersection of a measurement on X at that record's measurement on Y, except there's a load of axes in this hyperspace.

To start a K-means cluster, you arbitrarily choose how many clusters you'd like to find in this hyperspace of measurements. What K-means clustering does then is arbitrarily "seed" this hyperspace with that number of candidate cluster centers. These are just random intersections on all those axes.

Then, it measures the distance between each seed and every single dot in that hyperspace, assigning each data point to the "nearest" seed. This is the first proposed clustering. Then, it measures the distances among each point in a cluster with that cluster's seed and calculates the mean center to the resulting cluster. This mean center is likely rather distant from the seed.

So, now it repeats the whole process, but this time it uses the mean centers as the new seeds. It keeps on doing this, using mean centers from one iteration as the seeds for the next iteration. Each time, particular records will "jump" from one cluster to another as the shifting locations of the centroids makes one closer or farther away from a given record. After a while, though, the displacements of these centroids gets smaller and smaller and there are fewer "jumping" records. Left to its own devices, the algorithm would just keep on going forever, but at some point the changes are so insignificant that the algorithm is truncated. And you have a stable classification scheme. When you do this lab, you may find your neighbors have slightly different numbers of APXS readings in your clusters and each record may be in differently named clusters. The differences are trivial and mainly involve the occasional eccentric target in the "space" between the clusters, so they may jump into different clusters from one student's work to the next.

Classifying Your APXS Data with K-Means Clustering

Double-click on the Stats folder on the Desktop. In it, navigate to and activate PAST 3.x (not PAST 2.x). It comes up as a kind of spreadsheet interface.

The easiest way to get your data into PAST is the following:

  • In your OpenOffice Calc spreadsheet, highlight all rows from 1 through 902 (but not rows 904 and 905, which contain the means and standard deviations, which would mess up your multivariate analysis).
  • Then, hit Control-C to copy the data rows into the Clipboard (or hit Edit -- Copy)
  • Move over to PAST and get it ready to accept the Clipboard full of data. Do that by putting a check on Row attributes and also on Column attributes. This changes the spreadsheet display so you can copy and paste your data in.
  • Move your cursor to the white cell in the Name column at its intersection with the Name row. In there, hit Control-V to plop the whole thing into PAST.
  • Very importantly, when the data come up (may take a while, since this is a pretty big file), unclick the Row attributes and the Column attributes boxes.
  • When you do so, the spreadsheet changes again and, this time, you see the names of the targets running down the grey column to the left as your record identifier and you'll see the column headers show up in the grey boxes running across the top as your variable identifiers.
Doing the K-Means Cluster
  • Scroll over and touch the header labelled "Na2O t." Holding down the Shift key, scroll over to the far right and click the "Br t" header, so that all the t-score columns are highlighted (and none of the raw measurements or other variables are: You only want the t-scores lit up).
  • Now, click the Multivariate tab up top and, in the drop-down menu, select Clustering, and then K-means.
  • A dialogue box comes up asking for your selection of a number of clusters. Enter 4.
  • A pop-up box will come up showing each of your record names and the final cluster it's been assigned to. Hit the Copy button at the bottom to put that into the Clipboard.
Moving back to Calc for graphs and tables:
  • Now, back in your original Calc spreadsheet, scroll all the way to the far right, just past the "Br t" column, maybe around the AO column. In Row 1 of that column, hit Control-V to move the clusters from PAST into your spreadsheet. A dialogue box will come up to import CSV data. It will probably ask you if Separated by Tab is okay, which it is. Hit OK, and two columns will appear: the names of your records (which you don't need and you can safely delete that column if you like, since it's the same as Column A) AND your clusters. That's all there is to getting analyses out of PAST into Calc for graphing.
  • Note: The clusters are given snappy names, like 1, 2, 3, and 4. The numbers have no inherent meaning or quantitative value. They are simply names for the clusters and PAST uses numbers instead of letters or other names. You may find that your neighbors's spreadsheets have different numbers for the same targets. It's okay.

Analyzing Your Clusters

So, now you have your clusters sitting in your spreadsheet, what can you do with them? Let's try to figure out their geochemical meaning, why certain records wound up in particular clusters. To do this, let's get some basic descriptive statistics out of the clusters and then graph them to see what's going on. We'll need counts of targets in each cluster and then the mean standardized abundances for each chemical in each cluster.

First, let's get the targets counted by cluster.

  • If your cluster names copied over from PAST are sitting in Column AP, then scroll to the bottom and, in Column AO under the last record, type 4 Clusters. Under that, type 1, 2, 3, and 4 in the cells below this header.
  • Under the column containing the actual cluster "names," right beside the header "4 Clusters," type "Count" or "Frequency." Now you can count how many records fell in each of your four clusters.
  • Under the "Count" header, type =countif(AP$2:AP$902;1), paying attention to the dollar signs. Now, drag that formula down to form four cells containing counts (they'll all be identical at this point, all reporting how many cases of Cluster 1 you have). Go back in and change the 1 to 2, then to 3, and then to 4.
  • The numbers will come out very uneven. One of the clusters is extremely common: It has something like 2/3 of all the targets! Another one is a rarity, something really unique for Mars: It has fewer than 2 percent! The other two are roughly evenly divided in the "kind of common" area.

Second, let's calculate the mean standardized abundances for each oxide and element by cluster.

  • Remember calculating the mean and standard deviation for each oxide and element and getting a string of 0's for the means and 1's for the standard deviations? The "personality" of each cluster will show up in the form of excursions above and below 0, meaning that a cluster differs from the rest of Mars in some particular chemical(s), possibly pretty dramatically. Or not.
  • Some rows below that, somewhere like row 917, just to the left of your t-score columns, type "K=4 clusters" (Cell W917?) Label cells W918 through W921, 1, 2, 3, and 4.
  • To avoid having to sort the database, let's use the AVERAGEIF function.
  • In the 917th row under "Na2O t" (column X, I believe), type =averageif($AP$2:$AP$902;1;X$2:X$902). I'm assuming your imported clusters are Column AP. Be very careful about the colons and semi-colons and about the placement of dollar signs. Assuming "Na2O t" is in column X, you're asking the spreadsheet to take the average of any record in that column that is in cluster 1.
  • Now, move the cursor back into that cell, its lower right corner, and right-click and copy it down to cell X921. Change the 1 to 2, 3, and 4 and you'll see a bunch of averages on Na2O t for clusters 1, 2, 3, and 4.
  • Highlight the four cells and right-click and drag them across to column AM or wherever you have "Br t." Instant averages across the board. Instant personalities of clusters!

Third, let's graph the mean abundances, which will make comparison and interpretation easier.

  • Make four graphs of the mean oxides and elements, one for each cluster. To do that, move your cursor to cell X1 (or wherever you have "Na2O t"). Highlight all your t-score headings from X1 through AM1 (or wherever "Br t" is).
  • Holding the Control key down, scroll down and highlight cells X907 through AM907. You now have two partial rows highlighted: the name of the oxides' and elements' t-scores and their mean values for Cluster 1.
  • Click the graphics wizard icon up top (the little bar chart in OpenOffice Calc and the little pie chart in LibreOffice Calc). When the graph comes up, select Line chart and the default option of dots but no lines. Up will come dots representing the average t-score for each oxide and element.
  • Hit Next and Next again until you arrive at the Choose titles, legend, and grid settings dialogue box. Unclick Display legend (useless in this kind of situation), give your chart a suitable title (e.g., "Cluster 1: mean standard oxides and elements" or some such. Label the X axis "Oxides and elements" and the Y axis "mean t-scores" or similar and then hit Finish.
  • The graph still needs to be prettied up. Format the Y axis to show one decimal place of accuracy and its scale should run from -2 or so to +6 or so. You need to get the X axis out of the middle of the chart wall: The X axis Positioning should Cross other axis at Start and its Label's Text orientation should be 90° to be able to fit them all in æsthetically. You might want to Format the Chart area to show a neat line border and you can experiment with colors.
  • Repeat the process for clusters 2, 3, and 4. Be especially careful to set the Y axes in the same range (-2 to +6 should work) so you can directly compare one cluster's chart to the others in terms of relative departures from Mars norms.

Discussion and Lab report

So, what do we see here?

  • What is the most common rock on Mars, which we've met over and over this semester? Think ST1 (Surface Type 1) when we were discussing possible explanations for the crustal dichotomy. What was the rock type Spirit was deposited on, not what the team was hoping for? If one of the graphs shows almost no departures from Mars norms, all the dots hugging the 0 line, it probably is this kind of rock and associated material. Which one of your clusters looks like that? What clinches the identification as the most common rock on Mars (look at cluster counts).

  • Now, let's look at the odder clusters. One of them shows extreme enrichments in certain elements and oxides and substantial depletions in a number of oxides. Which cluster is the crazy one? Consulting the key below, what might be causing these extreme enrichments? How common is this cluster?

  • Another, more common cluster shows extreme enrichment in one oxide and noticeable enrichment in another, along with noticeable depletion in felsic-associated oxides. Which cluster is this and which process from the key below might account for its quirks?

  • Another cluster shows a very different pattern of excursions above and below Mars norms. Looking at the key below, what unites the six elevated oxides? What unites the two most depleted oxides? What kind of material is this?

Now that you have a sense of what these clusters are all about, go back to the introduction and see if you can match each of your clusters with the metaclusters that emerged from my perhaps too granular earlier study (the one with the fifteen clusters).

Deliverables: Write up your discussion in a 1-2 page lab report describing your four clusters and how they relate to my own metaclusters. Please attach your four graphs (maybe try to get two per page). Also deposit your spreadsheet in the dropbox.

Next Lab Our next lab will have you map these clusters in Google Earth Mars. You'll turn your spreadsheet into a CSV formatted a very specific way, upload it into Google Earth Pro, and have it create a KMZ file out of that and a map suitable for wowing friends and relatives!

Key to Processes that Enrich or Deplete Certain Oxides and Elements

Magma evolution

  • Magma develops from mafic and ultramafic minerals that melt for a variety of reasons and then rise in the crust. It might erupt pretty quickly without much evolution in the magma chamber, producing basaltic lavas, with a lot of iron and/or magnesium content (as in olivines and pyroxenes) and calcium-rich plagioclases.
  • Magma may sit in a magma chamber for a long enough time that fractionation takes place. This means that there's a slow process of cooling, so that there's an orderly solidification and crystallization of minerals at different temperatures. So, calcium-rich plagioclase solidifies at the hottest temperatures and becomes more and more sodium-rich as temperatures cool. In addition to the different "freezing" points of different minerals, there may be chemical reactions among crystallized minerals and the still molten magma surrounding them, triggering the formation of different minerals. Olivine crystallization triggers the abrupt formation of pyroxene and then this reacts with the magma to trigger crystallization of amphiboles, which then trigger a reaction that crystallizes biotite. The purely temperative-driven and the partly reaction-driven sequence of mineral formation then is replaced by a purely temperature-driven sequence of potassium orthoclase, muscovite, and eventually quartz at the "coolest" temperatures.
  • These temperature and chemistry driven changes in a cooling magma body are called fractionation and fractionation produces evolution in magma.
  • Magma evolves rocks ranging from dominance by mafic and ultramafic minerals, such as basalt or gabbro, into intermediate minerals, such as diorite and andesite, into felsic minerals, such as granite and rhyolite, or more alkalic versions, such as trachyte.
  • Fractionation is marked by shifts in dominance from iron, magnesium, calcium, chromium, and magnesium oxides over to increasing dominance by silica and the oxides of sodium, aluminum, potassium, phosphorous, and titanium.
  • On Mars, magma rarely evolves all the way to granitic or trachytic character. Fractionation tends to move in that direction, but not too far along that trajectory, producing andesitic or trachyandesitic rocks and basalts.

Alteration

  • Alteration refers to changes in a rock's or mineral's composition and structure after emplacement due to chemical and mechanical processes. These can include reduction/oxidation reactions, hydration/dehydration reactions, and acid/base reactions. Of particular note on Mars is change in aqueous geochemistry through time.
  • Mars had more water in Noachian times and that early water was apparently neutral to alkaline at first. Neutral water readily dissolves chlorine and bromine and allows them to migrate. The dessication of Mars and the evaporation of water leads to concentrations in these salt-forming halogens. Other elements and oxides can also dissolve in fairly neutral water, migrate, and concentrate during evaporation, such as silica, magnesium oxide, potassium oxide, manganese oxide, and zinc. Neutral water is not only associated with concentrations of halogens but with the formation of phyllosilicate clays.
  • At a later time, Late Noachian or Hesperian, aqueous chemistry becomes highly acidic, probably due to massive volcanic activity and the huge amounts of sulfur species that are shot into the atmosphere during major eruptions. Sulfur species were so abundant as to acidify surface and groundwater and block formation of carbonates and encourage formation of sulfates, including calcium sulfate. The one sulfur molecule that APXS can pick up is sulfur trioxide.

PAST

Hammer, Øyvind; Harper, David A.T.; and Ryan, Paul D. 2001. PAST: Paleontological statistics software package for education and data analysis. Palaeontologia Electronica 4, 1: 9 pp. Available at http://palaeo- electronica.org/2001_1/past/issue1_01.htm

Mars Home | Dr. Rodrigue's Home | Geography Home | ES&P Home | EMER Home | NASA Mars

Scientific Calculator | CSULB Home | My CSULB | BeachBoard | Campus Search | Library

This document is maintained by Dr. Rodrigue
First placed on the web: 03/01/18
Last updated: 11/16/19