[ VIEWGRAPHS ]

K-Means Clustering and Mapping
of all Mars Rovers' APXS Data, 1997-2016

OpenPlanetary
Virtual Lunches, YouTube
12 May 2020, 2 p.m. ET

Christine M. Rodrigue
Professor of Geography
California State University, Long Beach

Background

[ SLIDE 1 ]

First, a bit of background. Much of my research has dealt with hazards. Some of these projects include wildfire in Mediterranean scrub, the Northridge earthquake, and a group of projects I call “disaster by management” -- 9/11, Katrina, the Columbia crash.

Another emphasis is biogeography, especially wildfire ecology in Southern California and documentation and conservation of California sage scrub.

My teaching load includes courses in hazards and risk management; biogeography and California ecosystems; and multivariate statistics.

So, how did that background get me to Mars? In the late nineties, I worked on a project on how the then-new Internet was being used by 2 activists to create a large social movement to stop the Cassini-Huygens mission. NASA got wind of this and invited me to present my findings to 5 of its centers in 2001. They were interested in how better to communicate risk for the Mars Sample Return Mission being planned for a 2008 launch. The telecon organizers asked me if I would follow that controversy. I agreed and decided I needed to learn about Mars as context.

Meanwhile, MSR kept being delayed and was then canceled. So, by 2005, I was left with no project and a lot of hard-earned background on Mars. Rather than forget it all, I created the first Geography of Mars class in 2007 – at least our students would get something out of all that. A lot of my Mars work since then has been driven by course development.

So what about APXS? I thought I'd “just” put together all the APXS data from all 4 rovers and have students do a K-means clustering and Google Earth mapping lab with them. I got the APXS data from the PDS Geosciences Node as a series of CSV files that had been uploaded there periodically by the different teams involved, with very different layouts and formats. This took a lot of pre-processing to integrate. Pinning APXS “zaps” to lat./lon. was challenging, with different datums and cartographic conventions, and who knows which datum Google Earth uses.

[ SLIDE 2 ]

My goal was to have students do a simple classification that they could then map in Google Earth Pro: ~4 or 5 categories. I picked K-means clustering as it is non-hierarchical and flexible. It optimizes the allocation of data to a specified number of classes. It first “seeds” the data space with proposed centroids, usually randomly. Then, it measures the distance between each record and all of the seeds, assigning the record to the nearest seed. Once all records are assigned, the centroid for each group is calculated and then used to seed the next iteration, producing changes in some records' cluster assignments. The process stops once the assignments become stable.
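
For anyone who wants to see that loop spelled out, here is a minimal sketch in plain Python. It is only an illustration of the seed, assign, recompute, repeat cycle described above, not a reproduction of what PAST does internally. The function name `kmeans`, its parameters, and the choice to seed with randomly picked records are assumptions made for the example.

```python
import random

def kmeans(records, k, seed=0, max_iter=100):
    """Illustrative K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Seed the data space: pick k distinct records as the first centroids (assumption).
    centroids = [list(r) for r in rng.sample(records, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every record to the nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(rec, centroids[j])))
            for rec in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so stop
        labels = new_labels
        # Recompute each centroid as the mean of the records assigned to it.
        for j in range(k):
            members = [rec for rec, lab in zip(records, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Calling `kmeans(records, 4, seed=1)` and `kmeans(records, 4, seed=2)` on the same records can return different groupings, because the starting seeds differ.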

[ SLIDE 3 ]

Once records are classified, they can be mapped fairly easily in Google Earth Pro. Google Earth needs only a name for the record and its latitude and longitude and the records can be put into folders corresponding to the classes. It can import CSV files, which can then be saved as a native KMZ file that can be opened by anyone with Google Earth or Google Earth Pro. All records in a folder can then be assigned a common icon.
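
In the lab, the folders are built by importing CSV files into Google Earth Pro. As an alternative route to the same result, the sketch below writes the folder-per-class structure directly as KML using Python's standard library. The function name `build_kml`, the "Cluster N" folder names, and the tuple layout are assumptions for illustration. Note that KML lists longitude before latitude.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_kml(records):
    """records: iterable of (name, lat, lon, cluster_label) tuples (hypothetical layout)."""
    kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
    doc = ET.SubElement(kml, "Document")
    # Group the records by their cluster label.
    by_cluster = defaultdict(list)
    for name, lat, lon, cluster in records:
        by_cluster[cluster].append((name, lat, lon))
    # One Folder per cluster, one Placemark per record.
    for cluster in sorted(by_cluster):
        folder = ET.SubElement(doc, "Folder")
        ET.SubElement(folder, "name").text = f"Cluster {cluster}"
        for name, lat, lon in by_cluster[cluster]:
            placemark = ET.SubElement(folder, "Placemark")
            ET.SubElement(placemark, "name").text = name
            point = ET.SubElement(placemark, "Point")
            # KML coordinate order is longitude,latitude,altitude.
            ET.SubElement(point, "coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")
```

The returned string can be saved with a `.kml` extension and opened with File - Open after switching Google Earth to Mars. Icons for each folder would still be assigned by hand, as described above.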

But to understand the variability in the data set well enough to help the students interpret their results, I performed K-means classifications in PAST for K=10 and K=20, sometimes on data first reduced with PCA, and settled on K=15 with standardized data, which is what I'll talk about today.

Data and Methods

[ SLIDE 4 ]

There were eight steps to the workflow. I downloaded PDS Geosciences Node CSV files on all 901 APXS readings from July 1997 to November 2016. These were in several formats, so I had to get them integrated into one common database, which I did in OpenOffice Calc.

I then geocoded each record ... this was a real nightmare. Fred Calef at JPL graciously fielded my barrage of questions and shared waypoints files with me. Once the records were geocoded, I standardized the oxide and element abundance data for each record against the means and standard deviations for all 901 records in Calc. These t-scores were moved into PAST, and K-means clustering was run with K=15. I then calculated descriptive statistics in Calc for each standardized oxide and element in each of the 15 clusters to characterize each cluster's pattern of enhancements and depletions.
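
The standardization and per-cluster summary steps were done in Calc. For readers who prefer code, a rough Python equivalent might look like the sketch below. The function names and the dictionary layout are hypothetical, and the use of the sample standard deviation is an assumption, since the talk does not say which form was used.

```python
from statistics import mean, stdev

def standardize(columns):
    """columns: dict mapping an oxide/element name to its values over all records."""
    scores = {}
    for name, values in columns.items():
        m = mean(values)
        s = stdev(values)  # sample standard deviation (assumption)
        # Express each value as a number of standard deviations from the mean.
        scores[name] = [(v - m) / s for v in values]
    return scores

def cluster_profiles(scores, labels):
    """Mean standardized score of every variable within each cluster."""
    profiles = {}
    for lab in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == lab]
        profiles[lab] = {
            name: mean(vals[i] for i in idx) for name, vals in scores.items()
        }
    return profiles
```

A positive value in a cluster's profile marks an enhancement relative to the all-record mean and a negative value marks a depletion, which is what the cluster diagrams on the following slides display.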

After I became comfortable enough with this to be able to shepherd students through it and walk them through the process of mapping their work in Google Earth Pro, I became interested in using the common cluster system to compare and contrast each of the four rover sites against one another in this common classification scheme.

Another point of interest was visual comparison of the 15 clusters’ diagrams and noticing that the clusters fell into five broader groupings that I call “metaclusters.” The clusters in each fall along trends of exaggeration. These imply a series of origins and aqueous alteration pathways.

Results of K-Means Clustering and Development of the Metaclusters

[ SLIDE 5 ]

Cluster 7 forms a one-cluster metacluster that I called X for potentially eXogenous. Some of the targets that were grouped into Cluster 7 were iron meteorites. Others were not but share the strong elevation in nickel. Nickel enhancement can be a result of hydrothermal alteration of olivine-rich basalts, which is also consistent with the elevation in zinc and magnesium.

[ SLIDE 6 ]

Clusters 6, 8, 15, and 5 form Metacluster B for basalt. These graphs hew closely to the martian averages, reflective of a basaltic planet. Cluster 6 shows virtually no departure from Mars norms, while Cluster 8 represents picritic basalt, especially prevalent on the floor of Gusev Crater. Cluster 15 seems to represent the admixture of local basalts with the homogenizing iron oxide rich global dust. Cluster 5 is often interpreted as unaltered olivine with elevated iron and magnesium oxides, but there are cases in which proximal materials in this class seem to form a source and sink relationship where groundwater or hydrothermal fluids precipitate iron and magnesium carbonates.

[ SLIDE 7 ]

Clusters 3, 2, and 10 form Metacluster E for evolved magmatic materials. Cluster 2 was first spotted at the Pathfinder/Sojourner site and deemed andesitic because of its elevated silica as well as depressed iron, magnesium, and calcium oxides. Alternatively, this profile could result from neutral or alkaline water interactions with primitive basalt that can liberate, mobilize, and concentrate silica. Cluster 3 seems more clearly evolved, with strongly elevated alkalis and aluminum oxides and slight elevation in silica. Cluster 10 resembles Cluster 3 but with additional elevations in oxides of phosphorus and titanium. This cluster, confined to Husband Hill, has been interpreted as pyroclastic tephrites.

[ SLIDE 8 ]

Clusters 1, 12, 9, and 11 are grouped in Metacluster N for showing signs of alteration of basaltic materials in neutral or neutral-alkaline water, shown in enrichments in chlorine and bromine in various ratios. Cluster 1 shows little departure from Mars norms, but there are slight positive excursions in chlorine and bromine. Cluster 12 shows stronger elevation of both halogens and magnesium oxide. Cluster 9 shows weak elevation in chlorine but extremely high elevation in bromine, as well as zinc and potassium oxide. This may reflect changes in the composition of evaporating water, since chlorine tends to precipitate early and bromine later. Cluster 11 has spectacularly elevated levels of chlorine and bromine, as well as high levels of the fluid-mobile zinc, manganese oxide, and magnesium oxide.

[ SLIDE 9 ]

Clusters 4, 13, and 14 come together in Metacluster S for showing progressive enhancement in sulfur trioxide. Cluster 4 is pretty close to Mars-typical, perhaps a little depleted in alkali oxides, silica, and aluminum oxide, but showing a marked elevation in the sulfur signal. Cluster 13 exaggerates these tendencies, with sulfur trioxide spectacularly elevated (and manganese oxide is significantly overrepresented). Cluster 14, like 13, exaggerates the elevations and depletions of Cluster 4 but, unlike Cluster 13, it shows truly spectacular elevation of calcium oxide. This has been interpreted as calcium sulfate (note typo in abstract 1262 -- somehow I put carbonate there instead of sulfate!).

[ SLIDE 10 ]

I’ve plotted the means of all targets in each metacluster, and their “personalities” come through the averaging.

Implications for Materials Origins and Alteration Pathways

[ SLIDE 11 ]

The metaclusters can themselves be sorted into three different target origins, two of volcanic character and one possibly exogenic. The overwhelmingly most common is basaltic, but there may also be more fractionated materials. There are also the occasional meteorites, though many of these may be missed for similarity to native martian materials.

Given their commonness, basaltic materials are the ones that reveal signals of water alteration. The more unambiguously pristine basalts are seen here on the left side of this graph, and on the right they bifurcate into two distinctive alteration pathways. The upper one entails signs of neutral or neutral-alkaline waters, most likely groundwater but sometimes surface water, too. These are marked by different levels of halogens’ presence and by their relative balances. The lower pathway suggests alteration in acidic conditions with elevation of sulfur trioxide, the final cluster, 14, showing the concentration of calcium oxide in the form of calcium sulfate.

The evolved materials’ starting point is more ambiguous. Cluster 3 seems clearly to represent fractionated materials along the alkaline series sometimes compared to terrestrial mugearite. Cluster 2 was thought to be andesite when it was first encountered in Chryse Planitia but it may also entail neutral aqueous alteration of basalt. Cluster 10, confined to Gusev Crater, resembles an exaggeration of Cluster 3 and is commonly attributed to tephrite, but the fractionation argument is not universally embraced. It has been argued to show the formation of montmorillonite clays in alkaline waters.

Exogenous materials are harder to spot, other than the iron meteorites. So, any stony-irons or stony meteorites may never be detected, especially if they did undergo aqueous alteration.

Geographical Allocation of Clusters

[ SLIDE 12 ]

I believe this is the first attempt to integrate all readings (as of November 2016) from one instrument, APXS, common to all four rovers in a common inductively developed classification system. This common classification can now be used to compare the four sites in the same terms. Of the 15 available clusters, 2 are found at the MPF site in Chryse Planitia, 12 by MER-A in Gusev Crater, 13 by MER-B in Meridiani Planum, and 11 by MSL in Gale Crater.

[ SLIDE 13 ]

The allocation of metaclusters among the 3 rovers with large numbers of APXS readings is highly significantly different in a Chi-square test (p<0.001) and of moderately strong effect size (V=0.413).
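
As a reminder of what those two numbers measure, here is a small sketch that computes the Chi-square statistic and Cramér's V from a table of counts (rows for rover sites, columns for metaclusters). The function name is hypothetical, and no counts from the study are reproduced here. The p-value is not computed, since that needs the Chi-square distribution, which a statistics package such as SciPy (`scipy.stats.chi2_contingency`) can supply.

```python
import math

def chi_square_and_cramers_v(table):
    """table: list of rows of observed counts; every row and column total must be > 0."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if site and metacluster were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    # Cramer's V rescales Chi-square to the range 0 (no association) to 1 (perfect).
    v = math.sqrt(chi2 / (n * min(len(row_totals) - 1, len(col_totals) - 1)))
    return chi2, dof, v
```

For a table of 3 sites by 5 metaclusters, the degrees of freedom would be (3 - 1) x (5 - 1) = 8.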

Gusev Crater is basalt dominated and somewhat impoverished in acidic aqueous altered materials; Meridiani Planum is dominated by acidic aqueous signals and basalt, but short on evolved magma-derived and neutral aqueous altered materials; and Gale Crater is enhanced in evolved magma-derived and neutral aqueous altered materials and deficient in unaltered basaltic materials.

[ SLIDE 14 ]

To view the APXS clusters by site in Google Earth or Google Earth Pro, you can download the KMZ from https://home.csulb.edu/~rodrigue/mars/apxs/GE/APXS15tscores.kmz. Once you have saved it someplace you can find it, open Google Earth, select Mars from the Planets tab on the toolbar, and then choose File - Open and navigate to wherever you stored it. It will come up in Temporary Places, and you need to check the file to allow it to display. In the Search box, type in Pathfinder, Spirit, Opportunity, or Curiosity to be “flown” there. At the Curiosity site, you will see the erroneous Google traverse map. To see the actual location of MSL, you need to download Fernando Nogal’s KMZ, “The Martian Way,” from http://www.unmannedspaceflight.com/index.php?act=attach&type=post&id=40403.

[ SLIDE 15 ]

So, how did the student lab turn out? I had them request K=4. Because K-means clustering gives slightly different results from run to run on the same data, these 16 “experiments” collectively yielded 5 categories, and these were recognizably the same as my 5 metaclusters. Detail, however, was lost. The algorithm incorporated more targets in the basalt category, losing those cases of subtle alteration at the beginning of the alteration pathways. Neutral aqueous altered materials pretty much disappeared at the two MER sites. Recommendation: Use a higher K request than the number of classes you want and then visually group the clusters into a smaller number of classes to be able to capture these more subtle modifications.
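
The grouping step in that recommendation was done by eye, but once the groups are decided, applying them is mechanical. The sketch below records the cluster-to-metacluster assignments given earlier in this talk and collapses a list of K=15 cluster labels into the five metaclusters. The names `METACLUSTER_OF` and `to_metaclusters` are made up for the example.

```python
from collections import Counter

# Cluster number -> metacluster letter, as listed in the slides above.
METACLUSTER_OF = {
    7: "X",                              # potentially exogenous
    6: "B", 8: "B", 15: "B", 5: "B",     # basalt
    3: "E", 2: "E", 10: "E",             # evolved magmatic materials
    1: "N", 12: "N", 9: "N", 11: "N",    # neutral or neutral-alkaline aqueous alteration
    4: "S", 13: "S", 14: "S",            # progressive sulfur trioxide enhancement
}

def to_metaclusters(cluster_labels):
    """Collapse per-record cluster numbers (1-15) into metacluster letters."""
    return [METACLUSTER_OF[c] for c in cluster_labels]

def metacluster_counts(cluster_labels):
    """Tally how many records fall in each metacluster."""
    return Counter(to_metaclusters(cluster_labels))
```

Tallies like these, computed per rover site, are the kind of counts that feed a site-by-metacluster comparison.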

Links

This document is maintained by C.M. Rodrigue
First placed on web 05/12/20
Last Updated: 05/12/20