K-Means Clustering

http://people.revoledu.com/kardi/tutorial/kMean/index.html explains k-means clustering and gives a numerical example. Data is grouped into clusters by an iterative process. The points4.txt file contains the four data points from the above article. points10000a.txt contains 10,000 points and points100000a.txt contains 100,000.

This is a good example to illustrate concurrency. For larger files you could divide the data among multiple agents. Each cluster should have its own agent. Mutable refs may be used to hold the points associated with each cluster, so there will be one such ref for each cluster.

The book on page 111 shows how to read a file. These data files have one point on each line. The first two values are the x- and y-coordinates. Ignore any other values. All points are two-dimensional, although this procedure works for any number of dimensions. A Java StringTokenizer will divide a string into tokens using the nextToken method. Choose " ," as the separators so that both the space and the comma are treated as delimiters.

Assuming k clusters, assign the first k points as the initial cluster centers. The algorithm iterates until no point moves from one cluster to another. At each stage, first assign each point to a cluster, then calculate the new cluster centers and check for any changes.

Assign each point to the cluster whose center is at the minimum distance from the point (the closest cluster). Add the point to the ref for that cluster, and update the point with its new cluster assignment. When all the points have been assigned to clusters, have each cluster compute its new center as the average of the points assigned to it. Associate the new center with the cluster agent, then clear the member ref associated with the cluster so it is ready for the next round of assignments.

When the algorithm completes, output the cluster centers and the number of iterations performed. Compare the timing for the 10,000 and 100,000 files.
Use five cluster agents for these files and two data agents. For fun you might try other numbers of clusters.
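
As a reference point for checking results, here is a minimal sequential sketch of the steps above, written in plain Java. It deliberately leaves out the agents and refs, so it is not the concurrent solution the assignment asks for. The class name KMeansSketch, the method names, and the four sample lines in main are illustrative assumptions, not part of the assignment files. It also assumes that a cluster left with no points keeps its previous center, and that the iteration count includes the final pass in which nothing changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class: a sequential baseline without agents or refs.
class KMeansSketch {

    /** Holds the final centers and the number of assignment passes performed. */
    static final class Result {
        final double[][] centers;
        final int iterations;
        Result(double[][] centers, int iterations) {
            this.centers = centers;
            this.iterations = iterations;
        }
    }

    /** Parses one line into {x, y}; tokens after the first two are ignored. */
    static double[] parsePoint(String line) {
        StringTokenizer st = new StringTokenizer(line, " ,"); // space and comma
        double x = Double.parseDouble(st.nextToken());
        double y = Double.parseDouble(st.nextToken());
        return new double[] {x, y};
    }

    /** Index of the nearest center (squared distance; ties go to the lower index). */
    static int closest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dx = p[0] - centers[c][0];
            double dy = p[1] - centers[c][1];
            double d = dx * dx + dy * dy;
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Runs k-means with the first k points as initial centers (requires k <= points.size()). */
    static Result cluster(List<double[]> points, int k) {
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points.get(c).clone();
        }
        int[] assignment = new int[points.size()];
        Arrays.fill(assignment, -1); // -1 means "not yet assigned"
        int iterations = 0;
        boolean changed;
        do {
            changed = false;
            double[][] sums = new double[k][2]; // fresh per pass, like clearing the member refs
            int[] counts = new int[k];
            for (int i = 0; i < points.size(); i++) {
                double[] p = points.get(i);
                int c = closest(p, centers);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
                sums[c][0] += p[0];
                sums[c][1] += p[1];
                counts[c]++;
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // assumption: an empty cluster keeps its old center
                    centers[c] = new double[] {sums[c][0] / counts[c], sums[c][1] / counts[c]};
                }
            }
            iterations++;
        } while (changed);
        return new Result(centers, iterations);
    }

    public static void main(String[] args) {
        // Illustrative input lines; a real run would read them from a data file.
        String[] lines = {"1 1", "2,1", "4, 3 99", "5 4"};
        List<double[]> points = new ArrayList<>();
        for (String line : lines) {
            points.add(parsePoint(line));
        }
        Result r = cluster(points, 2);
        for (double[] c : r.centers) {
            System.out.println(c[0] + " " + c[1]);
        }
        System.out.println("iterations: " + r.iterations);
    }
}
```

For the timing comparison, one option is to wrap the call to cluster between two System.nanoTime() readings and run the same baseline on both files before measuring the agent-based version.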