K-Means least-squares partitioning method

Updated July 27th, 2001

Pierre Legendre
July 2001
Département de Sciences Biologiques
Université de Montréal
K-means is a least-squares partitioning method allowing users to divide a collection of objects into K groups. The theory is presented in all textbooks of numerical classification methods, including section 8.8 of Legendre and Legendre (1998). The program implements a simple alternating least-squares algorithm, the same one as used in the SAS FASTCLUS procedure. The algorithm iterates between two simple steps: Among the 20 or so algorithms that have been described for K-means partitioning, this one is known to be the fastest; it also has good convergence properties. In the course of the iterations, the program tries to minimise the sum, over all groups, of the squared within-group residuals, which are the distances of the objects to the respective group centroids. Convergence is reached when the objective function (i.e., the residual sum-of-squares) cannot be lowered any more. The groups obtained are such that they are geometrically as compact as possible around their respective centroids. Since K-means partitioning is one of the so-called “NP-hard problems”, no algorithm can guarantee that the absolute minimum of the objective function has been reached; any given run of K-means partitioning may end up in what is called a “local minimum” of the objective function. A strategy, described in the User's guide, helps users find a value of the objective function which is likely to represent the overall minimum. New in July 2001 is the ability to specify an initial list of centroids to start the K-Means partition with.

Program availability

References:
Legendre, P. & Legendre, L. 1998. Numerical Ecology, 2nd English edition. Elsevier Science BV, Amsterdam. xv + 853 pages.