July 2015 Archives

K Means Business

Last time we took a first brief look at cluster analysis in which we seek to algorithmically differentiate sets of data into subsets of similar data. The difficulty in doing so stems from the fact that similarity is a rather poorly defined concept and so clustering algorithms typically proceed by updating cluster memberships according to some rule of thumb, or heuristic, that is designed to reflect some intuitive notion of similarity. As a consequence, clusters have the rather unusually circular definition of that which are identified by a clustering algorithm!

Full text...

submit to reddit