Microsoft Clustering

Analysis Services

The Microsoft® Clustering algorithm is an expectation method that uses iterative refinement techniques to group records into neighborhoods (clusters) that exhibit similar, predictable characteristics. Often, these characteristics are hidden or nonintuitive. For example, suppose that a travel firm wants to determine age demographics for marketing vacation packages. From its data warehouse, the firm has the following training data:

Customer age    Country traveled to
23              Mexico
45              Canada
32              Canada
47              Canada
46              Canada
34              Canada
51              Canada
28              Mexico
49              Canada
29              Mexico
26              Mexico
31              Canada

When this information is plotted on a graph with two dimensions, three main groups appear in the data. People between the ages of 23 and 29 seem to travel to Mexico, and people between the ages of 30 and 51 seem to travel to Canada. The people who travel to Canada, however, fall into two separate clusters: people between the ages of 30 and 34, and people between the ages of 45 and 51. The clustering algorithm therefore also presents an interesting fact that might not be apparent from observing the data directly: people between the ages of 35 and 44 did not seem to travel at all.

The clusters of data in this example are readily observed. For data with more dimensions, plotting the data in this manner may not be convenient, or the dimensions may not be amenable to plotting at all. The clustering algorithm automatically finds such groupings in data with a higher number of dimensions.
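
The grouping described above can be reproduced with a small sketch. The Python code below is an illustration only: it uses plain k-means as a stand-in for iterative refinement and is not the internal implementation of the Microsoft Clustering algorithm. The numeric country encoding, the weight of 10, the choice of three clusters, and the starting centroids are all assumptions made for this example. Because the points are ordinary tuples, the same functions would also accept data with more than two dimensions.

```python
# Illustrative k-means sketch; NOT the internal Microsoft Clustering implementation.
from math import dist

# Training data from the table above: (customer age, country traveled to).
ROWS = [
    (23, "Mexico"), (45, "Canada"), (32, "Canada"), (47, "Canada"),
    (46, "Canada"), (34, "Canada"), (51, "Canada"), (28, "Mexico"),
    (49, "Canada"), (29, "Mexico"), (26, "Mexico"), (31, "Canada"),
]

# Assumption: encode the country as a number so it can serve as a second axis.
# The weight of 10 is an arbitrary choice for this sketch.
COUNTRY_WEIGHT = 10.0
COUNTRY_CODE = {"Mexico": 0.0, "Canada": 1.0}


def to_point(row):
    """Map an (age, country) row to a numeric two-dimensional point."""
    age, country = row
    return (float(age), COUNTRY_WEIGHT * COUNTRY_CODE[country])


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def k_means(points, centroids, max_iter=100):
    """Refine the centroids until the cluster assignments stop changing."""
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Move each centroid to the mean of the points assigned to it.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assignment


rows = sorted(ROWS)  # sort by age so the starting centroids are deterministic
points = [to_point(r) for r in rows]
# Assumed start: the youngest, middle, and oldest customers.
start = [points[0], points[len(points) // 2], points[-1]]
labels = k_means(points, list(start))

clusters = {}
for (age, country), label in zip(rows, labels):
    clusters.setdefault(label, []).append((age, country))

for label in sorted(clusters):
    ages = [age for age, _ in clusters[label]]
    countries = sorted({country for _, country in clusters[label]})
    print(f"cluster {label}: ages {min(ages)}-{max(ages)}, {', '.join(countries)}")
```

Run against the training data, the sketch prints three clusters: ages 23-29 (Mexico), ages 31-34 (Canada), and ages 45-51 (Canada). No customer falls between 35 and 44, which matches the gap noted earlier. A different country weight or different starting centroids could produce different groupings.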

Mining Parameters

The Microsoft Clustering algorithm provider does not currently support any additional mining parameters.