Microsoft Clustering

Analysis Services

The Microsoft® Clustering algorithm is an expectation method that uses iterative refinement techniques to group records into neighborhoods (clusters) that exhibit similar, predictable characteristics. These characteristics are often hidden or nonintuitive. For example, suppose that a travel firm wants to determine age demographics for marketing vacation packages. The firm's data warehouse provides the following training data.

Customer age	Country traveled to
23	Mexico
45	Canada
32	Canada
47	Canada
46	Canada
34	Canada
51	Canada
28	Mexico
49	Canada
29	Mexico
26	Mexico
31	Canada

When this information is plotted on a two-dimensional graph, three main groups appear in the data. At first glance, people between the ages of 23 and 29 seem to travel to Mexico, and people between the ages of 30 and 51 seem to travel to Canada. The clustering algorithm also presents an interesting fact that might not be apparent from observing the data directly: no customers between the ages of 35 and 44 appear in the data at all. In other words, the people who travel to Canada fall into two clusters, those between the ages of 30 and 34 and those between the ages of 45 and 51, which together with the Mexico group make up the three groups.

The clusters of data in this example are readily observed. For data with more dimensions, plotting the data in this manner may not be convenient, or the dimensions may not be amenable to plotting at all. The clustering algorithm automatically finds such groupings in data with higher numbers of dimensions.
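
As a rough illustration of how iterative refinement can recover the three groups in the travel example without plotting, the following Python sketch applies a simple hard-assignment (k-means style) loop to the training data. It is not the implementation used by Microsoft Clustering. The two-dimensional encoding, the COUNTRY_WEIGHT value, the choice of three clusters, and the evenly spaced starting points are all assumptions made for this example.

```python
# Training data from the table above: (customer age, country traveled to).
TRAINING_DATA = [
    (23, "Mexico"), (45, "Canada"), (32, "Canada"), (47, "Canada"),
    (46, "Canada"), (34, "Canada"), (51, "Canada"), (28, "Mexico"),
    (49, "Canada"), (29, "Mexico"), (26, "Mexico"), (31, "Canada"),
]

# Assumption: distance between the two countries on the second axis.
COUNTRY_WEIGHT = 10.0


def to_point(age, country):
    """Encode a record as a 2-D point: (age, weighted country indicator)."""
    return (float(age), COUNTRY_WEIGHT if country == "Canada" else 0.0)


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def cluster(records, k=3, max_iterations=100):
    """Group records by repeatedly reassigning points and moving centroids."""
    points = [to_point(age, country) for age, country in records]
    # Deterministic start: k points spread evenly over the age-sorted data.
    ordered = sorted(points)
    step = max(k - 1, 1)
    centroids = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]
    assignment = None
    for _ in range(max_iterations):
        # Assign each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster, so refinement has converged.
        assignment = new_assignment
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    groups = [[] for _ in range(k)]
    for record, a in zip(records, assignment):
        groups[a].append(record)
    return groups


for group in cluster(TRAINING_DATA):
    ages = sorted(age for age, _ in group)
    countries = sorted({country for _, country in group})
    print(f"ages {ages[0]}-{ages[-1]}: {', '.join(countries)}")
```

With these assumptions the loop settles after two passes and reports one Mexico group (ages 23-29) and two Canada groups (ages 31-34 and 45-51), matching the groups visible in the plot. Different starting points or a different COUNTRY_WEIGHT could produce other groupings, which is one reason production algorithms use more careful initialization.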

Mining Parameters

The Microsoft Clustering algorithm provider does not currently support any additional mining parameters.
