Data Mining Model Nodes

Analysis Services

When a data mining model is built and trained, the content of the resulting model is stored as a set of data mining model nodes. A node stores the attributes, description, probabilities, and distribution information for the model element it represents. It also stores any cardinality information that describes its relationship to other nodes, such as how many child nodes it has.
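To make the list of stored information concrete, the following is a minimal sketch of what one node might hold in memory. It is an illustration only: the class `MiningModelNode` and its field names (`unique_name`, `description`, `attributes`, `probability`, `distribution`, `parent`, `children`) are hypothetical and are not the Analysis Services storage format. Cardinality is modeled here simply as the number of direct children.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MiningModelNode:
    """Hypothetical in-memory node; field names are illustrative, not a storage format."""
    unique_name: str
    description: str
    attributes: List[str] = field(default_factory=list)
    probability: float = 0.0
    # Distribution of values for the cases this node represents, e.g. {"Yes": 0.6, "No": 0.4}.
    distribution: Dict[str, float] = field(default_factory=dict)
    # Excluded from repr/eq so that parent <-> child links do not recurse forever.
    parent: Optional["MiningModelNode"] = field(default=None, repr=False, compare=False)
    children: List["MiningModelNode"] = field(default_factory=list)

    def add_child(self, child: "MiningModelNode") -> "MiningModelNode":
        """Attach child below this node and record the back-link to its parent."""
        child.parent = self
        self.children.append(child)
        return child

    @property
    def children_cardinality(self) -> int:
        """Cardinality relative to other nodes: the number of direct children."""
        return len(self.children)


root = MiningModelNode("0", "All customers", probability=1.0,
                       distribution={"Yes": 0.6, "No": 0.4})
young = root.add_child(MiningModelNode("01", "Age < 30", attributes=["Age"], probability=0.35))
root.add_child(MiningModelNode("02", "Age >= 30", attributes=["Age"], probability=0.65))
print(root.children_cardinality)  # 2
print(young.parent.description)   # All customers
```

Keeping the parent link out of the generated `repr` and equality checks is a deliberate choice: otherwise a parent and child would each try to print or compare the other indefinitely.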

Node Types

Each node has an associated node type that helps represent the structure of a data mining model. Node types are used primarily for navigation, not to define what a node does. For example, although every node in a decision tree model may have an associated distribution, not every node in a decision tree model is classified as a distribution node. Six node types are currently supported.
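The point that a node type is a navigation label, not a description of the data a node carries, can be illustrated with a small sketch. The `NodeType` enum below simply names the six types described in this section; its string values, the `Node` class, and the helper functions are hypothetical. The example gives a split node a distribution while still labeling it as interior, so filtering by type and filtering by "has a distribution" return different results.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterator, List


class NodeType(Enum):
    # The six node types described in this section; the values are illustrative labels.
    MODEL = "Model"
    TREE = "Tree"
    INTERIOR = "Interior"
    DISTRIBUTION = "Distribution"
    CLUSTER = "Cluster"
    UNKNOWN = "Unknown"


@dataclass
class Node:
    name: str
    node_type: NodeType
    distribution: Dict[str, float] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Pre-order depth-first traversal starting at node."""
    yield node
    for child in node.children:
        yield from walk(child)


def nodes_of_type(root: Node, node_type: NodeType) -> List[Node]:
    # Navigate by the type label, regardless of what data each node carries.
    return [n for n in walk(root) if n.node_type is node_type]


split = Node("Age < 30", NodeType.INTERIOR, distribution={"Yes": 0.55, "No": 0.45})
leaf = Node("Income high", NodeType.DISTRIBUTION, distribution={"Yes": 0.9, "No": 0.1})
split.children.append(leaf)
model = Node("Model", NodeType.MODEL, children=[Node("Buyer", NodeType.TREE, children=[split])])

with_dist = [n.name for n in walk(model) if n.distribution]
dist_nodes = [n.name for n in nodes_of_type(model, NodeType.DISTRIBUTION)]
print(with_dist)   # ['Age < 30', 'Income high']
print(dist_nodes)  # ['Income high']
```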

Model

A model node is the topmost node in any data mining model, regardless of the actual structure of the model. All models start with a model node.

Tree

For all tree-based models, this node serves as the root node of a tree. A data mining model may consist of many trees, but each tree has exactly one tree node, to which all other nodes in that tree are related. A decision tree-based model always has one model node and at least one tree node.
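The structural rules stated above (one model node at the top, at least one tree node, and exactly one tree node per tree) can be expressed as a simple check. This is a sketch under stated assumptions: the `Node` class and `check_tree_model` function are hypothetical, and the rule that only tree nodes sit directly under the model node is an assumption made for this illustration, not a documented Analysis Services constraint.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List


class NodeType(Enum):
    MODEL = "Model"
    TREE = "Tree"
    INTERIOR = "Interior"
    DISTRIBUTION = "Distribution"


@dataclass
class Node:
    name: str
    node_type: NodeType
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    yield node
    for child in node.children:
        yield from walk(child)


def check_tree_model(model: Node) -> List[str]:
    """Return structural problems found; an empty list means the shape looks valid."""
    problems: List[str] = []
    if model.node_type is not NodeType.MODEL:
        problems.append("root is not a model node")
    trees = [c for c in model.children if c.node_type is NodeType.TREE]
    if not trees:
        problems.append("model has no tree node")
    # Assumption for this sketch: only tree nodes hang directly off the model node.
    for child in model.children:
        if child.node_type is not NodeType.TREE:
            problems.append(f"{child.name}: only tree nodes may sit directly under the model")
    # Each tree has exactly one tree node (its root), and no nested model nodes.
    for tree in trees:
        for node in walk(tree):
            if node is not tree and node.node_type in (NodeType.MODEL, NodeType.TREE):
                problems.append(f"{node.name}: extra {node.node_type.value} node in tree {tree.name}")
    return problems


# A hypothetical model with two trees, each with its own tree node.
model = Node("Customers", NodeType.MODEL, children=[
    Node("Bike Buyer", NodeType.TREE, children=[Node("Age >= 30", NodeType.DISTRIBUTION)]),
    Node("Home Owner", NodeType.TREE, children=[Node("Income < 50000", NodeType.DISTRIBUTION)]),
])
print(check_tree_model(model))                          # []
print(check_tree_model(Node("Empty", NodeType.MODEL)))  # ['model has no tree node']
```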

Interior

An interior node represents a generic intermediate node in a model. For example, in a decision tree, an interior node usually represents a split in the tree.

Distribution

A distribution node is guaranteed to have a valid link to a nested distribution table. A distribution node describes the distribution of values for one or more attributes according to the data represented by this node. A good example of a distribution node is the leaf node of a decision tree.
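The relationship between interior nodes (splits) and distribution nodes (leaves) in a decision tree can be sketched as a walk from the tree node down to a leaf, followed by reading the leaf's distribution. Everything here is hypothetical and simplified: `TreeNode`, `find_leaf`, and `predict` are illustrative names, each split is represented as a list of (condition, child) pairs, and the prediction is simply the most probable value in the leaf's distribution. This is not how Microsoft Decision Trees stores or evaluates its splits.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Case = Dict[str, float]


@dataclass
class TreeNode:
    caption: str
    node_type: str  # "Tree", "Interior" or "Distribution" in this sketch
    # Nested distribution table: predicted value -> probability (populated on leaves here).
    distribution: Dict[str, float] = field(default_factory=dict)
    # Each branch pairs a split condition with the child reached when it holds.
    branches: List[Tuple[Callable[[Case], bool], "TreeNode"]] = field(default_factory=list)


def find_leaf(root: TreeNode, case: Case) -> Optional[TreeNode]:
    """Follow the first matching branch at each split until a distribution node."""
    node = root
    while node.node_type != "Distribution":
        for condition, child in node.branches:
            if condition(case):
                node = child
                break
        else:
            return None  # no branch matched this case
    return node


def predict(root: TreeNode, case: Case) -> Optional[str]:
    leaf = find_leaf(root, case)
    if leaf is None or not leaf.distribution:
        return None
    # Report the most probable value in the leaf's distribution.
    return max(leaf.distribution, key=leaf.distribution.get)


high = TreeNode("Income >= 50000", "Distribution", {"Yes": 0.85, "No": 0.15})
low = TreeNode("Income < 50000", "Distribution", {"Yes": 0.4, "No": 0.6})
young = TreeNode("Age < 30", "Interior", branches=[
    (lambda c: c["Income"] >= 50000, high),
    (lambda c: c["Income"] < 50000, low),
])
older = TreeNode("Age >= 30", "Distribution", {"Yes": 0.3, "No": 0.7})
tree = TreeNode("Bike Buyer", "Tree", branches=[
    (lambda c: c["Age"] < 30, young),
    (lambda c: c["Age"] >= 30, older),
])
print(predict(tree, {"Age": 25, "Income": 60000}))  # Yes
print(predict(tree, {"Age": 25, "Income": 20000}))  # No
print(predict(tree, {"Age": 45, "Income": 90000}))  # No
```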

Cluster

A cluster node stores the attributes and data that describe a specific cluster. In other words, it stores the set of distributions that characterize a cluster of cases in the data mining model. A clustering-based model always has one model node and at least one cluster node.
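Because a cluster node is essentially a set of per-attribute distributions, one way to see what it encodes is to score a case against each cluster. The sketch below is hypothetical throughout (`ClusterNode`, `ClusteringModel`, `likelihood`, and `closest_cluster` are invented names) and uses a naive, independence-assuming likelihood: the product of P(value | cluster) over the case's attributes. It is an illustration of the data a cluster node holds, not the Microsoft Clustering algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Distribution = Dict[str, float]  # value -> probability


@dataclass
class ClusterNode:
    caption: str
    # The set of distributions that characterize this cluster: attribute -> distribution.
    distributions: Dict[str, Distribution] = field(default_factory=dict)


@dataclass
class ClusteringModel:
    caption: str                                               # the single model node
    clusters: List[ClusterNode] = field(default_factory=list)  # one or more cluster nodes


def likelihood(cluster: ClusterNode, case: Dict[str, str], floor: float = 1e-6) -> float:
    """Product of P(value | cluster) over the case's attributes, assuming independence."""
    score = 1.0
    for attribute, value in case.items():
        # Unseen attributes or values get a small floor instead of zeroing the score.
        score *= cluster.distributions.get(attribute, {}).get(value, floor)
    return score


def closest_cluster(model: ClusteringModel, case: Dict[str, str]) -> ClusterNode:
    if not model.clusters:
        raise ValueError("a clustering model needs at least one cluster node")
    return max(model.clusters, key=lambda c: likelihood(c, case))


c1 = ClusterNode("Cluster 1", {"Region": {"North": 0.8, "South": 0.2},
                               "Buyer": {"Yes": 0.7, "No": 0.3}})
c2 = ClusterNode("Cluster 2", {"Region": {"North": 0.1, "South": 0.9},
                               "Buyer": {"Yes": 0.2, "No": 0.8}})
model = ClusteringModel("Customers", [c1, c2])
print(closest_cluster(model, {"Region": "South", "Buyer": "No"}).caption)   # Cluster 2
print(closest_cluster(model, {"Region": "North", "Buyer": "Yes"}).caption)  # Cluster 1
```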

Unknown

The unknown node type is used when a node does not fit any of the other node types provided and the algorithm cannot resolve the node type.

The following diagram illustrates the differences between various node types and the algorithms that support them.

Browsing Data Mining Model Nodes

The nodes of a trained data mining model can provide valuable insight into the data. Nodes define the patterns and rules discovered by analyzing the training data, and they can help explain the predictions the model makes for new data. Browsing a data mining model lets you examine it in fine detail and refine it accordingly. Because the content depends on the data mining algorithm used to create the model, the type of content can vary from model to model.

Data mining model content can be browsed in several ways.

Analysis Manager

Analysis Manager provides Data Mining Model Browser, a useful tool for graphically exploring a data mining model and its content. For more information, see Data Mining Model Browser.

Rowset

Querying the model directly returns the data mining model content in the form of a single rowset.

SELECT * FROM <mining model>.CONTENT

The attributes and results for the nodes are stored in MINING_MODEL_CONTENT, a special schema rowset that allows the data mining model content to be browsed.

For more information about the storage schema, see Data Mining Model Storage.
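Because the content comes back as a single flat rowset, a client typically has to rebuild the node hierarchy itself. The sketch below does this with hand-written sample rows rather than a live query. It assumes the NODE_UNIQUE_NAME, PARENT_UNIQUE_NAME, NODE_TYPE, and NODE_CAPTION columns described for MINING_MODEL_CONTENT in the OLE DB for Data Mining specification; verify the exact columns against your provider. In the actual rowset NODE_TYPE is likely a numeric code, so the readable labels used here are a simplification, and the `build_outline` helper is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hand-written sample rows standing in for the result of
#   SELECT * FROM <mining model>.CONTENT
# Only a few columns are shown, and NODE_TYPE uses readable labels for simplicity.
rows: List[Dict[str, str]] = [
    {"NODE_UNIQUE_NAME": "0", "PARENT_UNIQUE_NAME": "", "NODE_TYPE": "Model", "NODE_CAPTION": "Bike Buyer model"},
    {"NODE_UNIQUE_NAME": "00", "PARENT_UNIQUE_NAME": "0", "NODE_TYPE": "Tree", "NODE_CAPTION": "Bike Buyer"},
    {"NODE_UNIQUE_NAME": "001", "PARENT_UNIQUE_NAME": "00", "NODE_TYPE": "Interior", "NODE_CAPTION": "Age < 30"},
    {"NODE_UNIQUE_NAME": "0011", "PARENT_UNIQUE_NAME": "001", "NODE_TYPE": "Distribution", "NODE_CAPTION": "Income >= 50000"},
    {"NODE_UNIQUE_NAME": "002", "PARENT_UNIQUE_NAME": "00", "NODE_TYPE": "Distribution", "NODE_CAPTION": "Age >= 30"},
]


def build_outline(rows: List[Dict[str, str]]) -> List[str]:
    """Rebuild the node hierarchy from parent links and return an indented outline."""
    names = {r["NODE_UNIQUE_NAME"] for r in rows}
    children: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    roots: List[Dict[str, str]] = []
    for r in rows:
        # A row whose parent is not in the rowset is treated as a root (the model node).
        if r["PARENT_UNIQUE_NAME"] in names:
            children[r["PARENT_UNIQUE_NAME"]].append(r)
        else:
            roots.append(r)

    lines: List[str] = []

    def visit(row: Dict[str, str], depth: int) -> None:
        lines.append("  " * depth + f"{row['NODE_TYPE']}: {row['NODE_CAPTION']}")
        for child in children[row["NODE_UNIQUE_NAME"]]:
            visit(child, depth + 1)

    for root in roots:
        visit(root, 0)
    return lines


outline = build_outline(rows)
print("\n".join(outline))
```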

XML

Another way to browse the content of a data mining model is as an Extensible Markup Language (XML) document. The XML information, however, is best viewed by a client application capable of parsing this complex data.

For more information about the document type definition (DTD) of the XML document, see the OLE DB for Data Mining specification.
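As a rough illustration of what "parsing this complex data" involves, the sketch below reads a nested XML description of model nodes with Python's standard `xml.etree.ElementTree` module, counts the nodes of each type, and lists the leaf captions. The element and attribute names (`Node`, `type`, `caption`) are invented for this example and do not follow the DTD in the OLE DB for Data Mining specification; a real client would need to follow that DTD.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List

# Hypothetical document: element and attribute names are invented for illustration
# and do NOT follow the DTD in the OLE DB for Data Mining specification.
SAMPLE = """
<Node type="Model" caption="Bike Buyer model">
  <Node type="Tree" caption="Bike Buyer">
    <Node type="Interior" caption="Age &lt; 30">
      <Node type="Distribution" caption="Income &gt;= 50000"/>
    </Node>
    <Node type="Distribution" caption="Age &gt;= 30"/>
  </Node>
</Node>
"""


def count_node_types(xml_text: str) -> Counter:
    root = ET.fromstring(xml_text.strip())
    # iter() visits the element itself and all of its descendants in document order.
    return Counter(elem.get("type") for elem in root.iter("Node"))


def leaf_captions(xml_text: str) -> List[str]:
    root = ET.fromstring(xml_text.strip())
    # len(elem) is the number of child elements; leaves have none.
    return [elem.get("caption") for elem in root.iter("Node") if len(elem) == 0]


counts = count_node_types(SAMPLE)
print(counts["Distribution"])  # 2
print(leaf_captions(SAMPLE))   # ['Income >= 50000', 'Age >= 30']
```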