Building a Local Data Mining Model

Analysis Services Programming

previous page next page

Analysis Services Programming

Building a Local Data Mining Model

Conceptually, the structure of a local data mining model is similar to that of a table in a relational database. Like tables, data mining models are defined by the column definitions they contain. However, unlike a table in Microsoft® SQL Server™ 2000, the columns in a data mining model can contain nested tables. SQL Server 2000 Analysis Services supports two kinds of data mining models: models that are based on OLAP cubes, and models that are based on relational tables (or, more accurately, a rowset from an OLE DB provider).

The syntax for defining a mining model is also similar to that for defining a table. There are two different forms of the CREATE MINING MODEL statement, one for OLAP mining models and one for relational mining models.

Building a Model Based on an OLAP Cube

To create a mining model that is based on an OLAP cube, use the CREATE OLAP MINING MODEL statement. The general form of the statement is as follows:

CREATE OLAP MINING MODEL <Model Name> FROM <Case Cube Name> (<Cube Members>) USING <Algorithm Name>

The <Model Name> token specifies the name of the model that will be created. The physical location for this model will be the directory specified by the Mining Location property. If the Mining Location property is not specified in the connection string, the mining model created by this statement will have connection scope, and it will only exist for the duration of the session. The <Case Cube Name> token is the name of the cube that contains the training cases for the model <Cube Members>. Finally, the <Algorithm Name> token contains the name of the mining model algorithm that will be used to create the model. This token can have one of two values: Microsoft_Decision_Trees or Microsoft_Clustering.

The following example creates an OLAP mining model that predicts the Member Card Type property for members of the Customers dimension:

CREATE OLAP MINING MODEL [MyOlapModel] FROM [Sales]
  (
    CASE 
      DIMENSION [Customers] 
      LEVEL [Name] 
        PROPERTY [Marital Status], 
        PROPERTY [Education], 
        PROPERTY [Member Card Type] PREDICT
  )
USING Microsoft_Decision_Trees

The mining model that this example defines is based on the Sales cube in the current database (that is, the default database for this session). The three columns that will be included in this mining model are defined next. Each column is based on a member property that applies to each member contained in the Name level of the Customers dimension. The presence of the PREDICT specifier in the definition for the last column, Member Card Type, indicates that the column is predictable.

Building a Model Based on a Relational Database Table

You define relational mining models (that is, models that are based on tables in a relational database) by specifying the columns to be included in the model. Because the format and structure of the source data is not known in advance, each column is defined by a name, the data type of its content, its statistical nature, and whether the column will be predictable in a query. The general form of the statement that creates a relational mining model is as follows:

CREATE MINING MODEL <Model Name> (<Column Members>) USING <Algorithm Name>

For example, consider the following relational mining model definition:

CREATE MINING MODEL [MemberCards] 
  (
    [customer Id] LONG KEY , 
    [Yearly Income] TEXT DISCRETE , 
    [Member Card Type] TEXT DISCRETE PREDICT,
    [Marital Status] TEXT DISCRETE 
  ) 
USING Microsoft_Decision_Trees

In this example, a mining model named MemberCards is defined using the CREATE MINING MODEL statement. The syntax of this statement is similar to that of the CREATE TABLE statement in SQL. The columns that make up this mining model are named and their types are defined with additional information concerning the content they contain. The Member Card Type column is specified as being predictable by using the PREDICT specifier in its column definition.

Columns That Contain Nested Tables

You may want to create a mining model that contains a column with a nested table. In this case, use the TABLE type specifier in the CREATE MINING MODEL statement:

  CREATE MINING MODEL [Age Prediction]
  (
    [Customer ID]        LONG    KEY,
    [Gender]             TEXT    DISCRETE,
    [Age]                DOUBLE  DISCRETIZED() PREDICT,
    [Product Purchases]  TABLE
      (
        [Product Name]   TEXT    KEY,  
        [Product Type]   TEXT    DISCRETE RELATED TO [Product Name],
        [Quantity]       DOUBLE  NORMAL CONTINUOUS
      )
  )
  USING [Decision Trees]

In this example, the Product Purchases column contains a nested table that contains three columns: Product Name, Quantity, and Product Type. The first column in the nested table is a key column. The next column in the nested table, Product Type, is related to the Product Name column in a hierarchical relationship. The last column, Quantity, contains a floating-point number that is statistically normal and continuous across its domain (as opposed to having discrete values within the domain).

The last clause in the CREATE MINING MODEL states that the model should be built using the Microsoft Decision Trees data mining algorithm.

For more information about the CREATE MINING MODEL statement, see CREATE MINING MODEL Statement.

For more information, see the OLE DB for Data Mining specification.

Building a Local Data Mining Model

Analysis Services Programming