CREATE MINING MODEL Statement

Analysis Services Programming

This statement creates a local data mining model on the client computer. You can create mining models from relational databases, PMML, or OLAP cubes.

BNF (CREATE MINING MODEL)

<dm_create>::=CREATE MINING MODEL <identifier> ( <col_def_list> ) USING <algorithm> [(<algo_param_list>)]

<pmml_create>::= CREATE MINING MODEL <identifier> FROM PMML <string>

<select_into>::= SELECT * INTO <identifier> USING <algorithm> FROM <identifier>

<col_def_list>::= <col_def> |<col_def_list> , <col_def>
<col_def>::= <col_def_reg> | <col_def_tbl>
<col_def_reg>::= <identifier> <col_type> [<col_distribution>] [<col_binary>] [<col_content>] [<col_content_qual>] [<col_qualif>] [<col_prediction>] [<relation_clause>]

<col_def_tbl> ::= <identifier> TABLE [<col_prediction>] ( <col_def_list> )

<algorithm> ::= MICROSOFT_DECISION_TREES | MICROSOFT_CLUSTERING

<algo_param>::= <identifier> = <value>

<algo_param_list>::= <algo_param>
         | <algo_param>, <algo_param_list>

<col_type>::= LONG
         | BOOLEAN
         | TEXT
         | DOUBLE
         | DATE

<col_distribution>::= NORMAL
         | UNIFORM

<col_binary>::= MODEL_EXISTENCE_ONLY
         | NOT NULL

<col_content>::= DISCRETE
         | CONTINUOUS
         | DISCRETIZED( [<disc_method> [, <numeric_const>]] )
         | SEQUENCE_TIME

<disc_method>::= AUTOMATIC
         | EQUAL_AREAS
         | THRESHOLDS
         | CLUSTERS

<col_content_qual>::= ORDERED
         | CYCLICAL

<col_qualif>::= KEY
         | PROBABILITY
         | VARIANCE
         | STDEV
         | STDDEV
         | PROBABILITY_VARIANCE
         | PROBABILITY_STDEV
         | PROBABILITY_STDDEV
         | SUPPORT

<col_prediction>::= PREDICT
         | PREDICT_ONLY

<relation_clause>::= <related_to_clause>
         | <of_clause>

<related_to_clause>::= RELATED TO <identifier>
         | RELATED TO KEY

<of_clause>::= OF <identifier>
         | OF KEY
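
As an illustration of the <dm_create> form, the following statement is a minimal sketch of a relational mining model with one nested table. The model name, column names, and choice of flags are hypothetical; only the keywords come from the BNF above. The lines beginning with -- are annotations for the reader and can be removed.

```sql
-- Hypothetical model and column names, shown only to illustrate the grammar.
CREATE MINING MODEL [Age Prediction]
(
    [Customer ID]       LONG   KEY,
    [Gender]            TEXT   DISCRETE,
    [Age]               DOUBLE DISCRETIZED() PREDICT,
    -- Nested table: one row per product purchased by the customer.
    [Product Purchases] TABLE  PREDICT
    (
        [Product Name]  TEXT   KEY,
        [Quantity]      DOUBLE NORMAL CONTINUOUS
    )
)
USING MICROSOFT_DECISION_TREES
```

Each regular column follows the <col_def_reg> order: identifier, data type, then optional flags. The nested table is a <col_def_tbl> whose own column list starts with its key.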

BNF (CREATE OLAP MINING MODEL)

Use this syntax to create mining models that are based on OLAP cubes instead of on relational database tables. Each OLAP mining model contains one or more case dimensions and zero or more case measures. Columns within each case can be based on any object in the Dimension object model, such as a hierarchy, level, or property, or can be based upon the value of a measure. The flags that are used with each OLAP mining model column are the same as those used for relational mining models. OLAP mining models are trained in the same manner as relational mining models, using the same syntax.

<olap create statement> ::= CREATE OLAP MINING MODEL <dmm name>
            FROM <cube name> <olap definition>
            USING <dmm algorithm> [(<dmm flag list>)]

<olap definition> ::= CASE <olap dimension> [, <olap dimension list>] [, <olap measure list>]

<olap dimension list> ::= <olap dimension> [, <olap dimension list>]

<olap dimension> ::= DIMENSION <dimension name> <predict qualifier>
                        { <olap level list> | <olap hierarchy list> }

<olap hierarchy list> ::= <olap hierarchy>
                        [, <olap hierarchy list>]

<olap hierarchy> ::= HIERARCHY <hierarchy name> <predict qualifier> <olap level list>

<olap level list> ::= <olap level> [, <olap level list>]

<olap level> ::= LEVEL <level name> <predict qualifier> <olap property list>

<olap property list> ::= <olap property> [, <olap property list>]

<olap property> ::= PROPERTY <property name> <predict qualifier>

<olap measure list> ::= <olap measure> [, <olap measure list>]

<olap measure> ::= MEASURE <measure name> <predict qualifier>

<predict qualifier> ::= <nothing> | PREDICT | PREDICT_ONLY

<dmm flag list> ::= <dmm flag> [, <dmm flag list>]

<dmm flag> ::= <flag name> = <value>

<flag name> ::= <col_type> [<col_distribution>] [<col_binary>] [<col_content>] [<col_content_qual>] [<col_qualif>]
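
The following sketch reads the OLAP grammar above literally: one case dimension with a level and two member properties, followed by a case measure. The cube, dimension, level, property, and measure names are hypothetical, and the statement has not been checked against a server, so the provider may require punctuation that the BNF does not show. The -- line is an annotation for the reader.

```sql
-- Hypothetical cube and object names; layout follows the BNF as written.
CREATE OLAP MINING MODEL [Customer Income]
FROM [Sales]
CASE
    DIMENSION [Customers]
        LEVEL [Name]
            PROPERTY [Gender],
            PROPERTY [Yearly Income] PREDICT,
    MEASURE [Store Sales]
USING MICROSOFT_DECISION_TREES
```

Here [Yearly Income] carries the PREDICT qualifier, while the other objects use the empty <predict qualifier> and act as inputs only.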

Remarks

The CREATE MINING MODEL statement creates a new mining model based on the column definition list. Each column is described by content flags in its column definition. These flags give the mining algorithm additional information about the content of the training data or model. A column can carry at most one flag from each flag type group (flags within a group are mutually exclusive), and the flags must appear in the correct order. The flag type groups, in their correct order, are listed in the following table.

Flag type         Flag name             Description
Distribution      NORMAL                The values of the column appear in a normal distribution.
                  LOG NORMAL            The values of the column appear in a log normal distribution.
                  UNIFORM               The values of the column appear in a uniform distribution.
Content Type      KEY                   The column is discrete and is a key. Key columns will not have any other flags except in the case of a nested table with no attribute columns.
                  CONTINUOUS            The column contains values in a continuous range, such as Age or Salary.
                  DISCRETE              The column contains a discrete set of values, such as Gender.
                  DISCRETIZED()         The column contains a continuous set of values that should be converted to buckets.
                  ORDERED               The column contains a discrete set of values that are ordered, such as Salary Level.
                  CYCLICAL              The column contains an ordered discrete set of values that are cyclical, such as Day of Week or Month.
                  SEQUENCE_TIME         The column contains time measurement units.
Modeling          MODEL_EXISTENCE_ONLY  The column should be modeled as having two states, missing and nonmissing, regardless of the values in the column. This is particularly useful for columns in a nested table, where values are sparse across cases.
                  NOT NULL              The column cannot accept NULL values.
Special Property  PROBABILITY           The value in this column is the probability (0-1) of the associated value.
                  VARIANCE              The value in this column is the variance of the associated value.
                  STDEV                 The value in this column is the standard deviation of the associated value. The BNF also accepts the spelling STDDEV.
                  PROBABILITY_VARIANCE  The value in this column is the variance of the probability of the associated value.
                  PROBABILITY_STDEV     The value in this column is the standard deviation of the probability of the associated value. The BNF also accepts the spelling PROBABILITY_STDDEV.
                  SUPPORT               The value in this column is the weight (case replication factor) of the associated value.
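
To make the ordering rule concrete, here is a minimal sketch in which each column takes at most one flag per group, written in the order given by <col_def_reg>: data type, distribution, modeling, then content. All names are hypothetical, and the 5 passed to DISCRETIZED is assumed to be the requested number of buckets. Whether a particular algorithm honors every flag is outside the scope of this sketch. The -- comments are annotations for the reader.

```sql
-- Hypothetical model; flags appear in the order required by the BNF.
CREATE MINING MODEL [Income Segments]
(
    [Customer ID]   LONG   KEY,
    -- distribution flag, then modeling flag, then content flag
    [Income]        DOUBLE NORMAL NOT NULL CONTINUOUS,
    -- continuous source values bucketed with an explicit method
    [Age]           LONG   DISCRETIZED(EQUAL_AREAS, 5),
    -- ordered discrete values that wrap around (1 to 12)
    [Signup Month]  LONG   CYCLICAL
)
USING MICROSOFT_CLUSTERING
```

Writing [Income] as DOUBLE CONTINUOUS NORMAL would place the content flag before the distribution flag, which breaks the required order.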

Column relations are described in one of the following ways.

<Column relation> clause  Description
OF                        This form is restricted to columns with Special Property content flags, for example, ProbGender DOUBLE PROBABILITY OF Gender.
RELATED TO                This form indicates a value hierarchy. The target of a RELATED TO column can be a key column in a nested table, a discretely valued column on the case row, or another column with a RELATED TO clause (indicating a deeper hierarchy).
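
Both relation clauses can be sketched in one statement. The model and column names below are hypothetical. [ProbGender] uses OF to attach a probability to [Gender], and the location columns use RELATED TO to form a two-step hierarchy on the case row. The -- comments are annotations for the reader.

```sql
-- Hypothetical model illustrating OF and RELATED TO.
CREATE MINING MODEL [Gender Prediction]
(
    [Customer ID] LONG   KEY,
    [Gender]      TEXT   DISCRETE PREDICT,
    -- Special Property column tied to [Gender]
    [ProbGender]  DOUBLE PROBABILITY OF [Gender],
    [City]        TEXT   DISCRETE,
    -- value hierarchy: City -> State -> Country
    [State]       TEXT   DISCRETE RELATED TO [City],
    [Country]     TEXT   DISCRETE RELATED TO [State]
)
USING MICROSOFT_DECISION_TREES
```

[Country] targets a column that itself has a RELATED TO clause, which is the deeper-hierarchy case described above.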

The following flags are used to describe how a prediction column functions.

<Prediction flag> clause  Description
PREDICT                   This column can be predicted by the model and it can be supplied in input cases to predict the value of other predictable columns.
PREDICT_ONLY              This column can be predicted by the model, but its values cannot be used in input cases to predict the value of other predictable columns.
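
The difference between the two prediction flags can be seen in a small sketch with hypothetical names. The -- comments are annotations for the reader.

```sql
-- Hypothetical model contrasting PREDICT and PREDICT_ONLY.
CREATE MINING MODEL [Credit Risk]
(
    [Customer ID]  LONG KEY,
    [Gender]       TEXT DISCRETE,
    -- predictable, and also usable as an input for predicting [Risk Class]
    [Income Level] TEXT DISCRETE PREDICT,
    -- predictable, but never used as an input for predicting [Income Level]
    [Risk Class]   TEXT DISCRETE PREDICT_ONLY
)
USING MICROSOFT_DECISION_TREES
```

A column with neither flag, such as [Gender], is an input only.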

See Also

Building a Data Mining Model