Training Data Mining Models
For a data mining model to provide predictive results, it must first work with known data in a process called training. During training, data is inserted into the untrained data mining model. Inserting the data does not save the training data in the data mining model. Instead, the model analyzes the training data, looking for rules and patterns that can later be used to determine the histogram values for predictive columns, and stores the resulting statistical information as data mining model content.
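The idea that training stores statistics rather than the training rows can be illustrated with a minimal Python sketch. This is not Analysis Services code: the class ToyMiningModel, its methods, and the sample columns age_group and buys_bike are all hypothetical names chosen for illustration, and simple counting stands in for whatever algorithm a real model uses.

```python
from collections import Counter, defaultdict


class ToyMiningModel:
    """Hypothetical stand-in for a mining model; not an Analysis Services API."""

    def __init__(self, input_column, predictable_column):
        self.input_column = input_column
        self.predictable_column = predictable_column
        # Model content: counts only, never the training rows themselves.
        self.content = defaultdict(Counter)

    def train(self, cases):
        """Analyze each case and update the stored statistics."""
        for case in cases:
            key = case[self.input_column]
            outcome = case[self.predictable_column]
            self.content[key][outcome] += 1

    def predict_histogram(self, input_value):
        """Return the learned distribution of the predictable column."""
        counts = self.content.get(input_value, Counter())
        total = sum(counts.values())
        if total == 0:
            return {}
        return {outcome: n / total for outcome, n in counts.items()}


# Invented sample cases, used only to exercise the sketch.
training_cases = [
    {"age_group": "young", "buys_bike": "yes"},
    {"age_group": "young", "buys_bike": "yes"},
    {"age_group": "young", "buys_bike": "no"},
    {"age_group": "senior", "buys_bike": "no"},
]

model = ToyMiningModel("age_group", "buys_bike")
model.train(training_cases)
histogram = model.predict_histogram("young")
print(histogram)
```

After training, the model holds only the per-value counts, so the training cases could be discarded and the histogram for a given input value could still be produced.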
Training is done by processing the data mining model, which can be started from the Mining Model Wizard, from the mining model editors, or from Analysis Manager.
The training process is similar for both relational and OLAP mining models. In the Mining Model Wizard, the source tables or cube used to construct the model are assumed to contain training data and are used to supply the data mining model. In the Mining Model Editor, the case and association tables that are expected to supply the training data for a relational data mining model are displayed as part of the model. This is not so with an OLAP mining model, because all of the dimensions, levels, and measures of the source cube are duplicated as part of the structure of the OLAP mining model, even if they play no active part in the mining model. By contrast, a relational data mining model incorporates into its structure only those data mining columns that the model will use.
A data mining model can be processed in one of the following ways.
Refresh
Clears the data mining model content and retrains the model from the training data. It is best used when the model structure has not changed, but the model needs to be completely retrained from a new set of training data. End users can continue to query a data mining model during a refresh process; after the refresh process completes, the users have access to the refreshed data without having to reconnect.
Full Process
Completely removes and rebuilds the data mining model and trains the newly constructed model from the training data. It is required for data mining models whose structure has changed and for models that have not yet been trained. End users working with the data mining model must reconnect to the server after this process completes in order to continue to work with it.
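The difference between the two options can be pictured with a toy Python analogy. It is only a sketch under stated assumptions: ToyModel, ToyServer, and the "bikes" model are hypothetical, and an object reference held by a client stands in for a user's connection. It does not reproduce how the server actually implements either option.

```python
def count_cases(columns, cases):
    """Build fresh model content (simple counts) from the training cases."""
    content = {}
    for case in cases:
        key = tuple(case[c] for c in columns)
        content[key] = content.get(key, 0) + 1
    return content


class ToyModel:
    """Hypothetical model object: a structure (columns) plus trained content."""

    def __init__(self, columns):
        self.columns = tuple(columns)
        self.content = {}  # empty until the model is trained


class ToyServer:
    """Hypothetical server that holds models by name."""

    def __init__(self):
        self.models = {}

    def refresh(self, name, cases):
        # Same model object, same structure: retrain, then swap in new content.
        model = self.models[name]
        model.content = count_cases(model.columns, cases)

    def full_process(self, name, columns, cases):
        # Remove the model, rebuild its structure, and train the new model.
        self.models.pop(name, None)
        model = ToyModel(columns)
        model.content = count_cases(model.columns, cases)
        self.models[name] = model


server = ToyServer()
server.full_process("bikes", ["age_group"], [{"age_group": "young"}])
handle = server.models["bikes"]  # stands in for a user's existing connection

# Refresh: the existing handle sees the retrained content.
server.refresh("bikes", [{"age_group": "senior"}, {"age_group": "senior"}])
sees_refresh = handle.content == {("senior",): 2}

# Full process with a changed structure: the old handle no longer
# refers to the current model, so the user would have to reconnect.
server.full_process(
    "bikes",
    ["age_group", "region"],
    [{"age_group": "young", "region": "west"}],
)
handle_is_stale = handle is not server.models["bikes"]
```

In this analogy, refresh keeps the model object and replaces only its content, while full process replaces the object itself, which is why a previously obtained handle stops pointing at the current model.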
After the mining model is processed, the information about the patterns and rules discovered in the training data is stored as data mining model content, along with the distribution information of the case data.