Create an ensemble of multiple machine learning and/or regression models. The models may already exist, or they may be created specifically for the ensemble.
If the outcome being predicted is numeric, the ensemble predicts, for each case, the average of the models' predictions. If the outcome is categorical, the ensemble calculates the average probability of each class for each case and predicts the class with the greatest average probability. Metrics are computed on each model's training data. Optionally, a filter specifying evaluation data (usually a testing sample independent of the training sample) may also be provided.
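The averaging rule described above can be illustrated with a minimal sketch. This is not Q's implementation; the function names and the example data are hypothetical, and every model is assumed to return predictions for the same cases in the same order.

```python
from statistics import mean

def ensemble_numeric(predictions):
    """Average numeric predictions across models, case by case.

    predictions: one list of predicted values per model, all the same length.
    """
    return [mean(case) for case in zip(*predictions)]

def ensemble_categorical(probabilities, classes):
    """Average class probabilities across models and pick the likeliest class.

    probabilities: per model, a list of per-case probability lists,
    each ordered like `classes`.
    """
    averaged, labels = [], []
    for case in zip(*probabilities):          # one tuple of model outputs per case
        avg = [mean(p) for p in zip(*case)]   # average each class over the models
        averaged.append(avg)
        labels.append(classes[avg.index(max(avg))])
    return averaged, labels

# Hypothetical outputs from three models for two cases.
numeric = ensemble_numeric([[10.0, 20.0], [12.0, 22.0], [14.0, 27.0]])
avg_probs, labels = ensemble_categorical(
    [
        [[0.9, 0.1], [0.4, 0.6]],
        [[0.6, 0.4], [0.3, 0.7]],
        [[0.3, 0.7], [0.2, 0.8]],
    ],
    classes=["No", "Yes"],
)
print(numeric)
print(labels)
```

In the first case the third model on its own would predict "Yes", but the averaged probability of "No" is higher, so the ensemble predicts "No".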
The following functions can be used to add predictions to a data set:
- Machine Learning > Save Variable(s) > Predicted Values
- Machine Learning > Save Variable(s) > Probabilities of Each Response
To create an ensemble from existing models, you need a Q project with multiple machine learning model outputs from which you want to create the ensemble.
To run a Machine Learning Ensemble:
1. Select Create > Classifier > Ensemble of Models.
2. Under Inputs > EXISTING MODELS > Input models, select the existing models that you would like to compare or combine.
3. Under Inputs > EXISTING MODELS > Output, select the type of output you would like to see, e.g. Comparison for a comparison table (first example below), or Ensemble for a Prediction-Accuracy Table (second example below).
Comparison table for 3 models:
Prediction-accuracy table for the ensemble:
Existing or new models - Choose to use existing machine learning models or create new models to compare.
Ensemble - Check the Ensemble checkbox to create an ensemble model by combining the predictions of the underlying models.
Optimal ensemble - Check to find the ensemble with the best evaluation accuracy or R-squared (or the best training accuracy or R-squared if no evaluation filter is supplied).
Output - The type of output to display:
- Comparison - A table comparing metrics for the models (and the ensemble(s), if selected).
- Ensemble - A Prediction-Accuracy Table for the ensemble (or for the optimal ensemble, if selected) using the training data.
Input models - At least 2 existing machine learning models.
Outcome - The variable to be predicted by the predictor variables.
Predictors - The variable(s) to predict the Outcome.
Missing data - See Missing Data Options.
Variable names - Displays variable names in the output instead of labels.
Random seed - Initializes the random number generator used for imputation and by algorithms that involve randomness.
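Evaluation filter - Select a filter that identifies the evaluation data (usually a testing sample independent of the training sample) on which the models are evaluated.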
Models - For each model, select a machine learning algorithm and the desired settings.
For model-specific options, see: Classification And Regression Trees (CART), Linear Discriminant Analysis, Random Forest, Support Vector Machine, Deep Learning, Gradient Boosting or Regression.
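To make the Optimal ensemble and Evaluation filter options more concrete, the sketch below scores every combination of two or more models and keeps the most accurate one. The exhaustive search is an assumption made for illustration, since this page does not state how Q searches for the optimal ensemble. The model names, probabilities and filter are hypothetical, and only accuracy (not R-squared) is covered.

```python
from itertools import combinations

def ensemble_labels(model_probs, classes):
    """Average class probabilities over models; return the likeliest class per case."""
    labels = []
    for case in zip(*model_probs):
        avg = [sum(p) / len(p) for p in zip(*case)]
        labels.append(classes[avg.index(max(avg))])
    return labels

def accuracy(predicted, observed, keep=None):
    """Share of correct predictions among the cases where `keep` is True."""
    if keep is None:                      # no evaluation filter: use every case
        keep = [True] * len(observed)
    pairs = [(p, o) for p, o, k in zip(predicted, observed, keep) if k]
    return sum(p == o for p, o in pairs) / len(pairs)

def best_ensemble(models, classes, observed, evaluation_filter=None):
    """Score every subset of two or more models (an assumed search strategy)."""
    best = None
    for size in range(2, len(models) + 1):
        for names in combinations(sorted(models), size):
            labels = ensemble_labels([models[n] for n in names], classes)
            score = accuracy(labels, observed, evaluation_filter)
            if best is None or score > best[1]:
                best = (names, score)
    return best

# Hypothetical class probabilities ("No", "Yes") from three models for four cases.
models = {
    "CART":          [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.7, 0.3]],
    "Random Forest": [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8], [0.9, 0.1]],
    "SVM":           [[0.6, 0.4], [0.2, 0.8], [0.9, 0.1], [0.2, 0.8]],
}
observed = ["No", "Yes", "Yes", "No"]
evaluation_filter = [False, False, True, True]   # last two cases are the test sample

best = best_ensemble(models, ["No", "Yes"], observed, evaluation_filter)
print(best)
```

Passing `evaluation_filter=None` scores the ensembles on all cases instead, which corresponds to falling back on training accuracy when no evaluation filter is supplied.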
Prediction-Accuracy Table - Creates a table showing the observed and predicted values, as a heatmap.
Predicted Values - Creates a new variable containing predicted values for each case in the data.
Probabilities of Each Response - Creates new variables containing predicted probabilities of each response.
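As a rough illustration of what a Prediction-Accuracy Table contains, the sketch below cross-tabulates observed against predicted classes and derives the overall accuracy from the diagonal. The data and the function name are hypothetical, and the heatmap shading that Q applies is not reproduced.

```python
from collections import Counter

def prediction_accuracy_table(observed, predicted, classes):
    """Cross-tabulate observed (rows) against predicted (columns) classes."""
    counts = Counter(zip(observed, predicted))
    return [[counts[(o, p)] for p in classes] for o in classes]

classes = ["No", "Yes"]
observed = ["No", "Yes", "Yes", "No", "Yes"]
predicted = ["No", "Yes", "No", "No", "Yes"]   # e.g. saved Predicted Values

table = prediction_accuracy_table(observed, predicted, classes)
correct = sum(table[i][i] for i in range(len(classes)))   # diagonal = correct cases
accuracy = correct / len(observed)

print("observed \\ predicted", classes)
for label, row in zip(classes, table):
    print(label, row)
print(f"accuracy: {accuracy:.0%}")
```

Cells on the diagonal count correctly predicted cases; off-diagonal cells show which classes are confused with each other.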