Introduction
Fits a neural network for classification or regression.
A random 30% of the data is held out for cross-validation, and the optimal number of epochs is the number that minimizes the cross-validation loss. The final network is then trained on all of the data for that number of epochs.
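The procedure can be illustrated with a minimal sketch. It is not the code used by the software: the function names, the seed value and the per-epoch losses are hypothetical, and only the 30% holdout and the "lowest cross-validation loss" rule come from the description above.

```python
import random

def split_holdout(n_cases, holdout_fraction=0.3, seed=12321):
    """Randomly assign case indices to a training set and a holdout set.

    The seed value is an arbitrary choice for this illustration.
    """
    rng = random.Random(seed)
    indices = list(range(n_cases))
    rng.shuffle(indices)
    n_holdout = round(n_cases * holdout_fraction)
    return sorted(indices[n_holdout:]), sorted(indices[:n_holdout])

def optimal_epochs(validation_losses):
    """Return the 1-based epoch with the lowest cross-validation loss."""
    best_index = min(range(len(validation_losses)),
                     key=validation_losses.__getitem__)
    return best_index + 1

train_idx, valid_idx = split_holdout(10)

# Hypothetical per-epoch losses on the holdout set, for illustration only.
losses = [0.90, 0.61, 0.48, 0.45, 0.47, 0.52]
best = optimal_epochs(losses)  # the final network would be refit on all data for this many epochs
```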
Requirements
A data set containing an outcome variable and the predictor variables to use in the predictive model.
This method is only available in Q5.
Method
Usage
To run Deep Learning:
1. Select Create > Classifier > Deep Learning.
2. Under Inputs > Deep Learning > Outcome select the outcome variable.
3. Under Inputs > Deep Learning > Predictors select the predictor variables.
4. Change any other settings as required.
Example
The Cross Validation output of a deep learning model. Loss is the quantity minimized by the model, which is mean squared error in the case of a numeric output. The model mean absolute error is also shown.
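For a numeric outcome, the two quantities named above can be computed as in the following sketch. The data values are made up for illustration, and the function names are not part of the software.

```python
def mean_squared_error(observed, predicted):
    """Average of the squared differences (the loss for a numeric outcome)."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def mean_absolute_error(observed, predicted):
    """Average of the absolute differences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical observed and predicted values.
observed = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

mse = mean_squared_error(observed, predicted)   # (1 + 0 + 4 + 1) / 4
mae = mean_absolute_error(observed, predicted)  # (1 + 0 + 2 + 1) / 4
```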
Options
Outcome - The variable to be predicted by the predictor variables. It may be either a numeric or categorical variable.
Predictors - The variable(s) to predict the Outcome.
Algorithm - The machine learning algorithm. Defaults to Deep Learning but may be changed to other machine learning methods.
Output
- Accuracy - When Outcome is categorical, produces a table of accuracy by class. When Outcome is numeric, calculates Root Mean Squared Error and R-squared.
- Prediction-Accuracy Table - Produces a table relating the observed and predicted outcome. Also known as a confusion matrix.
- Cross Validation - Produces charts of loss (i.e. network error) and accuracy or mean absolute error vs training epoch.
- Network Layers - This returns a description of the layers of the network.
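The quantities reported by the Accuracy and Prediction-Accuracy Table outputs can be sketched as follows. This is an illustration only: the function names and data are hypothetical, and R-squared is assumed here to be 1 minus the ratio of residual to total sum of squares, which may differ from the definition the software uses.

```python
import math
from collections import Counter

def prediction_accuracy_table(observed, predicted):
    """Count cases for each (observed, predicted) pair (a confusion matrix)."""
    return Counter(zip(observed, predicted))

def accuracy_by_class(observed, predicted):
    """Proportion of cases in each observed class that were predicted correctly."""
    table = prediction_accuracy_table(observed, predicted)
    totals = Counter(observed)
    return {label: table[(label, label)] / totals[label] for label in totals}

def rmse(observed, predicted):
    """Root Mean Squared Error for a numeric outcome."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def r_squared(observed, predicted):
    """Assumed definition: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical categorical outcome.
obs_cat = ["a", "a", "b", "b", "b"]
pred_cat = ["a", "b", "b", "b", "a"]
table = prediction_accuracy_table(obs_cat, pred_cat)
by_class = accuracy_by_class(obs_cat, pred_cat)

# Hypothetical numeric outcome.
obs_num = [1.0, 2.0, 3.0, 4.0]
pred_num = [1.0, 2.0, 3.0, 6.0]
error = rmse(obs_num, pred_num)
fit = r_squared(obs_num, pred_num)
```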
Missing data - See Missing Data Options.
Variable names - Displays Variable Names in the output.
Maximum epochs - The maximum number of epochs to train the network for. The actual number of epochs may be lower if the cross-validation error stops improving.
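One way to picture how training can end before the maximum is reached is the sketch below. It replays a list of hypothetical cross-validation losses rather than training a network, and the `patience` parameter (how many non-improving epochs are tolerated) is an assumption made for the illustration; the stopping rule actually used by the software is not documented here.

```python
def epochs_run(validation_losses, max_epochs, patience=2):
    """Return how many epochs are run before training stops.

    Stops early once the loss has failed to improve for `patience`
    consecutive epochs (an assumed rule), otherwise runs to max_epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return min(max_epochs, len(validation_losses))

# Hypothetical per-epoch cross-validation losses.
losses = [0.90, 0.61, 0.48, 0.45, 0.47, 0.52, 0.60, 0.70]

stopped_early = epochs_run(losses, max_epochs=100)  # loss stops improving after epoch 4
hit_maximum = epochs_run(losses, max_epochs=3)      # the cap is reached first
```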
Hidden layers - A comma delimited list of the number of units in the hidden layers.
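As an illustration of the expected input format, the sketch below turns a comma-delimited entry such as "256, 128" into a list of layer sizes. The function name and the validation rules (positive whole numbers only) are assumptions, not the software's own parsing code.

```python
def parse_hidden_layers(text):
    """Convert a comma-delimited string into a list of hidden layer sizes."""
    units = [int(part.strip()) for part in text.split(",") if part.strip()]
    if not units or any(u <= 0 for u in units):
        raise ValueError(
            "Hidden layers must be a comma-delimited list of positive integers")
    return units

two_layers = parse_hidden_layers("256, 128")  # two hidden layers
one_layer = parse_hidden_layers("10")         # a single hidden layer
```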
Normalize predictors - Whether the predictor variables are normalized to zero mean and unit variance. This is recommended if the variables differ significantly in their ranges. Note that categorical variables are also converted to dummy variables.
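The two transformations can be sketched as follows. This is an illustration under stated assumptions: the population standard deviation is used, constant variables are mapped to zero, and every category gets its own dummy variable. The software may differ on each of these points (for example, by dropping a reference category).

```python
import statistics

def standardize(values):
    """Rescale a numeric predictor to zero mean and unit variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        return [0.0 for _ in values]  # assumption: constant variables become zero
    return [(v - mean) / sd for v in values]

def dummy_code(categories):
    """Convert a categorical predictor into one 0/1 variable per category."""
    levels = sorted(set(categories))
    return {level: [1 if c == level else 0 for c in categories]
            for level in levels}

# Hypothetical predictors.
scaled = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
dummies = dummy_code(["red", "blue", "red", "green"])
```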
Random seed - Seed used to initialize the (pseudo)random number generator for the model fitting algorithm. Different seeds may lead to slightly different answers, but should normally not make a large difference.
Increase allowed output size - Check this box if you encounter a warning message "The R output had size XXX MB, exceeding the 128 MB limit..." and you need to reference the output elsewhere in your document; e.g., to save predicted values to a Data Set or examine diagnostics.
Maximum allowed size for output (MB) - This control only appears if Increase allowed output size is checked. Use it to set the maximum allowed size for the output in megabytes. The warning about the R output size referred to above states the minimum size you need to set in order to return the full output. Note that having many large outputs in one document or page may slow down your document and increase load times.
Weight - Where a weight has been set for the R Output, a new data set is generated via resampling, and this new data set is used in the estimation.
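A minimal sketch of weighting by resampling is given below. It assumes cases are drawn with replacement, with probability proportional to their weight, until the new data set has as many cases as the original; the exact resampling scheme used by the software is not specified here, and the function name and seed are hypothetical.

```python
import random

def resample_by_weight(cases, weights, seed=12321):
    """Draw a new data set of the same size, with replacement,
    selecting each case with probability proportional to its weight."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility of the sketch
    return rng.choices(cases, weights=weights, k=len(cases))

# Hypothetical case identifiers and weights; a weight of 0 excludes a case.
cases = ["case1", "case2", "case3", "case4"]
weights = [2.0, 1.0, 0.0, 1.0]
resampled = resample_by_weight(cases, weights)
```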
Filter - The data is automatically filtered using any filters prior to estimating the model.
DIAGNOSTICS
Prediction-Accuracy Table - Creates a table showing the observed and predicted values, as a heatmap.
SAVE VARIABLE(S)
Predicted Values - Creates a new variable containing predicted values for each case in the data.
Probabilities of Each Response - Creates new variables containing predicted probabilities of each response.
Acknowledgments
Uses the keras package, which uses TensorFlow.
More information
See this blog post for an introduction to deep learning.