Sometimes you may wish to create segments of your respondents and then use these segments to classify respondents in a different survey or in a later wave of your tracker. In essence, you can reuse your original segmentation model to assign respondents in new data to those segments. There are two ways to approach this:
- Assign respondents to segments in the new data file using the same variables as used when forming the segments, or,
- Predict segment membership based on a different set of variables.
Requirements
- A document with one of the following types of segmentation models:
- Latent Class Analysis
- Trees
- k-means
- Mixture Models for Regression
- Most machine learning models, such as Random Forest
- A new data set with variables that correspond (whether or not they have the same names) to the original variables used in the model that will make predictions from the new data.
- Each variable in the new data must have the same code frame as in the original data. That is, all the same categories must be present, and their underlying coded values must be the same (e.g., Category 1 is given a value of 1 in the original data and a value of 1 in the new data).
- If using a non-regression type of algorithm (trees, k-means, etc.), there are additional requirements:
- Because there's always a random element in the algorithm, the order of the categories of the variables used must be exactly the same. That is, when you make a summary table for each variable, the placement of the rows needs to be exactly the same as they were when the tree was created.
- There must be at least 1 respondent in each category that had at least 1 respondent before, and 0 respondents in each category that had 0 respondents before. That is, if you use a question with a "Don't know" category, for example, where there was 1 respondent who selected it in the original data set, you also need to have at least 1 respondent for that category in the new data set. Conversely, if in the original data set you have a category with 0 respondents, this must remain as 0 respondents in the new data set.
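The requirements above can be checked before updating the data file. The following is a minimal sketch in base R, not part of the product: the data frames `original` and `new_data`, the variable `shop`, and the helper `check_variable` are all hypothetical names used for illustration. It compares the categories (and their order) of each variable and checks that the same categories are populated in both files.

```r
# Hypothetical example data: one segmentation input in the original and new files.
original <- data.frame(
  shop = factor(c("Weekly", "Monthly", "Weekly", "Never"),
                levels = c("Weekly", "Monthly", "Never", "Don't know"))
)
new_data <- data.frame(
  shop = factor(c("Monthly", "Never", "Weekly"),
                levels = c("Weekly", "Monthly", "Never", "Don't know"))
)

# Compare one variable across the two files (hypothetical helper).
check_variable <- function(old, new) {
  c(
    # Same categories, listed in the same order.
    same_levels = identical(levels(old), levels(new)),
    # The same categories have at least 1 respondent in both files,
    # and the same categories have 0 respondents in both files.
    same_populated = identical(as.vector(table(old)) > 0,
                               as.vector(table(new)) > 0)
  )
}

# One column of results per variable; every entry should be TRUE.
checks <- sapply(names(original), function(v) {
  check_variable(original[[v]], new_data[[v]])
})
print(checks)
```

Any `FALSE` entry points to a variable whose categories need to be aligned before the segmentation is reused.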
Method 1 - Using the same variables
Segments formed using latent class analysis or built-in segmentation modeling
A three-segment latent class solution, based on a sample size of 400, is shown below. To allocate people in a new data file to these segments (note the other important requirements above):
- File > Data Sets > Update and select the data file used to create the analysis.
- Choose the new data file and press OK.
- Click on the latent class output, which will be shown as having an error, and press Ignore. The segmentation variable that is created in your data file now applies the previous segmentation to the additional data. Do not click Ignore if you have reordered the input data; instead, either regrow the tree or revert the data to its original order.
- The variable in the project that shows segment membership has now automatically updated, allocating people in the new data file to the segments.
Segments formed using k-means or other R-based segmentation modeling
A three-cluster k-means solution is shown above. To allocate people in a new data file using these segments:
1. Click on the k-means output and make sure that Automatic is not checked (this option is in Inputs > R Code in the Object Inspector, on the right of the screen).
2. Take a copy of line 2 of the code. In my example, it looks like this:
kmeans = KMeans (data.frame (understand, shop, key, value, interested),
3. Go to the Variables and Questions tab.
4. Right-click on the first variable and select Insert Variable(s) > R Variable
5. Paste in the copied code, and modify it so that it looks like the following code. The key bits to retain from your pasted code are the model name (kmeans, or whatever name you assigned to it) and the variable names. Note that you can use different variable names, as long as the variables in the new data correspond to your original variables and are coded in the same way.
predict (kmeans, newdata = data.frame (understand, shop, key, value, interested))
6. Press the play button (the triangle).
7. Insert the Question name.
8. Press Add R Variable.
9. Change the Question Type of the newly created variable to Pick One.
10. Press the Values button and enter any labels you desire, and press OK.
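To illustrate what predicting from a k-means solution generally amounts to, here is a minimal sketch in base R. It is not the code behind the KMeans output used in the steps above, which may process the data differently; it uses `stats::kmeans`, and the data and the helper `assign_segments` are hypothetical. Each new respondent is assigned to the cluster whose centre is nearest.

```r
set.seed(1)  # k-means starts from random centres, so fix the seed

# Hypothetical numeric inputs: two well-separated groups for illustration.
original <- data.frame(
  understand = c(1, 2, 1, 9, 10, 9),
  shop       = c(2, 1, 1, 10, 9, 9)
)
fit <- kmeans(original, centers = 2, nstart = 5)

# Assign each new row to the cluster with the closest centre
# (squared Euclidean distance), matching columns by name.
assign_segments <- function(centres, newdata) {
  x <- as.matrix(newdata[, colnames(centres), drop = FALSE])
  unname(apply(x, 1, function(row) {
    # t(centres) has one column per cluster; subtract the row from each.
    which.min(colSums((t(centres) - row)^2))
  }))
}

# Hypothetical new wave with the same variables, coded the same way.
new_data <- data.frame(understand = c(1, 10), shop = c(1, 10))
segments <- assign_segments(fit$centers, new_data)
print(segments)
```

Because the new rows are matched to the centres by column name, the new data must contain variables with the same names and coding as the originals, which mirrors the requirement in step 5.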
Method 2 - Using different variables
Often, the survey that you want to segment will not contain all of the questions used to create the original segments. In this case, you can use a predictive model to predict segment membership (after the segments have been created using one of the original segmentation models). Instead of including all of the original variables used to create the segmentation as predictors, you can include either:
- A completely different set of variables (e.g., demographics, or some other data available in a customer database).
- A subset of the variables used to create the segments. (Tip: if you are building a predictive model based on exactly the same variables as used to create segments, you are making a mistake, and should instead use the approach described in the previous section).
The output above is from a multinomial logit (MNL) model (in Q5: Create > Regression > Multinomial Logit) that predicts segment membership based on firmographics. The goal now is to predict segment membership in a new data file that contains the same predictor variables.
1. Click on the model output and make sure that Automatic is not checked (this option is in Inputs > R Code in the Object Inspector).
2. Take a copy of the line of code that looks similar to this (with different variable names):
glm = Regression (segmentsGXVYHS ~ q1 + q2 + q3 + q4 + q5
3. Go to the Variables and Questions tab.
4. Right-click on the first variable and select Insert Variable(s) > R Variable
5. Paste in the copied code, and modify it so that it looks like the following code. The key bits to retain from your pasted code are the model name (glm, or whatever it has been changed to) and the variable names.
predict (glm, newdata = data.frame (q1, q2, q3, q4, q5))
6. Press the play button (the triangle).
7. Insert the Question name
8. Press Add R Variable
9. Change the Question Type of the newly created variable to Pick One.
10. Press the Values button and enter any labels you desire, and press OK.
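The same idea can be sketched outside the product. The example below is an illustration only, not the Regression function used in the steps above: it assumes the recommended `nnet` package is installed and uses its `multinom` function, and the data frames and the variables `segment`, `employees` and `region` are hypothetical. A multinomial logit model is fitted to predict segment membership from firmographics and is then applied to new records that contain only those predictors.

```r
library(nnet)  # assumption: the recommended 'nnet' package is available

set.seed(1)
n <- 150

# Hypothetical original data: existing segments plus two firmographic predictors.
original <- data.frame(
  segment   = factor(rep(c("A", "B", "C"), each = n / 3)),
  employees = c(rnorm(50, 10, 4), rnorm(50, 20, 4), rnorm(50, 30, 4)),
  region    = factor(sample(c("North", "South"), n, replace = TRUE))
)

# Fit the multinomial logit model on the original data.
model <- multinom(segment ~ employees + region, data = original, trace = FALSE)

# Hypothetical new data: same predictor names and the same categories.
new_data <- data.frame(
  employees = c(8, 21, 33),
  region    = factor(c("North", "South", "North"),
                     levels = levels(original$region))
)

# Most likely segment for each new record, and the underlying probabilities.
predicted <- predict(model, newdata = new_data, type = "class")
probabilities <- predict(model, newdata = new_data, type = "probs")
print(predicted)
```

The predicted classes are returned as a categorical variable with the same segment labels as the original data, which corresponds to setting the Question Type to Pick One in step 9.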