This article describes how to use the values in multiple variables to create a new variable of categories.
The article contains the steps to go from raw data in two separate variables to a new variable whose categories can be viewed both as raw data and in a table.
Requirements
- A dataset with at least two variables imported into Displayr. To follow along with the example below, use this .sav file.
- Knowledge of how to construct conditions for the criteria for each category in R. See: How to Work with Conditional R Formulas.
Method - Using Indexing
When using indexing to create a new variable, the category criteria appear inside brackets that select which data meets the criteria. How this works is explained in How to Use Different Types of Data in R and more on the syntax needed is in How to Work with Data in R.
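As a minimal sketch of the idea, the snippet below uses a made-up numeric vector (age and the cut-off of 30 are assumptions for illustration, not part of the example dataset). A condition yields one TRUE/FALSE per respondent, and placing it inside brackets selects which positions receive the new value.

```r
# Toy data; these values are made up for illustration
age = c(22, 41, 35, 19)

# A condition produces one TRUE/FALSE value per respondent
is.young = age < 30

# Start with an all-NA result, then fill only the positions
# where the condition inside the brackets is TRUE
group = rep(NA, length(age))
group[is.young] = "Young"
group[!is.young] = "Older"

group
```

Any respondent who meets none of the conditions keeps the initial NA.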
In the example below we will use criteria for age (Age) and living arrangements (d4) to categorize respondents into groups.
- On the Variables and Questions tab, right-click on a row to insert your variable and select Insert Variable(s) > R Variable.
- At the bottom of the window, give it a Question Name, such as New Groups.
- Under R CODE, paste in the following code. The comments (prefaced with a #) explain what the code does so you can modify it for your own needs. When setting up your conditions, use == to test against one value and %in% to test against a series of values (more info is here):
####OPTIONAL - create variables for the different criteria
#flag respondents who fall into the following categories
young = Age %in% c("18 to 24", "25 to 29", "30 to 34")
single = d4 %in% c("Living alone", "Sharing accommodation", "Living with your parents/guardian")
partner.only = d4 == "Living with partner only"
children = d4 %in% c("Living with partner and children", "Living with children only")
####INDEX to assign new categories
#create empty data series using the length of a different variable
newcategories = rep(NA, length(Age))
#put the criteria in brackets to assign the new categories
newcategories[young & single] = "Young singles"
newcategories[!young & single] = "Older singles"
newcategories[young & partner.only] = "Young couples"
newcategories[!young & partner.only] = "Older couples"
newcategories[young & children] = "Young families"
newcategories[!young & children] = "Older families"
#return final results
newcategories
- Click the blue play button. You'll see a preview of the input data (in grey) and final results (in blue) on the left.
- If everything looks as expected, click Add R Variable.
- Change the Question Type to Pick One (since we are creating a categorical variable). You will now have a new variable for the different groups, which you can use in your Report.
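One way to check that everything looks as expected is to tabulate the result. The sketch below is self-contained and uses made-up stand-ins for Age and d4 (the values "45 to 54", "65 or more" and "Other" are assumptions for illustration). It shows that, with indexing, respondents who match none of the criteria stay as NA.

```r
# Made-up data standing in for the Age and d4 variables
Age = c("18 to 24", "45 to 54", "25 to 29", "65 or more")
d4 = c("Living alone", "Living with partner only",
       "Living with children only", "Other")

# Criteria flags, as in the steps above
young = Age %in% c("18 to 24", "25 to 29", "30 to 34")
single = d4 %in% c("Living alone", "Sharing accommodation",
                   "Living with your parents/guardian")

# Only the two "singles" categories are assigned in this sketch
newcategories = rep(NA, length(Age))
newcategories[young & single] = "Young singles"
newcategories[!young & single] = "Older singles"

# Count respondents per category; useNA = "ifany" also counts
# anyone who was left unassigned (NA)
counts = table(newcategories, useNA = "ifany")
counts
```

A large NA count usually means a criterion is missing or a label does not match the data exactly.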
Note that you can either create the optional criteria variables, as in the OPTIONAL section of the code above, or include the criteria directly inside the brackets. For example, the following line:
newcategories[young & single] = "Young singles"
is equivalent to:
newcategories[Age %in% c("18 to 24", "25 to 29", "30 to 34") &
  d4 %in% c("Living alone", "Sharing accommodation", "Living with your parents/guardian")] = "Young singles"
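The steps above recommend == for a single value and %in% for a series of values. The following sketch, on a made-up d4 vector with one missing answer, illustrates why: == returns NA for missing data and recycles a multi-value comparison position by position, whereas %in% tests membership and returns FALSE for missing data.

```r
# Made-up living-arrangement answers, including one missing value
d4 = c("Sharing accommodation", "Living alone", NA,
       "Living with partner only")

# == against a single value: missing data gives NA
partner.only = d4 == "Living with partner only"

# %in% against a set of values: missing data gives FALSE
single = d4 %in% c("Living alone", "Sharing accommodation")

# == against several values compares position by position
# (recycling the shorter vector), so matches are missed here
wrong = d4 == c("Living alone", "Sharing accommodation")

partner.only
single
wrong
```

In this toy data the first two respondents are flagged as single by %in% but not by the recycled == comparison.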
Method - Using dplyr::case_when
The dplyr package contains a case_when() function that can also be used to assign categories to respondents based on criteria. It is a bit cleaner-looking and faster to work with since much of the repetitive syntax in indexing isn't required.
In the example below we will use criteria for age (Age) and living arrangements (d4) to recategorize respondents into groups.
- On the Variables and Questions tab, right-click on a row to insert your variable and select Insert Variable(s) > R Variable.
- At the bottom of the window, give it a Question Name, such as New Groups.
- Under R CODE, paste in the following code. The comments (prefaced with a #) explain what the code does so you can modify it for your own needs. When setting up your conditions, use == to test against one value and %in% to test against a series of values (more info is here). Also note that within case_when we use ~ rather than = to assign the category:
####OPTIONAL - create variables for the different criteria
#flag respondents who fall into the following categories
young = Age %in% c("18 to 24", "25 to 29", "30 to 34")
single = d4 %in% c("Living alone", "Sharing accommodation", "Living with your parents/guardian")
partner.only = d4 == "Living with partner only"
children = d4 %in% c("Living with partner and children", "Living with children only")
####use dplyr::case_when to assign new categories
library(dplyr)
newcategories = case_when(
  young & single ~ "Young singles",
  !young & single ~ "Older singles",
  young & partner.only ~ "Young couples",
  !young & partner.only ~ "Older couples",
  young & children ~ "Young families",
  !young & children ~ "Older families",
  !children & !partner.only & !single ~ "Other"
)
#return final results
newcategories
- Click the blue play button. You'll see a preview of the input data (in grey) and final results (in blue) on the left.
- If everything looks as expected, click Add R Variable.
- Change the Question Type to Pick One (since we are creating a categorical variable). You will now have a new variable for the different groups, which you can use in your Report.
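Two properties of case_when are worth keeping in mind when adapting the code above: conditions are evaluated in order, with each respondent taking the first one that matches, and respondents who match nothing receive NA. The sketch below uses made-up flags (not the example dataset) and a final TRUE condition as a catch-all, which is an alternative to spelling out the "Other" criteria explicitly.

```r
library(dplyr)

# Made-up flags for four respondents
young = c(TRUE, FALSE, TRUE, FALSE)
single = c(TRUE, TRUE, FALSE, FALSE)

groups = case_when(
  young & single ~ "Young singles",
  # Reached only by singles not matched above, i.e. older singles
  single ~ "Older singles",
  # Catch-all: everyone not matched by an earlier condition
  TRUE ~ "Other"
)

groups
```

Because earlier conditions take priority, the catch-all needs to come last.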
See Also
How to Work with Conditional R Formulas
How to Use R Code to Create a Filter Based on Single-Response Questions
How to Filter Raw Data Using R
How to Recode Data Based on a Lookup Using R