Introduction
This article describes how to perform calculations on subgroups of data using R. A common use case is standardizing a rating question within each respondent's country to account for cultural differences in how rating scales are interpreted. The example below scales Brand Attitude within each Age group.
You will go from a table where the average rating differs by age group to scaled variables with an average of 0 within each age group.
Requirements
- A Number, Number-Multi, or Number-Grid Question. You can access the data and code in the following example by downloading the QPack here.
Method
The example below scales a brand attitude variable set within each age group in the data and creates a new scaled variable set. Although this example scales the data, you can perform any custom calculation by writing it as a function. You can also use built-in functions such as mean() and sum() directly in the dplyr code. To adapt the code to your own document, read through the comments (denoted with #) and edit the code where appropriate.
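As a minimal sketch of passing a built-in function directly to dplyr, the example below replaces each rating with its age-group mean by handing mean() to mutate_at. The data frame `toy` and its columns `Coke`, `Pepsi`, and `groupvar` are made up for illustration and stand in for your own variables.

```r
library(dplyr)

# Hypothetical toy data standing in for a rating variable set and a grouping variable
toy <- data.frame(
  Coke = c(1, 2, 3, 4, 6, 8),
  Pepsi = c(2, 4, 6, 1, 1, 4),
  groupvar = c("18 to 34", "18 to 34", "18 to 34",
               "35 or more", "35 or more", "35 or more")
)

# Columns to transform: everything except the grouping column
thecols <- setdiff(colnames(toy), "groupvar")

# Pass the built-in mean() directly; na.rm = TRUE is forwarded to mean()
groupmeans <- toy %>%
  group_by(groupvar) %>%
  mutate_at(thecols, mean, na.rm = TRUE) %>%
  ungroup()

groupmeans
```

Each row now holds the mean of its own age group, because a single summary value is repeated for every row in the group. Swapping mean for sum would give group totals in the same way.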
- Follow the instructions in How to Create a Custom R Variable and use the code below in your R CODE. Note that you need to use this code the very first time you save the R variable so that all of the variables are saved out as expected.
###Identify variables to use in calculation
#the numeric variable set to standardize
myvars=`Brand attitude 2`
#select the variable to use for the grouping
groupvar=Age
###Format data to use in calculations
#combine the group and other variables you want to scale in a data.frame
thedata=data.frame(myvars,groupvar,check.names=FALSE)
#remove the SUM column, if present (matched against thedata so the grouping column is kept)
thedata=thedata[,colnames(thedata) != "SUM",drop=FALSE]
#get list of columns to standardize
thecols=colnames(thedata)
#remove the grouping column from the list
thecols=thecols[thecols != "groupvar"]
###Create a function for your calculation, if needed
#create function to standardize using a formula
standardize <- function(x) (x-mean(x,na.rm=TRUE))/sd(x,na.rm=TRUE)
###Perform the calculations
#load dplyr functions used below
library(dplyr)
#group by the grouping variable and standardize the other variables
newvars <- thedata %>% #create newvars; %>% passes thedata to the function on the next line
  group_by(groupvar) %>% #group by the grouping variable in the data (comma separate if multiple)
  mutate_at(thecols, standardize) #mutate_at applies standardize to each column in thecols within each group, replacing the original values
###Return the final result
#remove the groupvar from the final result
newvars[,colnames(newvars) != "groupvar"]
- If you run into errors with your customized code, you can right-click on the Report tree, select Add R Output, and paste the code there to troubleshoot. See How to Troubleshoot R Code.
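One possible cause of unexpected missing values, offered here as an assumption rather than something observed in this example's data, is a group with a single respondent or with identical ratings. sd() returns NA for a single value and 0 when all values are equal, so the standardize formula yields NA or NaN for that group. The sketch below shows both cases and a hypothetical guarded variant, `safe_standardize`, which is not part of the original code.

```r
# The standardize function from the code in this article
standardize <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# A single value has no standard deviation, so the result is NA
single <- standardize(5)

# Identical values have a standard deviation of 0, so 0/0 gives NaN
constant <- standardize(c(3, 3, 3))

# Hypothetical guarded variant: return NA when the standard deviation is missing or 0
safe_standardize <- function(x) {
  s <- sd(x, na.rm = TRUE)
  if (is.na(s) || s == 0) return(rep(NA_real_, length(x)))
  (x - mean(x, na.rm = TRUE)) / s
}

guarded <- safe_standardize(c(3, 3, 3))
normal <- safe_standardize(c(1, 2, 3))
```

Pasting a small check like this into an R Output can help confirm whether missing values in the new variables come from the data in a group rather than from the code itself.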
If you make a table of your new Question, you'll see that the variables are scaled for each brand overall and within each age group.
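The same check can be sketched in code. The example below standardizes a made-up rating within two age groups and then summarizes the result by group. The data frame `toy`, the column `Coke`, and the summary names `group_mean` and `group_sd` are illustrative assumptions, not the data from this article.

```r
library(dplyr)

# Hypothetical toy ratings and age groups
toy <- data.frame(
  Coke = c(1, 2, 3, 4, 6, 8),
  groupvar = c("18 to 34", "18 to 34", "18 to 34",
               "35 or more", "35 or more", "35 or more")
)

standardize <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Standardize within each group, as in the main example
scaled <- toy %>%
  group_by(groupvar) %>%
  mutate_at("Coke", standardize) %>%
  ungroup()

# Mean and standard deviation of the scaled variable within each group
check <- scaled %>%
  group_by(groupvar) %>%
  summarise(group_mean = mean(Coke), group_sd = sd(Coke))

check
```

With this toy data each group has a mean of 0 and a standard deviation of 1 after scaling, which is the pattern to look for in your own table.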
See Also
How to Perform Mathematical Calculations Using R
How to Quickly Make Data Long or Wide Using R
How to Automatically Stack a Data Set
How to Scale Respondents to have a Mean of 0 and a Standard Deviation of 1