How to Standardize or Calculate Data within Subgroups in R

Introduction

This article describes how to perform calculations on subgroups of data using R. A common use case is standardizing a rating question within each respondent's country, because ratings are interpreted differently across cultures. The example below walks through scaling Brand Attitude within each Age group.

You will go from a table/variables where the average rating is different per age group:

To scaled variables resulting in an average of 0 for each age group:

Requirements

A Number, Number-Multi, or Number-Grid Question. You can access the data and code in the following example by downloading the QPack here.

Method

The example below scales a brand attitude variable set within each age group in the data and creates a new scaled variable set. Although this example scales the data, you can perform any custom calculation you like by writing it as a function. You can also use built-in functions like mean(), sum(), etc. directly in the dplyr code. To adapt the code for your own document, read through the comments (denoted with #) and edit the code where appropriate.

Follow the instructions in How to Create a Custom R Variable and use the code below in your R CODE. Note that you need to use this code the very first time you save the R variable so that all of the variables are saved out as expected.

###Identify variables to use in calculation
#the numeric variable set to standardize
myvars=`Brand attitude 2`

#select the variable to use for the grouping
groupvar=Age

###Format data to use in calculations
#combine the group and other variables you want to scale in a data.frame
thedata=data.frame(myvars,groupvar,check.names=FALSE)

#remove the SUM column (compare against the columns of thedata so the index matches its width)
thedata=thedata[,colnames(thedata) != "SUM"]

#get list of columns to standardize
thecols=colnames(thedata)
#remove the grouping column from the list
thecols=thecols[thecols != "groupvar"]


###Create a function for your calculation, if needed
#create function to standardize using a formula
standardize <- function(x) (x-mean(x,na.rm=TRUE))/sd(x,na.rm=TRUE)

###Perform the calculations
#load dplyr functions used below
library(dplyr)

#group by the grouping variable and standardize the other variables
newvars <- thedata %>% #create newvars and %>% sends thedata to the function below
           group_by(groupvar) %>% #group by the grouping variable in the data (comma separate if multiple)
           mutate_at(thecols, standardize) #mutate_at applies the standardize function to each column in thecols within each group, replacing the original values

###Return the final result
#remove the groupvar from the final result
newvars[,colnames(newvars) != "groupvar"]
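
As noted in the Method description, built-in functions can be used directly in the dplyr call without writing a custom function. The sketch below is a minimal, self-contained illustration on made-up data. The toy data frame, the column names BrandA and BrandB, and the age labels are assumptions for illustration only and are not taken from the QPack. It first replaces each rating with its group mean, then with the rating's deviation from that mean.

```r
library(dplyr)

# Hypothetical data: two rating columns and a grouping column
toy <- data.frame(
  BrandA   = c(1, 3, 5, 2, 4, 9),
  BrandB   = c(2, 2, 5, 6, 7, 8),
  groupvar = rep(c("18 to 34", "35 or more"), each = 3)
)
toycols <- c("BrandA", "BrandB")

# Built-in function used directly: each value becomes its group's mean
groupmeans <- toy %>%
  group_by(groupvar) %>%
  mutate_at(toycols, mean, na.rm = TRUE) %>%
  ungroup()

# Inline formula: subtract the group mean (centering only, no division by sd)
centered <- toy %>%
  group_by(groupvar) %>%
  mutate_at(toycols, ~ . - mean(., na.rm = TRUE)) %>%
  ungroup()
```

Swapping mean for sum in the first call would give each respondent their group's total instead.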

If you run into errors with your customized code, you can right-click on the Report tree, select Add R Output, and paste the code there to troubleshoot. See How to Troubleshoot R Code.
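
When troubleshooting, it can help to run the calculation on a few made-up rows first. The sketch below uses hypothetical values, a hypothetical column named Rating, and hypothetical group labels. It shows two situations that produce missing results with the standardize function above: a group with a single respondent, where sd() returns NA, and a group where everyone gave the same rating, where the calculation becomes 0/0 and returns NaN.

```r
library(dplyr)

# Same formula as the standardize function in the main code
standardize <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Hypothetical data: three respondents, one respondent, three identical ratings
toy <- data.frame(
  Rating   = c(2, 4, 6, 5, 3, 3, 3),
  groupvar = c("18 to 34", "18 to 34", "18 to 34",
               "35 to 54",
               "55 or more", "55 or more", "55 or more")
)

result <- toy %>%
  group_by(groupvar) %>%
  mutate_at("Rating", standardize) %>%
  ungroup()

# First group standardizes normally; second is NA; third is NaN
result$Rating
```

If missing values appear in your scaled variables, checking group sizes and within-group variation is a reasonable first step.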

If you make a table of your new Question, you'll see that the variables are scaled for each brand overall:

and within each age group:
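
The same check can be sketched in code rather than in a table. The example below is a self-contained illustration on made-up data; the toy values, the column name Rating, and the age labels are assumptions. After standardizing within groups, each group's mean should be 0 and its standard deviation 1, up to floating-point rounding.

```r
library(dplyr)

# Same formula as the standardize function in the main code
standardize <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Hypothetical ratings, including one missing value
toy <- data.frame(
  Rating   = c(1, 2, 6, NA, 4, 8, 9, 7),
  groupvar = rep(c("18 to 34", "35 or more"), each = 4)
)

# Standardize within each group, then summarise each group's mean and sd
check <- toy %>%
  group_by(groupvar) %>%
  mutate_at("Rating", standardize) %>%
  summarise(group_mean = mean(Rating, na.rm = TRUE),
            group_sd   = sd(Rating, na.rm = TRUE))

check
```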
