## Introduction

This article describes how to perform calculations on subgroups of data using R. A common use case is standardizing a rating question within each respondent's country, because ratings are interpreted differently across cultures. The example below walks through a hypothetical case of scaling Brand Attitude within each Age group.

You will go from a table or set of variables where the average rating differs by age group to scaled variables whose average is 0 within each age group.

## Requirements

- A Number, Number-Multi, or Number-Grid Question. You can access the data and code in the following example by downloading the QPack here.

## Method

The example below scales a brand attitude variable set within each age group in the data and creates a new scaled variable set. Although this example scales the data, you can perform any custom calculation by wrapping it in a function. You can also use built-in functions such as mean() and sum() directly in the dplyr code. To adapt the code to your own document, read through the comments (denoted with #) and edit the code where appropriate.
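As a minimal sketch of calling a built-in function directly in the dplyr code, the snippet below subtracts each group's mean from a rating without defining a separate function. The data frame `toy` and its columns `rating` and `age` are made up for illustration and are not part of the QPack.

```r
library(dplyr)

# Hypothetical toy data: one rating column and one grouping column
toy <- data.frame(
  rating = c(1, 2, 3, 6, 8, 10),
  age    = c("18 to 34", "18 to 34", "18 to 34", "35+", "35+", "35+")
)

# mean() is called inside mutate(), so it is evaluated separately per age group
centered <- toy %>%
  group_by(age) %>%
  mutate(rating_centered = rating - mean(rating, na.rm = TRUE)) %>%
  ungroup()

centered$rating_centered
```

Because the group means here are 2 and 8, the centered values sum to zero within each age group.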

- Follow the instructions in How to Create a Custom R Variable and use the code below in your **R CODE**. Note that you need to use this code the very first time you save the R variable so that all of the variables are saved out as expected.

###Identify variables to use in calculation

#the numeric variable set to standardize

myvars=`Brand attitude 2`

#select the variable to use for the grouping

groupvar=Age

###Format data to use in calculations

#combine the group and other variables you want to scale in a data.frame

thedata=data.frame(myvars,groupvar,check.names=F)

#remove the SUM column, if present (test thedata's own column names so the index length matches its columns)

thedata=thedata[,colnames(thedata) != "SUM",drop=FALSE]

#get list of columns to standardize

thecols=colnames(thedata)

#remove the grouping column from the list

thecols=thecols[thecols != "groupvar"]

###Create a function for your calculation, if needed

#create function to standardize using a formula

standardize <- function(x) (x-mean(x,na.rm=T))/sd(x,na.rm=T)

###Perform the calculations

#load dplyr functions used below

library(dplyr)

#group by the grouping variable and standardize the other variables

newvars <- thedata %>% #create newvars and %>% sends thedata to the function below

group_by(groupvar) %>% #group by the grouping variable in the data (comma separate if multiple)

mutate_at(thecols, standardize) #mutate_at applies the standardize function to each column in thecols within each group, replacing the original values

###Return the final result

#remove the groupvar from the final result

newvars[,colnames(newvars) != "groupvar"]
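The comment on `group_by` above mentions comma-separating multiple grouping variables. A rough sketch of what that could look like follows; the data frame `toy` and the columns `age` and `gender` are hypothetical, and `mutate()` is used on a single column here for brevity rather than `mutate_at()`.

```r
library(dplyr)

# Hypothetical data with two grouping columns
toy <- data.frame(
  rating = c(1, 3, 2, 6, 5, 9, 4, 8),
  age    = c("Young", "Young", "Young", "Young", "Old", "Old", "Old", "Old"),
  gender = c("F", "F", "M", "M", "F", "F", "M", "M")
)

# Same standardize function as in the article's code
standardize <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

scaled <- toy %>%
  group_by(age, gender) %>%              # one group per age-by-gender combination
  mutate(rating = standardize(rating)) %>%
  ungroup()

scaled$rating
```

Each age-by-gender cell is standardized on its own, so very small cells (a single respondent, or identical ratings) would produce NA or NaN because the standard deviation is undefined or zero.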

- If you are running into errors with your customized code, you can right-click on the **Report** tree, select **Add R Output**, and paste the code there to troubleshoot. See How to Troubleshoot R Code.
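When troubleshooting, it can help to run the same steps on a small made-up data set first. The sketch below assumes stand-ins for the real inputs: `myvars` is a hypothetical two-brand data frame with a SUM column and `groupvar` is a hypothetical age vector. In a real document these come from your own variable set and grouping variable.

```r
library(dplyr)

# Hypothetical stand-ins for the variable set and the grouping variable
myvars <- data.frame(
  `Brand A` = c(1, 2, 3, 4, 6, 8),
  `Brand B` = c(5, 5, 8, 2, 3, 7),
  check.names = FALSE
)
myvars$SUM <- rowSums(myvars)
groupvar <- c("Young", "Young", "Young", "Old", "Old", "Old")

# Combine, drop SUM, and list the columns to standardize
thedata <- data.frame(myvars, groupvar, check.names = FALSE)
thedata <- thedata[, colnames(thedata) != "SUM", drop = FALSE]
thecols <- setdiff(colnames(thedata), "groupvar")

standardize <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Standardize each brand column within each group
newvars <- thedata %>%
  group_by(groupvar) %>%
  mutate_at(thecols, standardize) %>%
  ungroup()

# Drop the grouping column from the final result
result <- newvars[, colnames(newvars) != "groupvar"]
print(result)
```

If this runs cleanly but your own code does not, the difference is likely in the variable names or the structure of the inputs rather than in the dplyr steps.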

If you make a table of your new Question, you'll see that the variables are scaled for each brand, both overall and within each age group.
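The scaling can also be checked numerically instead of from a table. The following is a sketch on made-up data: `rating` and `age` are hypothetical, and base R's `ave()` is used here only to produce example scaled values, not as a replacement for the dplyr code above.

```r
# Hypothetical ratings and their age groups
age    <- c("18 to 34", "18 to 34", "18 to 34", "35+", "35+", "35+")
rating <- c(1, 2, 3, 6, 8, 10)

# ave() applies a function within each group and returns a vector aligned with the input
scaled <- ave(rating, age, FUN = function(x) (x - mean(x)) / sd(x))

# After standardizing, each group should have a mean near 0 and a standard deviation near 1
group_means <- tapply(scaled, age, mean)
group_sds   <- tapply(scaled, age, sd)

round(group_means, 10)
round(group_sds, 10)
```

Small deviations from exactly 0 and 1 are expected because of floating-point rounding, which is why the results are rounded before being displayed.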

## See Also

How to Perform Mathematical Calculations Using R

How to Quickly Make Data Long or Wide Using R

How to Automatically Stack a Data Set

How to Scale Respondents to have a Mean of 0 and a Standard Deviation of 1
