How to Automatically Stack a Data Set Using R

This article provides a guide and template to help you achieve automatic stacking of your data set in Q using R. This process can save time if you need to stack data from the same study regularly (e.g., for a tracking study). If you only intend to stack your data once (i.e., it’s unlikely you will need to stack data periodically), then you are likely best off using the built-in tool. See How to Stack Data in Q.

Data can be stacked when you import your data as an R Data Set. This requires you to write some R code to specify how the variables in the data set should be arranged to create the stacked data set. Step 1 below contains an example of a template you can use in Excel to help build the stacking code. While the template can help, it is not a one-size-fits-all tool, as stacked data can take many configurations.

Requirements

Data you need to stack on a regular basis in a .sav SPSS file.
Moderate comfort with R code. To get started learning R, see: How to Learn R

Method

The process for adding a stacked data set to your project using R code is:

To start, you should generate the list of variables you want to stack. It's recommended that you use the Stacking Code Generator Excel tool to create the R code needed to list out your stacked variables - Stacking Code Generator. Follow the instructions on the Instructions tab and paste the generated code in step (4) in the example code from below - being sure to remove the last comma.
Select File > Data Sets > Add to Project > From R.
Enter the R Code, which brings in the data and performs the stacking. Our example below is used to stack our example data file, Technology 2018.sav.
Modify part (1) of the code to specify the location of your file. This will either need to be a location on your computer or a publicly accessible URL - see the example Modify part (1) - Specifying the location of the unstacked data file.
Modify part (2) of the code to refer to the variable name of the variable which contains your respondent IDs in the unstacked file - see the example Modify part (2) - The respondent ID variable.
Modify part (3) of the code to specify sets of variables that are to be stacked - see Modify part (3) - Specifying stacked variables. Note, you will likely want to use our handy Excel template to generate the code of the list of variables to stack together - see Modify part (3) - Excel template for generating R Code.
Enter a Name for the data set.
Click the play icon.
Click Add Data Set.
See Additional Notes below for other things to consider after importing your stacked data set.

Your code window will look a bit like this.

Note that when developing the code for stacking your data, you may find it convenient to first run and test the code in an R Output.

Example Code

This example is from a study on technology brands. The example uses Technology 2018.sav. The purpose of this stacking is to stack the data for all of the brands so that there is a single NPS score and a single set of image variables for a driver analysis.

#Review the numbered list of settings below
#Load the foreign package to use
library(foreign)

### (1) SPECIFY the file location - 
#for files hosted on the web http link must end in .sav and automatically download the file
#if using local file change \ to \\ in the full filepath
location = "https://app.displayr.com/DataMart/File/6e8734bc-9963-44fe-8c9f-04c22efb3e83/Technology_2018.sav?company_id=751221"
thedata= suppressWarnings(read.spss(location,
use.value.labels = TRUE,
to.data.frame = TRUE))
###

### (2) specify your unique ID (Case) variable that can be used to link to the unstacked data
id.variable = 'RESPNUM'
###

### (3) list any other variables you want to include in the stacked dataset but leave unstacked
variables.to.stretch = c('Q1', 'Rec_Age')
###

### provide lists of variables to include in each stacked variable
variables.to.stack = list(
# (4) REPLACE the below with a copy/paste from BLUE (first) column of Excel template
# if you need to insert blank values then use the name 'dummy' in the list below
## !!!! be sure to remove the last comma before the closing round bracket !!!! 
'Recommend' = c('Q3_01', 'Q3_02', 'Q3_03', 'Q3_04', 'Q3_05',
'Q3_06','Q3_07','Q3_08','Q3_09','Q3_10','Q3_11',
'Q3_12','Q3_13'),
'Fun' = c('Q4a_01','Q4a_02','Q4a_03','Q4a_04','Q4a_05','Q4a_06',
'Q4a_07','Q4a_08','Q4a_09','Q4a_10','Q4a_11','Q4a_12','Q4a_13'),
'Worth_what_you_pay_for' = c('Q4b_01','Q4b_02','Q4b_03','Q4b_04','Q4b_05','Q4b_06',
'Q4b_07','Q4b_08','Q4b_09','Q4b_10','Q4b_11','Q4b_12','Q4b_13'),
'Innovative' = c('Q4c_01','Q4c_02','Q4c_03','Q4c_04','Q4c_05','Q4c_06',
'Q4c_07','Q4c_08','Q4c_09','Q4c_10','Q4c_11','Q4c_12','Q4c_13'),
'Good_customer_service' = c('Q4d_01','Q4d_02','Q4d_03','Q4d_04','Q4d_05','Q4d_06',
'Q4d_07','Q4d_08','Q4d_09','Q4d_10','Q4d_11','Q4d_12','Q4d_13'),
'Stylish' = c('Q4e_01','Q4e_02','Q4e_03','Q4e_04','Q4e_05','Q4e_06',
'Q4e_07','Q4e_08','Q4e_09','Q4e_10','Q4e_11','Q4e_12','Q4e_13'),
'Easy_to_use' = c('Q4f_01','Q4f_02','Q4f_03','Q4f_04','Q4f_05','Q4f_06',
'Q4f_07','Q4f_08','Q4f_09','Q4f_10','Q4f_11','Q4f_12','Q4f_13'),
'High_quality' = c('Q4g_01','Q4g_02','Q4g_03','Q4g_04','Q4g_05','Q4g_06',
'Q4g_07','Q4g_08','Q4g_09','Q4g_10','Q4g_11','Q4g_12','Q4g_13'),
'High_performance' = c('Q4h_01','Q4h_02','Q4h_03','Q4h_04','Q4h_05','Q4h_06',
'Q4h_07','Q4h_08','Q4h_09','Q4h_10','Q4h_11','Q4h_12','Q4h_13'),
'Low_prices' = c('Q4i_01','Q4i_02','Q4i_03','Q4i_04','Q4i_05','Q4i_06',
'Q4i_07','Q4i_08','Q4i_09','Q4i_10','Q4i_11','Q4i_12','Q4i_13')
)
###

### get lists of variables to include/exclude in the stacked data file
#create dummy variable in the data set to use if needed
thedata$dummy=NA
#pull off the column headers from the data set to compare to the lists
all.names <- names(thedata)
#identify variables to exclude from the stacked data set
variables.to.exclude = all.names[!all.names %in% c(unlist(variables.to.stack), id.variable, variables.to.stretch)]
###

#####
##check that all names input are in the data file
missingid=setdiff(id.variable,all.names)
missingstretch=setdiff(variables.to.stretch,all.names)
missingstack=setdiff(unlist(variables.to.stack),all.names)

##if any are missing, error the output and list out what is missing and what
names are used in the file
if(length(missingid) > 0 ) stop(paste0("Please check your id.variable name\"",id.variable,"\"It
  was not found in the data file. Available variable names are:\"",paste(all.names,collapse=", ")))
if(length(missingstretch) > 0 ) stop(paste0("Please check your variables.to.stretch,
  the following could not be found in the data set:\"",paste(missingstretch,collapse=",
  "),"\"\" Available variable names are:\"",paste(all.names,collapse=", ")))
if(length(missingstack) > 0 ) stop(paste0("Please check your variables.to.stack
  list, the following could not be found in the data set:\"",paste(missingstretch,collapse=",
  "),"\"\" Available variable names are:\"",paste(all.names,collapse=", ")))

##check that all the stacked variables are stacking the same number of variables
lengths=sapply(variables.to.stack,length)
if(any(lengths != lengths[1])) stop(paste("You must be stacking the same number
  of variables for each stacked variable. The number of variables stacked for each
  stacked variable is below, please fix the variables.to.stack list so that all
  numbers are the same. You can use the name dummy in the code to fill in for missing
  variables.\"",paste(paste0(names(lengths),": ",lengths),collapse="\"")))
#####

#put columns in data order as their lists above so variable lists can be in any order
thedata=thedata[,c(id.variable,variables.to.stretch,setdiff(colnames(thedata),c(id.variable, variables.to.stretch)))]

### Stack the data
stacked.data = reshape(data = thedata,
idvar = id.variable, direction = "long",
drop = variables.to.exclude,
varying = variables.to.stack)
###

### Format the stacked data
#make column names pretty
names(stacked.data) = c(id.variable, variables.to.stretch, "Observation", names(variables.to.stack))
###

### (5) return the final stacked data set
#keep category metadata
library(haven)
stacked.data = as_factor(stacked.data)
###

Modify part (1) - Specifying the location of the unstacked data file

In the example above, the function read.spss from the R package foreign is used to read in the unstacked data file prior to the stacking procedure. The location in the example is a URL - a location on the web. You can also set the location to be a path on your computer. For example, the following location will obtain a file from my Desktop:

location = "C:\\Users\\Chris\\Desktop\\Cola Tracking - January to September 2017.sav"

library(foreign)
datafile = suppressWarnings(read.spss(location, use.value.labels = FALSE, to.data.frame = TRUE))

Note the use of double backslashes "\\" to separate the folders on my system.

Other functions should be used in place of read.spss() if you are using a different format. For example, read.csv is the equivalent if using a CSV file.

Modify part (2) - The respondent ID variable

You need to specify a unique ID variable so that the stacked cases can be identified with the respondents in the original, unstacked file. This enables matching between the stacked and unstacked data, but it is also just good housekeeping.

Modify part (3) - Unstacked variables to include

If you need any other non-stacked variables in the stacked file, you can add their Names to the variables.to.stretch list. Note this list will not include your id.variable, which will already be unstacked in the file. In the example above, we are including the first question and the age question.

variables.to.stretch = c('Q1', 'Rec_Age')

If you don't need any further unstacked variables, you can leave this blank

variables.to.stretch = c()

Modify part (4) - Specifying stacked variables

The main part of the code is the lines following variables.to.stack, which specify the groups of variables that are to be stacked. You can use the following Excel file to assist in generating your R Code for part (4): Stacking Code Generator. Follow the instructions on the Instructions tab and paste the generated code in step (4) in the example code from above - being sure to remove the last comma.

For example,

'Q3_01' = c('Q3_01', 'Q3_02', 'Q3_03', 'Q3_04', 'Q3_05', 'Q3_06', 'Q3_07', 'Q3_08', 'Q3_09', 'Q3_10', 'Q3_11', 'Q3_12', 'Q3_13')

indicates that the variables with names Q3_01, Q3_02, to Q3_13 should be stacked in order to for a variable called Q3_01. In this example, these variables correspond to the Likelihood to recommend scores for the brands in our study.

Important notes:

When pasting the code, be sure to remove the last comma before closing the round brackets.
Be careful using Excel to copy, paste, delete, and transpose data. All of the above require some common sense using Excel.

Additional Notes

Tidy the metadata

R data sets do not have the ability to include metadata, like variable labels and value labels. You will need to input this manually in the Variables and Questions tab for the stacked data set after adding it to Q. You only need to do this once - updating the data set later on will preserve the metadata you have specified. See How to Set Up Files With No Metadata for the typical workflow.

Data matching

This can be very handy if you ever want to cross-tabulate data between the unstacked data file and the stacked data set. See How to Set Up and Manage Data File Relationships for more on creating relationships between data sets. You will create a One-to-Many relationship from the unstacked data set (one) to the stacked data set (many).

In the example data set, RESPNUM is used as the linking variable (i.e., the Case ID variable). If you wanted to cross-tabulate between the stacked version of question 4 and the age of respondents (variable Rec_Age), having a One-to-Many datafile relationship will permit this (without needing to stack Rec_Age).

Updating

When the time comes to update your Q project with a newer version of the stacked data, you need to:

Determine the file path or URL for your new unstacked file.
Open your Q Project.
Select File > Data Sets > Update and choose your stacked data set.
(Optional) If the file path or URL has changed, modify your code to refer to the new file. Similarly, if there are new or different variables to stack, modify the code accordingly.
Push the play button again. The new data set will be calculated, and when it is finished, the new data will appear in the preview.
Select Update Data Set. The old stacked data will now be replaced with the new version of the stacked data.

How to Stack Data in Q

How to Set Up and Manage Data File Relationships

How to Use R to Add Data Sets

How to Quickly Make Data Long or Wide Using R

How to Unstack Data Using R

How to Learn R