This article provides a guide and template to help you achieve automatic stacking of your data set in Q using R. This process can save time if you need to stack data from the same study regularly (e.g., for a tracking study). If you only intend to stack your data once (i.e., it’s unlikely you will need to stack data periodically) then you are likely best off using the built-in tool here: How to Stack Data Files.
Data can be stacked when you import your data as an R Data Set. This requires you to write some R code to specify how the variables in the data set should be arranged to create the stacked data set. Below is an example of a template you can use in Excel to help build the stacking code. While the template can help, it is not a one-size-fits-all tool, as many configurations stacked data can take.
Requirements
- Data you need to stack on a regular basis in a .sav SPSS file.
- Moderate comfort with R code. To get started learning R, see: How to Learn R
Method
The process for adding a stacked data set to your project using R code is:
- Select File > Data Sets > Add to Project > From R.
- Enter the R Code which brings in the data and performs the stacking. Our example below is used to stack our example data file, Technology 2018.sav
- Modify part (1) of the code to specify the location of your file. This will either need to be a location on your computer or a publicly accessible URL - see the example Modify part (1) - Specifying the location of the unstacked data file.
- Modify part (2) of the code to refer to the variable name of the variable which contains your respondent IDs in the unstacked file - see the example Modify part (2) - The respondent ID variable.
- Modify part (3) of the code to specify sets of variables that are to be stacked - see Modify part (3) - Specifying stacked variables. Note, you will more than likely want to use our handy Excel template to generate the code of the list of variables to stack together - see Modify part (3) - Excel template for generating R Code.
- Enter a Name for the data set.
- Click the play icon.
- Click Add Data Set.
- See Additional Notes below for other things to consider after importing your stacked data set.
Your code window will look a bit like this.
Note that when developing the code for stacking your data, you may find it convenient to first run and test the code in an R Output.
Example Code
This example is from a study on technology brands. The example uses Technology 2018.sav. The purpose of this stacking is to stack the data for all of the brands so that there is a single NPS score, and a single set of image variables for a driver analysis.
#Review the numbered list of settings below
#Load the foreign package to use
library(foreign)
### (1) SPECIFY the file location -
#for files hosted on the web http link must end in .sav and automatically download the file
#if using local file change \ to \\ in the full filepath
location = "https://app.displayr.com/DataMart/File/6e8734bc-9963-44fe-8c9f-04c22efb3e83/Technology_2018.sav?company_id=751221"
thedata= suppressWarnings(read.spss(location,
use.value.labels = TRUE,
to.data.frame = TRUE))
###
### (2) specify your unique ID (Case) variable that can be used to link to the unstacked data
id.variable = 'RESPNUM'
###
### (3) list any other variables you want to include in the stacked dataset but leave unstacked
variables.to.stretch = c('Q1', 'Rec_Age')
###
### provide lists of variables to include in each stacked variable
variables.to.stack = list(
# (4) REPLACE the below with a copy/paste from BLUE (first) column of Excel template
# if you need to insert blank values then use the name 'dummy' in the list below
## !!!! be sure to remove the last comma before the closing round bracket !!!!
'Recommend' = c('Q3_01', 'Q3_02', 'Q3_03', 'Q3_04', 'Q3_05',
'Q3_06','Q3_07','Q3_08','Q3_09','Q3_10','Q3_11',
'Q3_12','Q3_13'),
'Fun' = c('Q4a_01','Q4a_02','Q4a_03','Q4a_04','Q4a_05','Q4a_06',
'Q4a_07','Q4a_08','Q4a_09','Q4a_10','Q4a_11','Q4a_12','Q4a_13'),
'Worth_what_you_pay_for' = c('Q4b_01','Q4b_02','Q4b_03','Q4b_04','Q4b_05','Q4b_06',
'Q4b_07','Q4b_08','Q4b_09','Q4b_10','Q4b_11','Q4b_12','Q4b_13'),
'Innovative' = c('Q4c_01','Q4c_02','Q4c_03','Q4c_04','Q4c_05','Q4c_06',
'Q4c_07','Q4c_08','Q4c_09','Q4c_10','Q4c_11','Q4c_12','Q4c_13'),
'Good_customer_service' = c('Q4d_01','Q4d_02','Q4d_03','Q4d_04','Q4d_05','Q4d_06',
'Q4d_07','Q4d_08','Q4d_09','Q4d_10','Q4d_11','Q4d_12','Q4d_13'),
'Stylish' = c('Q4e_01','Q4e_02','Q4e_03','Q4e_04','Q4e_05','Q4e_06',
'Q4e_07','Q4e_08','Q4e_09','Q4e_10','Q4e_11','Q4e_12','Q4e_13'),
'Easy_to_use' = c('Q4f_01','Q4f_02','Q4f_03','Q4f_04','Q4f_05','Q4f_06',
'Q4f_07','Q4f_08','Q4f_09','Q4f_10','Q4f_11','Q4f_12','Q4f_13'),
'High_quality' = c('Q4g_01','Q4g_02','Q4g_03','Q4g_04','Q4g_05','Q4g_06',
'Q4g_07','Q4g_08','Q4g_09','Q4g_10','Q4g_11','Q4g_12','Q4g_13'),
'High_performance' = c('Q4h_01','Q4h_02','Q4h_03','Q4h_04','Q4h_05','Q4h_06',
'Q4h_07','Q4h_08','Q4h_09','Q4h_10','Q4h_11','Q4h_12','Q4h_13'),
'Low_prices' = c('Q4i_01','Q4i_02','Q4i_03','Q4i_04','Q4i_05','Q4i_06',
'Q4i_07','Q4i_08','Q4i_09','Q4i_10','Q4i_11','Q4i_12','Q4i_13')
)
###
### get lists of variables to include/exclude in the stacked data file
#create dummy variable in the data set to use if needed
thedata$dummy=NA
#pull off the column headers from the data set to compare to the lists
all.names <- names(thedata)
#identify variables to exclude from the stacked data set
variables.to.exclude = all.names[!all.names %in% c(unlist(variables.to.stack), id.variable, variables.to.stretch)]
###
### Stack the data
stacked.data = reshape(data = thedata,
idvar = id.variable, direction = "long",
drop = variables.to.exclude,
varying = variables.to.stack)
###
### Format the stacked data
#make column names pretty
names(stacked.data) = c(id.variable, variables.to.stretch, "Observation", names(variables.to.stack))
###
### (5) return the final stacked data set
#keep category metadata
library(haven)
stacked.data = as_factor(stacked.data)
###
Modify part (1) - Specifying the location of the unstacked data file
In the example above, the function read.spss from the R package foreign is used to read in the unstacked data file prior to the stacking procedure. The location in the example is a URL - a location on the web. You can also set the location to be a path on your computer. For example, the following location will obtain a file from my Desktop:
location = "C:\\Users\\Chris\\Desktop\\Cola Tracking - January to September 2017.sav"
library(foreign)
datafile = suppressWarnings(read.spss(location, use.value.labels = FALSE, to.data.frame = TRUE))
Note the use of double backslashes "\\" to separate the folders on my system.
Other functions should be used in place of read.spss() if you are using a different format. For example, read.csv is the equivalent if using a CSV file.
Modify part (2) - The respondent ID variable
You need to specify a unique ID variable so that the stacked cases can be identified with the respondents in the original, unstacked file. This enables matching between the stacked and unstacked data, but it is also just good housekeeping.
Modify part (3) - Unstacked variables to include
If you need any other non-stacked variables in the stacked file, you can add their Names to the variables.to.stretch list. Note this list will not include your id.variable which will already be unstacked in the file. In the example above, we are including the first question and age question.
variables.to.stretch = c('Q1', 'Rec_Age')
If you don't need any further unstacked variables, you can leave this blank
variables.to.stretch = c()
Modify part (4) - Specifying stacked variables
The main part of the code is the lines following variables.to.stack, which specify the groups of variables which are to be stacked. You can use the following Excel file to assist in generating your R Code for part (4): Stacking Code Generator. Follow the instructions on the Instructions tab and paste the generated code in step (4) in the example code from above - being sure to remove the last comma.
For example,
'Q3_01' = c('Q3_01', 'Q3_02', 'Q3_03', 'Q3_04', 'Q3_05', 'Q3_06', 'Q3_07', 'Q3_08', 'Q3_09', 'Q3_10', 'Q3_11', 'Q3_12', 'Q3_13')
indicates that the variables with names Q3_01, Q3_02, to Q3_13 should be stacked in order to for a variable called Q3_01. In this example, these variables correspond to the Likelihood to recommend scores for the brands in our study.
Important notes:
- When pasting the code, be sure to remove the last comma before closing the round brackets.
- Be careful using Excel to copy, paste, delete and transpose data. All of the above requires some common-sense using Excel.
Additional Notes
Tidy the metadata
R data sets do not have the ability to include metadata, like variable labels and value labels. You will need to input this manually in the Variables and Questions tab for the stacked data set after adding it to Q. You only need to do this once - updating the data set later on will preserve the metadata you have specified. See How to Set Up Files With No Metadata for the typical workflow.
Data matching
This can be very handy if you ever want to cross-tabulate data between the unstacked datafile and the stacked data set. See How to Set Up and Manage Data File Relationships for more on creating relationships between data sets. You will create a One-to-Many relationship from the unstacked data set (one) to the stacked data set (many).
In the example data set, RESPNUM is used as the linking variable (ie: the Case ID variable). If you wanted to cross-tabulate between the stacked version of question 4 and the age of respondents (variable Rec_Age), having a One-to-Many datafile relationship will permit this (without needing to stack Rec_Age).
Updating
When the time comes to update your Q project with a newer version of the stacked data, you need to:
- Determine the file path or URL for your new unstacked file.
- Open your Q Project.
- Select File > Data Sets > Update and chose your stacked data set.
- (Optional) If the file path or URL has changed, modify your code to refer to the new file. Similarly, if there are new or different variables to stack, modify the code accordingly.
- Push the play button again. The new data set will be calculated, and when it is finished the new data will appear in the preview.
- Select Update Data Set. The old stacked data will now be replaced with the new version of the stacked data.
Next
How to Set Up and Manage Data File Relationships
How to Quickly Make Data Long or Wide Using R