A data file is “stacked” when a single respondent’s data appears as multiple cases (i.e., multiple rows in the Data tab). Most commonly, this is because the respondent has provided data about multiple occasions (where each occasion is in a separate variable) or about all the members of their household (where each household member is in a separate variable). In the stacked data set, each of those occasions or household members is transformed into a separate record.
Q can convert an un-stacked SPSS data file into a stacked SPSS data file, which can then be imported and analyzed in Q.
For example, starting from an un-stacked SPSS data file, you might arrange the variables to stack into two columns in the stacking tool, Observation 1 and Observation 2. In the output data file, each observation then appears as a separate row for each respondent.
A data file with multiple observations per case.
- If using Q's built-in tool: an unstacked SPSS data file
- If using R: an unstacked data file of any compatible type
Before stacking, you should always look at your data and work out:
- Which variables do you want to stack?
- Which variable is the ID variable?
- Are there other variables in your data you want to include but not stack? These will be stretched, which means that their values will be repeated several times for each of the original cases.
- Which variables do you want to exclude from the new data?
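One way to record these decisions before stacking is to write them down as code and check them against the data. The following is a minimal R sketch under assumed names: the data frame unstacked and all of its variables (id, age, brand_1, and so on) are hypothetical and are not taken from any real file.

```r
# Hypothetical unstacked data: one row per respondent, two occasions each.
unstacked <- data.frame(
  id      = c(101, 102, 103),
  age     = c(34, 51, 27),
  brand_1 = c("A", "B", "A"),
  brand_2 = c("C", "A", "B"),
  spend_1 = c(10, 20, 15),
  spend_2 = c(12, 18, 30),
  notes   = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# The four decisions, written down explicitly (all names are illustrative).
id_var       <- "id"                                   # the ID variable
stack_groups <- list(brand = c("brand_1", "brand_2"),  # variables to stack,
                     spend = c("spend_1", "spend_2"))  # one group per new variable
stretch_vars <- c("age")                               # included but not stacked
exclude_vars <- c("notes")                             # left out of the new data

# Sanity checks before stacking.
all_roles <- c(id_var, unlist(stack_groups, use.names = FALSE),
               stretch_vars, exclude_vars)
stopifnot(
  all(all_roles %in% names(unstacked)),       # every named variable exists
  !anyDuplicated(all_roles),                  # no variable has two roles
  !anyDuplicated(unstacked[[id_var]]),        # the ID is unique per respondent
  length(unique(lengths(stack_groups))) == 1  # all groups have the same number of observations
)

# Variables that have not been given a role yet.
unassigned <- setdiff(names(unstacked), all_roles)

# Size of the stacked file: one row per respondent per observation.
n_observations <- lengths(stack_groups)[[1]]
expected_rows  <- nrow(unstacked) * n_observations
```

Working out expected_rows in advance gives a quick check on the stacked file: if the row count differs, a variable has probably been placed in the wrong observation.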
Method 1: Q's built-in stacking tool
Q's built-in stacking tool only works with SPSS data files, although any data file can be converted to an SPSS data file by selecting Tools > Save Data as SPSS/CSV File. The basic process for stacking an SPSS data file is:
- Start a new Q project and import the un-stacked SPSS data file by going to File > Data Sets > Add to Project > From File.
- When prompted, make sure to select Use original data file structure.
- Optional: In the Variables and Questions tab, order variables by dragging and dropping so that variables you want to stack are adjacent and in the right order. When done, use Tools > Save Data as SPSS/CSV File. Re-use (and modify) this file when new data is obtained for trackers.
- Select Tools > Stack SPSS .sav File…
- Drag and drop variable names until the file is as you desire. Each "loop" or iteration to stack should fall into a separate observation in the tool. This is discussed in more detail in the online training tutorial.
- Delete any variables that you do not need. This is done by dragging variable names into the Omit box on the left side of the dialog box. Stacked data files are generally much, much larger than unstacked data files because they repeat data for many variables. This can slow down Q and, in some cases, make it prone to crashing if your computer has insufficient memory. Consequently, the more variables that you can omit, the better.
- Important note: It's always good practice to include a respondent ID variable in your stacked data file.
- Set any missing values. For variables that cannot be stacked, Q will, by default, stack copies of a respondent’s values on top of each other. If you right-click and select Set as Missing, Q will replace all but the first observation for a respondent with missing values.
- Revise any variable names and labels, as required, using Override Name... and Override Label....
A worked example of the process of stacking a data file is available in the Online Training tutorial Stacking data.
Overview of the Stacking Tool's Interface
Use this dialog to convert the loaded SPSS file to a new stacked data file.
The new data file will contain two new variables:
- original_case will record the case number from your original data file.
- observation will be 1 for the first column of stacked variables, 2 for the second column, and so on.
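To make the two new variables concrete, the sketch below stacks a small hypothetical data frame by hand in R. It only mimics the meaning of original_case and observation as described above. It is not Q's implementation, and the variable names (id, q1_a, q1_b) and the final row order are assumptions of the example.

```r
# Hypothetical unstacked data: one row per respondent.
unstacked <- data.frame(
  id   = c(101, 102, 103),
  q1_a = c(5, 3, 4),   # rating for the first observation
  q1_b = c(2, 4, 1)    # rating for the second observation
)

# Variables placed side by side: column 1 is observation 1, column 2 is observation 2.
stack_cols <- c("q1_a", "q1_b")

# Build one block of rows per observation.
blocks <- lapply(seq_along(stack_cols), function(k) {
  data.frame(
    original_case = seq_len(nrow(unstacked)),  # row number in the original file
    observation   = k,                         # which column the value came from
    id            = unstacked$id,              # repeated for every observation
    q1            = unstacked[[stack_cols[k]]]
  )
})
stacked <- do.call(rbind, blocks)

# Sort so that each respondent's observations sit together (a choice made here).
stacked <- stacked[order(stacked$original_case, stacked$observation), ]
rownames(stacked) <- NULL
stacked
```

Together, original_case and observation identify every row of the stacked file, which is useful when tracing a stacked value back to the original data.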
Omit: Variables dragged into this list will not be included in the output data (e.g., variables not needed for the stacked analysis).
Output file structure: Drag variables beside each other to stack them. Start by selecting one or more variables, then click and drag the selection to the right of the variables you want to stack them with. Variables that are only present in the first column will be automatically repeated in subsequent columns.
- It's usually a good idea to include a respondent ID variable in the stacked data set (if available).
- Stacking can produce very large data files. Omit variables that are not required in your stacked analysis.
- Q will try to intelligently choose new variable names and labels for the stacked variables. Right-click on the labels to override them.
- If you drag multiple groups of variables together, Q will try to arrange them side by side.
- If you do not want to automatically repeat a variable, right-click on the cell you want to blank and select Set as Missing.
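The effect of blanking a repeated variable can be illustrated outside the tool. The R sketch below is an assumption-laden illustration rather than Q's own logic: the data frame and its variables (household_income, member_age) are hypothetical, and it blanks every observation after the first, which is the behavior described for Set as Missing earlier in this article.

```r
# Hypothetical stacked data: two households, two members each.
stacked <- data.frame(
  original_case    = c(1, 1, 2, 2),
  observation      = c(1, 2, 1, 2),
  household_income = c(50, 50, 80, 80),  # repeated for each household member
  member_age       = c(40, 12, 35, 33)   # genuinely differs per observation
)

# Keep the repeated value on the first observation only.
blanked <- stacked
blanked$household_income[blanked$observation > 1] <- NA

# With repetition, each household is counted once per member;
# after blanking, each household is counted once.
n_repeated <- sum(!is.na(stacked$household_income))
n_blanked  <- sum(!is.na(blanked$household_income))
```

Blanking matters most when counts or sample sizes are reported for a respondent-level variable in the stacked file, because repeated values inflate the number of cases.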
Method 2: Advanced and automated stacking using R
Sometimes the structure of the data file is too complex for the approach described above to work, or you need to repeat the stacking regularly, such as with tracking studies. In such cases, the stacking can be performed using R. The basic workflow for this is:
- File > Data Sets > Add to Project > From R
- Write code that imports the data file. For example, if it is an SPSS data file, use foreign::read.spss.
- Write code to restructure the data file, for example:
- The stack function (see R help on stack).
- The gather function in tidyr (superseded by pivot_longer in current versions of tidyr).
- The reshape function in base R (the stats package).
- The melt function in data.table.
See Stack Overflow for examples of most of these functions.
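As a minimal sketch of this workflow, the code below stacks two groups of variables with base R's reshape function. The file name in the commented-out import line and all variable names are hypothetical. A small data frame is built in place of a real file so that the example runs on its own, and ending the script with the stacked data frame is an assumption about how the result would be handed back to Q.

```r
# In a real project the data would be read from a file, for example:
# unstacked <- foreign::read.spss("survey.sav", to.data.frame = TRUE)  # hypothetical file name

# Stand-in data so the sketch is self-contained.
unstacked <- data.frame(
  id      = c(101, 102),
  age     = c(34, 51),
  brand.1 = c("A", "B"),
  brand.2 = c("C", "A"),
  spend.1 = c(10, 20),
  spend.2 = c(12, 18),
  stringsAsFactors = FALSE
)

# Stack brand.1/brand.2 into "brand" and spend.1/spend.2 into "spend".
stacked <- reshape(
  unstacked,
  direction = "long",
  idvar     = "id",                            # the respondent ID variable
  varying   = list(c("brand.1", "brand.2"),    # one vector per stacked variable
                   c("spend.1", "spend.2")),
  v.names   = c("brand", "spend"),             # names of the stacked variables
  timevar   = "observation",                   # records which column a row came from
  times     = 1:2
)

# Variables not listed in 'varying' (here, age) are repeated for each observation.
stacked <- stacked[order(stacked$id, stacked$observation), ]
rownames(stacked) <- NULL
stacked
```

Because the stacking is expressed as code, the same script can be re-run on each new wave of a tracking study, provided the variable names stay the same.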
Once you have a stacked data file in your project, you can use it in conjunction with the unstacked data file by connecting the two data sets with a data file relationship. To learn more, start here: How to Set Up and Manage Data File Relationships