Introduction
This article describes how to reformat multiple response data that is stored as comma-separated numeric values in a single string. By default, Q imports such data as a Text variable.
It will take you from this state:
To this state:
When multiple response data has been stored in a comma-separated string, often as part of a CSV file, it needs to be split into individual variables storing whether the response has been selected or not before it can be used in Q.
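As a rough illustration of the target layout, the sketch below can be run in plain R outside of Q. The vector `raw` is made-up example data standing in for the imported Text variable, and the names `parts`, `codes` and `indicators` are chosen only for this illustration. Each distinct code becomes one TRUE/FALSE column, with one row per respondent.

```r
# Hypothetical example data: three respondents, codes stored as comma-separated text
raw <- c("1,3", "2", "1,2,3")

# Split each string on commas
parts <- strsplit(raw, ",")

# Codes that occur anywhere in the data
codes <- sort(unique(unlist(parts)))

# One row per respondent, one logical column per code
indicators <- t(sapply(parts, function(p) codes %in% p))
colnames(indicators) <- codes
print(indicators)
```

Here the first respondent ends up with TRUE in columns 1 and 3 and FALSE in column 2.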
Requirements
A single variable in which comma-separated values indicate which choices a respondent selected in a multi-select question.
Method
1. Go to Create > Variables and Questions > Variables > R Variable to open the R variable dialogue.
2. In the Variable Base Name field (bottom of the screen), enter a base name for your variables, e.g. "Q1" or similar.
3. In the Question Name field, enter the question text you want the new question to have.
4. In the R CODE field, paste in the code below.
## Get the source variable
x = `data`
## Get the numerical values (and strip out leading and trailing spaces if any)
x = lapply(x, function(v) trimws(strsplit(v, ",")[[1]]))
## Get the number of cases
n = length(x)
## Get all unique values in data
uniques = sort(unique(unlist(x)))
## Calculate number of unique values
k = length(uniques)
## Dimension a matrix with one row per case and one column per unique value
out = matrix(FALSE, n, k, dimnames = list(NULL, uniques))
## Loop over each case and split out the responses into the columns
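for (i in seq_len(n))
{
    resp.data = x[[i]]
    ## Cases with no data (missing or empty strings) are set to missing
    out[i, ] = if (length(resp.data) == 0 || all(is.na(resp.data))) NA else uniques %in% resp.data
}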
## Sort the columns in numeric order and return the output
out[, as.character(sort(as.numeric(colnames(out))))]
5. On the second line of the code, replace the word `data` (keeping the backticks) with the name of the question that stores your comma-separated data. In the code above, this question is called `data`, but yours could be something like `Q1 - Brands aware of` or similar.
6. Press the blue Play button above the code and wait for the code to run.
7. Click Add R Variable.
8. Change the Question Type to Pick Any.
9. In the Value Attributes dialogue, click OK.
10. On the Variables and Questions tab, label the variables with the labels corresponding to the code values in your data.
The data will now be split into multiple columns, and you can work with it in the way you would any other question in Q.
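Before labeling the variables in step 10, it can help to see which codes actually occur in the data and how often. The following is a minimal sketch for plain R, not part of the Q steps above. The vector `raw` is hypothetical example data, and `expected` is an assumed list of the codes offered in the questionnaire.

```r
# Hypothetical example data, including stray spaces and an empty response
raw <- c("1, 3", "2", "1,2,3", "")

# Flatten all responses into one vector of codes and trim whitespace
codes_found <- trimws(unlist(strsplit(raw, ",")))

# Frequency of each code, ordered numerically
counts <- table(as.numeric(codes_found))
print(counts)

# Assumption: the questionnaire offered codes 1 to 4
expected <- 1:4

# Codes that were offered but never selected
missing_codes <- setdiff(expected, as.numeric(names(counts)))
print(missing_codes)
```

Any code listed in `missing_codes` will not get its own column from the method above, which is the situation covered in the Troubleshooting section.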
Troubleshooting - Codes Missing from the Data
If a response option was not selected by any respondent (for example, if no one selected option 10), it will not appear in the split data. A blank dummy variable can be added to represent it:
1. Go to Create > Variables and Questions > Variables > R Variable.
2. In the R CODE field, paste in the code below and, as before, replace `data` with the name of your original source question.
## Create a logical vector of FALSE values, one per case in the file.
vector("logical", length(`data`))
3. Give it a suitable Variable Base Name and Question Name.
4. Press the blue Play button above the code and wait for the code to run.
5. Click Add R Variable.
6. On the Variables and Questions tab, select this variable along with the other variables in the existing Pick Any question created earlier.
7. Right-click and select Set Question.
8. Ensure the Question Type is set to Pick Any then click OK.
9. In the Value Attributes dialogue, click OK.
The blank variable is now part of your question and represents the option that no respondent selected.
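An alternative to adding dummy variables one at a time is to build the columns from a fixed list of codes instead of from the values found in the data. This is not the method described above, only a sketch in plain R under stated assumptions: `x` is hypothetical example data, and `all.codes` is an assumed name holding the codes 1 to 10 that the questionnaire is presumed to offer. It has not been checked inside Q's R Variable dialogue.

```r
# Hypothetical stand-in for the source variable
x <- c("1,3", "2, 9", "1,2,3")

# Assumption: the questionnaire offered codes 1 to 10
all.codes <- as.character(1:10)

# Split each response and trim whitespace
x <- lapply(x, function(v) trimws(strsplit(v, ",")[[1]]))

# One column per offered code, whether or not anyone selected it
out <- matrix(FALSE, length(x), length(all.codes), dimnames = list(NULL, all.codes))
for (i in seq_along(x))
    out[i, ] <- all.codes %in% x[[i]]
out
```

With this approach, unselected options such as 10 appear as all-FALSE columns without any extra steps, at the cost of having to maintain the code list by hand.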