String splitting is the process of breaking a text string into parts in a systematic way so that each part can be processed individually. This article describes how to go from a single variable in which commas separate the 1st, 2nd, and 3rd mentions, and so on, to storing each mention in a separate variable.
Requirements
- A data set loaded into Q that contains a text variable where multiple responses are stored in a single variable as comma-separated text values.
Method
To do so, follow these steps:
- From the toolbar, select Create > Variables and Questions > Variable(s) > R Variable, or right-click in the Variables and Questions tab and choose Insert Variable(s) > R Variable.
- In the Edit R Variable window, give your new variable a name in the Question Name field at the bottom of the window and build your R code in the R CODE section.
- Next, copy and paste the code below into the Expression box:
x <- strsplit(awareness, ",") # Split each response at the commas
# Get max length
n <- max(sapply(x, length))
# Pad shorter vectors with NAs so that all have length n
for (j in seq_along(x))
    length(x[[j]]) <- n
z <- do.call(rbind, x) # Combine into a matrix, one row per case
z[is.na(z)] <- "" # Replace NAs with blanks
colnames(z) <- paste0("Mention: ", 1:ncol(z))
z[, 1] # Show only first column of results
- Click the Play button to verify your output.
- Once verified, click the Add R variable button to complete the process.
This code does the following things:
- Uses the strsplit() function to split the text. This function splits the elements of a text (character) vector x into substrings according to the matches to the substring split within them (in this example, a comma).
- Resets the length of each vector so they are all equal. This is done so that the data can be coerced into a matrix.
- Uses do.call() as a convenient way to rbind() (combine as rows) all of the split elements.
- Ensures any NA values introduced are converted to blank strings.
- Extracts the first column of the tabulated data.
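The steps above can be traced on a small example. This is a minimal sketch that assumes a made-up `awareness` vector with three responses (the brand names are hypothetical sample data, not from any real data set) and pads the vectors with `lapply()` instead of the loop used in the article's code:

```r
# Hypothetical sample data standing in for the `awareness` text variable
awareness <- c("Coke,Pepsi,Fanta", "Pepsi", "Coke,Sprite")

x <- strsplit(awareness, ",")   # list with one character vector per response
n <- max(lengths(x))            # largest number of mentions in any response
x <- lapply(x, `length<-`, n)   # pad shorter vectors with NA up to length n
z <- do.call(rbind, x)          # one row per response, one column per mention
z[is.na(z)] <- ""               # replace the padding NAs with blanks
colnames(z) <- paste0("Mention: ", seq_len(ncol(z)))

first_mention <- z[, 1]         # first mention for each response
```

Here the second response has only one mention, so its second and third columns end up as blank strings rather than NA.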
Variables for the 2nd brand mentioned, the 3rd mention, and so on, can be added by repeating the process above and modifying the last line of code to refer to column 2, 3, etc. of the matrix of split elements (for example, z[, 2]).
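When repeating the process for later mentions, it can help to wrap the logic in a small function so that only the column number changes. The sketch below uses a hypothetical helper, `nth_mention()`, which is not part of Q or base R. It also trims whitespace around each mention with `trimws()`, which the article's code does not do, and returns a blank when a response has fewer than k mentions:

```r
# Hypothetical helper (not part of Q or base R): returns the k-th
# comma-separated mention for each response, or "" when there is none.
nth_mention <- function(text, k, sep = ",") {
    parts <- strsplit(text, sep, fixed = TRUE)
    vapply(parts, function(p) {
        # Blank for missing responses or responses with fewer than k mentions
        if (length(p) >= k && !is.na(p[k])) trimws(p[k]) else ""
    }, character(1))
}

# Hypothetical sample data, including a missing response
responses <- c("Coke, Pepsi", "Fanta", NA)
second <- nth_mention(responses, 2)
```

In an R Variable, the last line would be something like `nth_mention(awareness, 2)`, assuming `awareness` is a text variable as in the article's example.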
OPTIONAL: If your goal is not to add variables to the data set, you can instead use Create > R Output. This is also a good option when prototyping your code before you create new variables.
See Also
How R Works Differently in Q Compared to Other Programs
How to Use Different Types of Data in R
How to Reference Different Items in Your Project in R
How to Work with Conditional R Formulas
How to Add a Custom R Output to your Report
How to Create a Custom R Variable