String splitting is the process of breaking up a text string in a systematic way so that the individual parts of the text can be processed. You may want to do this when you have data in a multi-max format, where responses are saved in one variable and delimited by a character. The recommended solution is to use our List of Items automation to delimit the list and save the categories; see the last method in How to Do Automatic List Categorization of Text Data with Q. This article describes a second method, which is more manual but can be customized further if needed. It shows you how to go from a single variable where commas separate the 1st, 2nd, and 3rd mentions, and so on, to storing each mention in a separate variable.
Requirements
- A data set loaded into Q that contains a text variable where multiple responses are stored in a single variable as comma-separated text values.
Method
Using JavaScript
- From the toolbar, select Create > Variables and Questions > Variable(s) > JavaScript Formula
- Choose Text if you want the output to be another text variable, or Numeric if you want the output variable to have numeric values.
- Next, copy and paste the code below into the Expression box:
// Define the text variable to split
var string = awareness;
// Split the text at the delimiter
var split_string = string.split(",");
// Extract the first item (first item = index 0, second item = index 1, etc.)
var result = split_string[0];
// Return the result
result
- Assign a name and label for the new variable.
- Click OK.
- Right-click the new variable and select Copy and Paste Variable(s) > Exact Copy.
- Right-click the copied variable and select Edit Variable.
- Change the line that defines result to use [1], which extracts the second item in the text:
var result = split_string[1]
- Click OK.
- Repeat steps 6-9 until you've created enough variables for all of the items in the text.
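When a respondent has fewer mentions than the index you ask for, split_string[1] is undefined, and items that follow a comma and a space keep their leading space. A minimal sketch of one way to guard against both is shown below. The helper name nthMention and the sample text are hypothetical, and the sketch is plain JavaScript that has not been checked against Q's expression editor.

```javascript
// Hypothetical helper: return the item at `index` (0-based) from a
// delimited string, trimmed of surrounding spaces, or "" if it is missing.
function nthMention(text, index, delimiter) {
    if (text === null || text === undefined) {
        return "";
    }
    var parts = String(text).split(delimiter);
    return index < parts.length ? parts[index].trim() : "";
}

// Sample value standing in for one respondent's `awareness` text.
var sample = "Coke, Pepsi,Fanta";

console.log(nthMention(sample, 0, ",")); // Coke
console.log(nthMention(sample, 1, ",")); // Pepsi
console.log(nthMention(sample, 5, ",")); // (empty string)
```

Returning an empty string for a missing mention keeps every new variable the same type, which mirrors how the R method in this article replaces NA values with blanks.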
The code in step 3 relies on the JavaScript split() method:
The split() method is used to split a string into an array of substrings and it returns a new array.
Tip: If an empty string ("") is used as the separator, the string is split between each character.
Note: The split() method does not change the original string.
The syntax: string.split(separator, limit)
OPTIONAL: separator specifies the character, or the regular expression, to use for splitting the string. In the example above, this is ",". If omitted, an array containing the entire string as its only item is returned.
OPTIONAL: limit is an integer that specifies the maximum number of items to return. Items after the limit are not included in the array.
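The behaviors described above can be checked with a short, standalone sketch. The sample text is made up for illustration and is not taken from any data set.

```javascript
// Sample text standing in for one respondent's answer.
var text = "Coke,Pepsi,Fanta";

// Separator only: every item is returned.
var all = text.split(",");          // ["Coke", "Pepsi", "Fanta"]

// Separator and limit: only the first two items are kept.
var firstTwo = text.split(",", 2);  // ["Coke", "Pepsi"]

// No separator: one item containing the whole string.
var whole = text.split();           // ["Coke,Pepsi,Fanta"]

// Empty-string separator: split between each character.
var chars = "abc".split("");        // ["a", "b", "c"]

// The original string is unchanged.
console.log(text);                  // Coke,Pepsi,Fanta
```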
Method
Using R
To split the text using R, follow these steps:
- From the toolbar, select Create > Variables and Questions > Variable(s) > R Variable, or right-click in the Variables and Questions tab and choose Insert Variable(s) > R Variable.
- In the Edit R Variable window, give your new variable a name in the Question Name field at the bottom of the window and build your R code in the R CODE section.
- Next, copy and paste the code below into the R CODE section:
x <- strsplit(awareness, ",")
# Get max length
n = max(sapply(x, length))
for (j in 1:length(x))
length(x [[j]]) <- n
z = do.call(rbind, x)
z[is.na(z)] <- "" # Replace NAs with blanks
colnames(z) = paste0("Mention: ",1:ncol(z))
z[,1] # Show only first column of results
- Press the Play button to verify your output.
- Once verified, click the Add R variable button to complete the process.
This code does the following things:
- Uses the strsplit() function to split the text. This function splits the elements of a text (character) vector x into substrings according to the matches to the split substring within them (in this example, a comma).
- Resets the length of each vector so they are all equal. This is done so that the data can be coerced into a matrix.
- Uses do.call() as a convenient way to rbind() (combine as rows) all of the split elements.
- Ensures any NA values introduced are converted to blank strings.
- Extracts the first column of the tabulated data.
Variables for 2nd brand mentioned, 3rd mention, and so on, could be added by repeating the process above and modifying the last line of code to refer to columns 2, 3, etc of the table of split elements.
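For readers more comfortable with JavaScript, the split, pad, and extract-a-column logic of the R code can be sketched as follows. This is only an illustration of the logic, not a replacement for the R code: the responses array is made-up sample data standing in for the awareness variable, and the column helper is hypothetical.

```javascript
// Made-up sample data standing in for the `awareness` variable.
var responses = ["Coke,Pepsi,Fanta", "Pepsi", "Fanta,Coke"];

// Split each response on commas (the role strsplit() plays in the R code).
var split = responses.map(function (r) {
    return r.split(",");
});

// Find the largest number of mentions in any response.
var n = Math.max.apply(null, split.map(function (parts) {
    return parts.length;
}));

// Pad shorter rows with blanks so that every row has n items.
var table = split.map(function (parts) {
    var row = parts.slice();
    while (row.length < n) {
        row.push("");
    }
    return row;
});

// Hypothetical helper: extract one column (0-based) from the table.
function column(tbl, j) {
    return tbl.map(function (row) {
        return row[j];
    });
}

console.log(column(table, 0)); // [ 'Coke', 'Pepsi', 'Fanta' ]
console.log(column(table, 1)); // [ 'Pepsi', '', 'Coke' ]
```

Note that JavaScript indexes columns from 0, whereas the R code uses z[,1] for the first mention.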
OPTIONAL: If your goal is not to add variables to the data set, you can instead use Create > R Output. This is also a good option when prototyping your code before you create new variables.