This article describes how to use regular expressions in R to filter the rows and columns of a table and to replace text. RStudio's website has a useful reference for writing regular expressions in R.
- An R variable, calculation or data set.
- Familiarity with How to Work with Data in R.
A useful function for identifying rows or columns to keep or exclude is grepl(). It returns TRUE for each element in which the specified text is found, and FALSE otherwise.
In the below example, we have a numeric table:
The basic format is
grepl(text, data), where text is the pattern to search for and data is the text to search in (here, the row names). You can also set the optional ignore.case argument to TRUE to match regardless of case.
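To make the return value concrete, here is a minimal sketch. The labels vector is made up for illustration and is not the table used in this article:

```r
# Hypothetical row labels, invented for illustration
labels <- c("Coke: at home", "Coke: out and about", "Pepsi: Out and About", "SUM")

# grepl() returns one TRUE/FALSE value per element of the input
case_sensitive <- grepl("out and about", labels)
case_insensitive <- grepl("out and about", labels, ignore.case = TRUE)

print(case_sensitive)    # the differently cased label is not matched
print(case_insensitive)  # both "out and about" labels are matched
```

Because the result is a logical vector of the same length as the input, it can be used directly to index rows or columns.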
Below we will remove any row whose name includes "SUM" and then keep only the rows whose name includes "out and about":
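x = No.of.colas.consumed
# remove rows whose name contains "SUM"
# (for a table with more than one column, add a comma before the closing bracket to select whole rows)
x = x[!grepl("SUM", rownames(x), ignore.case = TRUE)]
# keep only rows whose name contains "out and about"
tab = x[grepl("out and about", rownames(x), ignore.case = TRUE)]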
If you have more than one term to filter by, separate the terms with "|", which means "or". For example, to keep rows that include "out and about" or "Coke", you can update the condition as follows:
tab = x[grepl("out and about|Coke", rownames(x), ignore.case = TRUE)]
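If the table has more than one column, indexing without a comma selects individual cells rather than whole rows. The following is a minimal sketch of row and column filtering on a two-dimensional table. The matrix, its labels and its values are made up for illustration:

```r
# Hypothetical two-column table; names and values are invented for illustration
x <- matrix(1:8, nrow = 4,
            dimnames = list(c("Coke: at home", "Coke: out and about",
                              "Pepsi: out and about", "SUM"),
                            c("Weekday", "Weekend")))

# The comma selects whole rows; drop = FALSE keeps the result a matrix
no_sum <- x[!grepl("SUM", rownames(x), ignore.case = TRUE), , drop = FALSE]

# "|" means "or": keep rows mentioning either term
pepsi_or_home <- no_sum[grepl("pepsi|at home", rownames(no_sum), ignore.case = TRUE), ,
                        drop = FALSE]

# The same idea applied to column names
weekend <- x[, grepl("end", colnames(x)), drop = FALSE]

print(pepsi_or_home)
print(weekend)
```

The drop = FALSE argument is a design choice here: without it, a result with a single row or column is simplified to a plain vector and loses its table structure.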
A useful function for replacing text is gsub(). It works like the sub() function, except that sub() replaces only the first match in each element, whereas gsub() replaces every match.
In the below example, we have a text output:
The basic format is
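gsub(find_text, replace_text, data), where find_text is the pattern to search for, replace_text is the replacement and data is the text to search in. You can also set the optional ignore.case argument to TRUE to match regardless of case.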
Below we will replace all instances of "Pepsi" with "PEPSI":
x = Main.difference.between.cola.drinkers
tab = gsub("Pepsi", "PEPSI", x, ignore.case = TRUE)
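The difference between sub() and gsub() can be seen in a minimal sketch. The text below is made up for illustration and is not the output used in this article:

```r
# Hypothetical response text, invented for illustration
txt <- "pepsi is sweeter. I prefer Pepsi."

# sub() replaces only the first match
first_only <- sub("pepsi", "PEPSI", txt, ignore.case = TRUE)

# gsub() replaces every match
all_matches <- gsub("pepsi", "PEPSI", txt, ignore.case = TRUE)

print(first_only)
print(all_matches)
```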
If we look at the third row more closely, we can replace both "////" and "//" with a space:
tab = gsub("////|//"," ",x)
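If the number of slashes varies, one possible alternative is to match any run of two or more slashes with a quantifier. This is a sketch on made-up text, not a replacement for the approach above:

```r
# Hypothetical text with runs of 4, 2 and 3 slashes, invented for illustration
txt <- "Coke////sweeter//cheaper///other"

# Listing exact lengths leaves a stray slash behind for the run of 3
fixed_lengths <- gsub("////|//", " ", txt)

# "/{2,}" matches any run of two or more slashes
any_run <- gsub("/{2,}", " ", txt)

print(fixed_lengths)
print(any_run)
```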
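Some characters, such as ".", have a special meaning in regular expressions (an unescaped "." matches any character). If you wish to replace such a character literally, you will need to place "\\" in front of it: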
tab = gsub("\\.",";",x)
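The effect of escaping can be checked with a minimal sketch on made-up text. It also shows the fixed = TRUE argument of gsub(), which treats the pattern as plain text instead of a regular expression:

```r
# Hypothetical text, invented for illustration
txt <- "Coke. Pepsi."

# Unescaped "." matches every character, so everything is replaced
unescaped <- gsub(".", ";", txt)

# Escaped "\\." matches only literal full stops
escaped <- gsub("\\.", ";", txt)

# fixed = TRUE treats the pattern as literal text, so no escaping is needed
literal <- gsub(".", ";", txt, fixed = TRUE)

print(unescaped)
print(escaped)
print(literal)
```

Note that ignore.case has no effect when fixed = TRUE, so this option only suits case-sensitive, literal replacements.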
If you wish to remove "////" and everything after it, you would do the below:
tab = gsub("////.*","",x)
And if you wish to remove everything up to and including "////", you would instead do the following:
tab = gsub(".*////","",x)
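When the text contains "////" more than once, it matters that ".*" is greedy, meaning it matches as much as possible. The sketch below uses made-up text to show the behavior. The lazy ".*?" variant with perl = TRUE is an addition for comparison and is not part of the examples above:

```r
# Hypothetical text containing the separator twice, invented for illustration
txt <- "brand////reason////detail"

# Removes the first "////" and everything after it
before_first <- gsub("////.*", "", txt)

# Greedy ".*" runs up to the LAST "////", so only the final part is kept
after_last <- gsub(".*////", "", txt)

# Lazy ".*?" stops at the FIRST "////"; sub() applies it once
after_first <- sub(".*?////", "", txt, perl = TRUE)

print(before_first)
print(after_last)
print(after_first)
```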