Introduction
This article describes how to use common mathematical functions via R code. Do know, that in addition to using R, there are also Ready-Made Formula(s) available through the context menu on the Variables and Questions tab - no coding required :)
Requirements
Mathematical formulas can be used within an R variable, R output or data set. Questions, tables and data sets are all effectively tabular data, so some functions may need to be used inside apply() to be performed on each row/columns.
Method - Mathematical operators
All the traditional mathematical operators (i.e., +, -, /, (, ), and *) work in R in the way that you would expect when performing math on variables and tables:
q2a_1 / (q2a_1 + q2b_1)
All the standard mathematical functions are also available in R. For example, to create the average of a set of variables, we can use the following:
rowMeans(cbind(q2a, q2b, q2c, q2d, q2e, q2f))
Note, instead of grouping the variables using cbind, we can also reference the Question Label (enclosed in backticks):
rowMeans(`Q2 - No. of colas consumed`)
Vector arithmetic
One of the great strengths of using R is that you can use vector arithmetic. Consider the expression q2a_1 / sum(q2a_1)
. This tells R to divide the value of q2_a1
by the sum of all the values that all observations take for this variable. That is, when computing the denominator, R sums the values of every observation in the variable. Other programs, such as SPSS, would instead treat this expression as meaning to divide q2_a1
by itself. We can similarly standardize q2a_1
to have a mean of 0 and a standard deviation of 1 using (q2a_1 - mean(q2a_1))
/ sd(q2a_1)
.
In these two examples, there are also pre-constructed specialist functions we can use: q2a_1
/ sum(q2a_1)
is equivalent to writing prop.table(q2a_1)
, and (q2a_1 - mean(q2a_1))
/ sd(q2a_1)
is equivalent to scale(q2a_1)
.
Note, most in-built R functions, such as sd
, mean
, sum
, rowMeans
, and rowSums
, will return missing values if any of the values in the vector (variable in this case) passed to them contains a missing value. In most cases, the trick is to use na.rm = TRUE
. For example:
(q2a_1 - mean(q2a_1, na.rm = TRUE)) / sd(q2a_1, na.rm = TRUE)
Sadly, there is no shortage of exotic exceptions to this rule. For example, prop.table
cannot deal with missing values, and scale
automatically removes them.
Questions as tables
If you hover over a variable/question R reference (e.g. `Q2 - No. of colas consumed`) in an R Output with your mouse, you can see that it previews the raw data in tabular format.
This example contains 12 variables showing the frequency of consumption for six different colas on two usage occasions. Just like a table, this preview will include NET or SUM rows/columns depending on the data format. This means by default, they will be included in your formula unless you right-click them in the corresponding table and then select Hide, or else add this exclusion in your code.
If we were, for example, to look at the sum of the variables pertaining to each occasion: Sum, 'out and about' and Sum, 'at home', this would be the last column in the preview: [,"SUM","SUM"]
.
These automatically constructed variables can considerably reduce the amount of code required to perform calculations. For example, to compute Coca-Cola's share of category requirements, we can use the expression:
(q2a_1 + q2a_2) / `Q2 - No. of colas consumed`[,"SUM, SUM"]
This is perhaps more obvious when we review the data as an aggregated table to see the interlocked SUM - SUM cell:
Method - verbs functions
In order to help users along, the Data Science team at Q created improved versions of common mathematical functions in R that incorporate a great deal of data handling. This includes automatically ignoring missing data and being able to run functions like Sum for EachRow or EachColumn on inputs given to it. These functions can be called by name in your R code directly. You can use them on tables as well as variables. To access, add a line to load the verbs library:
library(verbs)
Then begin by typing a function name and a search will appear below your code where you can select the appropriate function:
Method - Base R functions
The following are built-in R functions that will apply to all rows or columns. We have used the na.rm=T argument here to deal with missing values. If this argument is not used and the function encounters missing values, it will simply return NA, which is why including this is a good habit. Do note that using the verbs functions such as SumEachRow and Average as described above automatically ignore missing data - so you don't have to remember to do so!
1. Sum each row over the columns:
rowSums(table.Brand.Image, na.rm=T)
2. Sum each column over the rows:
colSums(table.Brand.Image, na.rm=T)
3. Average each row over the columns:
rowMeans(table.Brand.Image, na.rm=T)
4. Average each column over the rows:
colMeans(table.Brand.Image, na.rm=T)
Method - apply() functions
Some functions can only be run on all rows or columns within an apply or similar function. The format is:
apply(data, rows or columns, function)
The rows or columns argument requires as 1 for rows and 2 for columns.
1. Minimum of each row over the columns:
apply(table.Brand.Image, 1, min)
2. Minimum of each column over the rows:
apply(table.Brand.Image, 2, min)
3. Maximum of each row over the columns:
apply(table.Brand.Image, 1, max)
4. Maximum of each column over the rows:
apply(table.Brand.Image, 2, max)
Similar to the base functions above, you can use the functions above to do the same operation over the raw data across variables. If you use Pick Any questions as the input for example 3 above, you will create a new NET variable as you are essentially taking the max value (1 or 0) across those variables for each respondent. This is akin to seeing if a respondent selected Any Of the variables.
Method - Custom apply() functions
The final parameter of the apply function allows for custom functions. The format is:
function(x) {x do something}
The curly brackets are optional for basic examples but become necessary when you have more elaborate multi-line functions.
1. Any rows over 50:
apply(table.Brand.Image, 1, function(x) any(x>50))
2. All rows over 20:
apply(table.Brand.Image, 1, function(x) all(x>20))
See Also
How to Standardize or Calculate Data within Subgroups in R
How to Create a Custom Index Score using R
How to Work with Conditional R Formulas
How to Filter Raw Data Using R