How to Perform Mathematical Calculations Using R

Introduction

This article describes how to use common mathematical functions via R code. Do know, that in addition to using R, there are also Ready-Made Formula(s) available through the context menu on the Variables and Questions tab - no coding required :)

Requirements

Mathematical formulas can be used within an R variable, R output or data set. Questions, tables and data sets are all effectively tabular data, so some functions may need to be used inside apply() to be performed on each row/columns.

Method - Mathematical operators

All the traditional mathematical operators (i.e., +, -, /, (, ), and *) work in R in the way that you would expect when performing math on variables and tables:

q2a_1 / (q2a_1 + q2b_1)

All the standard mathematical functions are also available in R. For example, to create the average of a set of variables, we can use the following:

rowMeans(cbind(q2a, q2b, q2c, q2d, q2e, q2f))

Note, instead of grouping the variables using cbind, we can also reference the Question Label (enclosed in backticks):

rowMeans(`Q2 - No. of colas consumed`)

Vector arithmetic

One of the great strengths of using R is that you can use vector arithmetic. Consider the expression q2a_1 / sum(q2a_1). This tells R to divide the value of q2_a1 by the sum of all the values that all observations take for this variable. That is, when computing the denominator, R sums the values of every observation in the variable. Other programs, such as SPSS, would instead treat this expression as meaning to divide q2_a1 by itself. We can similarly standardize q2a_1 to have a mean of 0 and a standard deviation of 1 using (q2a_1 - mean(q2a_1)) / sd(q2a_1).

In these two examples, there are also pre-constructed specialist functions we can use: q2a_1 / sum(q2a_1) is equivalent to writing prop.table(q2a_1), and (q2a_1 - mean(q2a_1)) / sd(q2a_1) is equivalent to scale(q2a_1).

Note, most in-built R functions, such as sd, mean, sum, rowMeans, and rowSums, will return missing values if any of the values in the vector (variable in this case) passed to them contains a missing value. In most cases, the trick is to use na.rm = TRUE. For example:

(q2a_1 - mean(q2a_1, na.rm = TRUE)) / sd(q2a_1, na.rm = TRUE)

Sadly, there is no shortage of exotic exceptions to this rule. For example, prop.table cannot deal with missing values, and scale automatically removes them.

Questions as tables

If you hover over a variable/question R reference (e.g. `Q2 - No. of colas consumed`) in an R Output with your mouse, you can see that it previews the raw data in tabular format.

This example contains 12 variables showing the frequency of consumption for six different colas on two usage occasions. Just like a table, this preview will include NET or SUM rows/columns depending on the data format. This means by default, they will be included in your formula unless you right-click them in the corresponding table and then select Hide, or else add this exclusion in your code.

If we were, for example, to look at the sum of the variables pertaining to each occasion: Sum, 'out and about' and Sum, 'at home', this would be the last column in the preview: [,"SUM","SUM"].

These automatically constructed variables can considerably reduce the amount of code required to perform calculations. For example, to compute Coca-Cola's share of category requirements, we can use the expression:

(q2a_1 + q2a_2) / `Q2 - No. of colas consumed`[,"SUM, SUM"]

This is perhaps more obvious when we review the data as an aggregated table to see the interlocked SUM - SUM cell:

Method - verbs functions

In order to help users along, the Data Science team at Q created improved versions of common mathematical functions in R that incorporate a great deal of data handling. This includes automatically ignoring missing data and being able to run functions like Sum for EachRow or EachColumn on inputs given to it. These functions can be called by name in your R code directly. You can use them on tables as well as variables. To access, add a line to load the verbs library:

library(verbs)

Then begin by typing a function name and a search will appear below your code where you can select the appropriate function:

Method - Base R functions

The following are built-in R functions that will apply to all rows or columns. We have used the na.rm=T argument here to deal with missing values. If this argument is not used and the function encounters missing values, it will simply return NA, which is why including this is a good habit. Do note that using the verbs functions such as SumEachRow and Average as described above automatically ignore missing data - so you don't have to remember to do so!

1. Sum each row over the columns:

rowSums(table.Brand.Image, na.rm=T)

2. Sum each column over the rows:

colSums(table.Brand.Image, na.rm=T)

3. Average each row over the columns:

rowMeans(table.Brand.Image, na.rm=T)

4. Average each column over the rows:

colMeans(table.Brand.Image, na.rm=T)

Method - apply() functions

Some functions can only be run on all rows or columns within an apply or similar function. The format is:

apply(data, rows or columns, function)

The rows or columns argument requires as 1 for rows and 2 for columns.

1. Minimum of each row over the columns:

apply(table.Brand.Image, 1, min)

2. Minimum of each column over the rows:

apply(table.Brand.Image, 2, min)

3. Maximum of each row over the columns:

apply(table.Brand.Image, 1, max)

4. Maximum of each column over the rows:

apply(table.Brand.Image, 2, max)

Similar to the base functions above, you can use the functions above to do the same operation over the raw data across variables. If you use Pick Any questions as the input for example 3 above, you will create a new NET variable as you are essentially taking the max value (1 or 0) across those variables for each respondent. This is akin to seeing if a respondent selected Any Of the variables.

Method - Custom apply() functions

The final parameter of the apply function allows for custom functions. The format is:

function(x) {x do something}

The curly brackets are optional for basic examples but become necessary when you have more elaborate multi-line functions.

1. Any rows over 50:

apply(table.Brand.Image, 1, function(x) any(x>50))

2. All rows over 20:

apply(table.Brand.Image, 1, function(x) all(x>20))

Articles in this section

Introduction

Requirements

Method - Mathematical operators

Vector arithmetic

Questions as tables

Method - verbs functions

Method - Base R functions

1. Sum each row over the columns:

2. Sum each column over the rows:

3. Average each row over the columns:

4. Average each column over the rows:

Method - apply() functions

1. Minimum of each row over the columns:

2. Minimum of each column over the rows:

3. Maximum of each row over the columns:

4. Maximum of each column over the rows:

Method - Custom apply() functions

1. Any rows over 50:

2. All rows over 20:

See Also

Articles in this section

Introduction

Requirements

Method - Mathematical operators

Vector arithmetic

Questions as tables

Method - verbs functions

Method - Base R functions

1. Sum each row over the columns:

2. Sum each column over the rows:

3. Average each row over the columns:

4. Average each column over the rows:

Method - apply() functions

1. Minimum of each row over the columns:

2. Minimum of each column over the rows:

3. Maximum of each row over the columns:

4. Maximum of each column over the rows:

Method - Custom apply() functions

1. Any rows over 50:

2. All rows over 20:

See Also

Related articles