## Introduction

This article describes how to use common mathematical functions via R code. Do know, that in addition to using R, there are also **Ready-Made Formula(s)** available through the context menu on the Variables and Questions tab - no coding required :)

## Requirements

Mathematical formulas can be used within an R variable, R output or data set. Questions, tables and data sets are all effectively tabular data, so some functions may need to be used inside apply() to be performed on each row/columns.

## Method - Mathematical operators

All the traditional mathematical operators (i.e., +, -, /, (, ), and *) work in R in the way that you would expect when performing math on variables and tables:

q2a_1 / (q2a_1 + q2b_1)

All the standard mathematical functions are also available in R. For example, to create the average of a set of variables, we can use the following:

rowMeans(cbind(q2a, q2b, q2c, q2d, q2e, q2f))

Note, instead of grouping the variables using *cbind*, we can also reference the **Question** **Label **(enclosed in backticks):

rowMeans(`Q2 - No. of colas consumed`)

### Vector arithmetic

One of the great strengths of using R is that you can use *vector arithmetic*. Consider the expression `q2a_1 / sum(q2a_1)`

. This tells R to divide the value of `q2_a1`

by the sum of all the values that all observations take for this variable. That is, when computing the denominator, R sums the values of every observation in the variable. Other programs, such as SPSS, would instead treat this expression as meaning to divide `q2_a1`

by itself. We can similarly standardize `q2a_1`

to have a mean of 0 and a standard deviation of 1 using ```
(q2a_1 - mean(q2a_1))
/ sd(q2a_1)
```

.

In these two examples, there are also pre-constructed specialist functions we can use: ```
q2a_1
/ sum(q2a_1)
```

is equivalent to writing `prop.table(q2a_1)`

, and ```
(q2a_1 - mean(q2a_1))
/ sd(q2a_1)
```

is equivalent to `scale(q2a_1)`

.

Note, most in-built R functions, such as `sd`

, `mean`

, `sum`

, `rowMeans`

, and `rowSums`

, will return missing values if any of the values in the *vector *(variable in this case) passed to them contains a missing value. In most cases, the trick is to use `na.rm = TRUE`

. For example:

(q2a_1 - mean(q2a_1, na.rm = TRUE)) / sd(q2a_1, na.rm = TRUE)

Sadly, there is no shortage of exotic exceptions to this rule. For example, `prop.table`

cannot deal with missing values, and `scale`

automatically removes them.

### Questions as tables

If you hover over a variable/question R reference (e.g. `Q2 - No. of colas consumed`) in an **R Output **with your mouse, you can see that it previews the raw data in tabular format.

This example contains 12 variables showing the frequency of consumption for six different colas on two usage occasions. Just like a table, this preview will include NET or SUM rows/columns depending on the data format. This means by default, they will be included in your formula unless you right-click them in the corresponding table and then select** Hide**, or else add this exclusion in your code.

If we were, for example, to look at the sum of the variables pertaining to each occasion: **Sum, 'out and about'** and **Sum, 'at home'**, this would be the last column in the preview: `[,"SUM","SUM"]`

.

These automatically constructed variables can considerably reduce the amount of code required to perform calculations. For example, to compute Coca-Cola's share of category requirements, we can use the expression:

(q2a_1 + q2a_2) / `Q2 - No. of colas consumed`[,"SUM, SUM"]

This is perhaps more obvious when we review the data as an aggregated table to see the interlocked *SUM - SUM* cell:

## Method - *verbs* functions

In order to help users along, the Data Science team at Q created improved versions of common mathematical functions in R that incorporate a great deal of data handling. This includes automatically ignoring missing data and being able to run functions like Sum for EachRow or EachColumn on inputs given to it. These functions can be called by name in your R code directly. You can use them on tables as well as variables. To access, add a line to load the *verbs* library:

library(verbs)

Then begin by typing a function name and a search will appear below your code where you can select the appropriate function:

## Method - Base R functions

The following are built-in R functions that will apply to all rows or columns. We have used the *na.rm=T* argument here to deal with missing values. If this argument is not used and the function encounters missing values, it will simply return NA, which is why including this is a good habit. Do note that using the *verbs* functions such as SumEachRow and Average as described above automatically ignore missing data - so you don't have to remember to do so!

### 1. Sum each row over the columns:

rowSums(table.Brand.Image, na.rm=T)

### 2. Sum each column over the rows:

colSums(table.Brand.Image, na.rm=T)

### 3. Average each row over the columns:

rowMeans(table.Brand.Image, na.rm=T)

### 4. Average each column over the rows:

colMeans(table.Brand.Image, na.rm=T)

## Method - *apply()* functions

Some functions can only be run on all rows or columns within an *apply* or similar function. The format is:

apply(data, rows or columns, function)

The *rows or columns* argument requires as **1** for **rows** and **2** for** columns**.

### 1. Minimum of each row over the columns:

apply(table.Brand.Image, 1, min)

### 2. Minimum of each column over the rows:

apply(table.Brand.Image, 2, min)

### 3. Maximum of each row over the columns:

apply(table.Brand.Image, 1, max)

### 4. Maximum of each column over the rows:

apply(table.Brand.Image, 2, max)

Similar to the base functions above, you can use the functions above to do the same operation over the raw data across variables. If you use Pick Any questions as the input for example 3 above, you will create a new NET variable as you are essentially taking the max value (1 or 0) across those variables for each respondent. This is akin to seeing if a respondent selected *Any Of* the variables.

## Method - Custom *apply()* functions

The final parameter of the* apply* function allows for custom functions. The format is:

function(x) {xdo something}

The curly brackets are optional for basic examples but become necessary when you have more elaborate multi-line functions.

### 1. Any rows over 50:

apply(table.Brand.Image, 1, function(x) any(x>50))

### 2. All rows over 20:

apply(table.Brand.Image, 1, function(x) all(x>20))

## See Also

How to Standardize or Calculate Data within Subgroups in R

How to Create a Custom Index Score using R

How to Work with Conditional R Formulas

How to Filter Raw Data Using R