Q leverages the power of the open-source statistical language, R, for many of its features, and this includes allowing you to write your own R code in order to customize your analysis and reports. R and its ecosystem are constantly being updated. Q needs to keep its R version up to date so that everything stays in sync.
We are about to upgrade Q to a new version of R, version 4.3, and R has become a little stricter about some of the things it lets you do. R is being stricter to help people write better code and make fewer mistakes. Before we roll out the latest version of R, we wanted to let our user community know about some of the key changes so that people who write their own R code can make sure that it still plays nicely with the latest version.
In this article, we cover the key changes that most R code writers need to be aware of, with examples. They fall into three groups:
- Changes that could break your code.
- Changes that could make your outputs look different.
- Other changes.
Note: If you aren't the sort of person who writes their own R code in Q, then you don't need to read this article. We test all of our R-based features thoroughly to make sure they don't break on you!
The new version of R will be released in Q on Wednesday, 2nd August, 2023. If you have any concerns about the changes, please reach out to email@example.com for guidance.
Part 1 - Changes that can cause code to stop working
As mentioned above, R has gotten a bit stricter to help R users write better code. A couple of issues that currently show up as orange warnings in Q will become errors in the new version of R, so you should check your code for them and update it if necessary. The two main changes that we think Q users may encounter in custom code are described in this section.
Comparing multiple TRUE and FALSE values
When you want to compare a single pair of TRUE/FALSE values (these are called logical in R) and get a single value in return, you use double operators:
- For an AND operation, you write x && y to determine whether both x and y are true.
- For an OR operation, you write x || y to determine whether either x or y is true.
A problem arises when x or y contains more than one value. Consider the following:
x = c(TRUE, FALSE)
y = TRUE
x && y
What single value would you expect to get here? Currently, R will compare y with the first value in x and give you back a value of TRUE.
The answer is ambiguous because the second value of x has been ignored, and most programming languages don't work this way. The R developers recognize this as a problem, so the current version of R provides a warning if you do this, and the upcoming R version 4.3 will instead generate an error.
The key takeaway for Q users is that if you see a warning like this in any of your custom R calculations, you should identify where the problem is occurring and update your code so that you only compare a single pair of values:
There is not always an easy recipe for solving this issue because it depends on the details of the code that generated the TRUE/FALSE values to begin with. If you need help figuring out how to update your code, please contact firstname.lastname@example.org.
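As a rough guide only, the sketch below shows a few common ways to reduce several TRUE/FALSE values to the single value that && and || expect. Which one is right depends on what your calculation is meant to do; the variable names are illustrative and reuse x and y from the example above.

```r
x = c(TRUE, FALSE)
y = TRUE

# Require every value in x to be TRUE, and y to be TRUE
all.and.y = all(x) && y

# Require at least one value in x to be TRUE, and y to be TRUE
any.and.y = any(x) && y

# Use only the first value of x (what R used to do silently)
first.and.y = x[1] && y

# Compare element by element with the single operator, giving one result per value of x
elementwise = x & y
```

Note that the single operators, & and |, are the ones designed for vectors, so if you actually want one result per value, switching from && to & may be all that is needed.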
For a more in-depth discussion of this topic, see: https://www.jumpingrivers.com/blog/whats-new-r43/
Sorting the columns of a data frame
Data frames in R are a special type of table that allows the columns of the table to contain a mix of different types of data. For example, one column can have numbers, while other columns could contain text. As all the entries in a single column have the same data type, it makes sense for you to sort the rows of the data frame according to the values in a column - say, from highest to lowest. But what about wanting to sort the columns of a data frame according to one of its rows? For example, what would it mean to sort the columns of this data frame according to the first row:
In this case, you would probably never try. But when it is less obvious that the columns contain different types of data, sorting them can seem like a natural thing to want to do. Currently, R gives you a warning message and does not change the order of the columns. In R version 4.3, R will instead raise an error, so that you cannot assume a result is sorted when in reality it is not. This is a conservative change, and it applies regardless of the types of data in the columns: sorting a data frame by the values in one of its rows is forbidden.
The key takeaway for Q users is that if you see this warning message in any of your custom R calculations, you need to update the part of the calculation that is doing the sorting:
Fortunately, for this problem, there is a general recipe that can be used to sort the columns when the data is all numeric and it really does make sense to sort it. The recipe is as follows:
- Make a copy of the data frame that is converted to a matrix.
- Use the order() function to identify the sorted ordering of the numbers in the row that you want to sort by.
- Supply the order to the data frame using the subscript operator, [ , ].
For example, let's say I make a numeric version of my data frame from above:
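# A data frame in which every column is numeric
df.2 = data.frame("Income" = c(45000, 35000, 80000, 67000),
                  "Age" = c(25, 30, 60, 44),
                  "Number of Children" = c(0, 1, 2, 2),
                  check.names = FALSE)
# Step 1: make a matrix copy of the data frame
# (named df.matrix to avoid masking R's built-in data.matrix() function)
df.matrix = as.matrix(df.2)
# Step 2: find the column order that sorts the first row from lowest to highest
order.of.columns = order(df.matrix[1, ], decreasing = FALSE)
# Step 3: apply that order to the columns of the original data frame
sorted.df = df.2[, order.of.columns]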
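If you need this in more than one calculation, the recipe can be wrapped in a small function. The following is only a sketch: sortColumnsByRow is a hypothetical helper name, not something provided by Q or by R, and it assumes that every column is numeric.

```r
# Hypothetical helper (not part of Q or base R): reorder the columns of an
# all-numeric data frame according to the values in one of its rows.
sortColumnsByRow = function(df, row = 1, decreasing = FALSE) {
    # Refuse to sort if any column is not numeric, since the result would be ambiguous
    if (!all(vapply(df, is.numeric, logical(1))))
        stop("All columns must be numeric to sort by a row.")
    # Convert to a matrix so the chosen row becomes a plain numeric vector
    row.values = as.matrix(df)[row, ]
    # Reorder the columns; drop = FALSE keeps the result a data frame
    df[, order(row.values, decreasing = decreasing), drop = FALSE]
}

df.2 = data.frame("Income" = c(45000, 35000, 80000, 67000),
                  "Age" = c(25, 30, 60, 44),
                  "Number of Children" = c(0, 1, 2, 2),
                  check.names = FALSE)
sorted.df = sortColumnsByRow(df.2, row = 1)
colnames(sorted.df)
```

For this data, the first row contains 45000, 25 and 0, so the columns come back in the order Number of Children, Age, Income.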
Part 2 - Changes that may cause results to look different
There are a handful of changes that can cause outputs created by custom R code to look different. The first of these issues will only affect users who are on the early-release streams of Q, versions 5.16.1 and greater. The other two issues are not common at all and will probably not affect most Q users.
Q's tables now retain their metadata when modified in R
Until now, if you created a calculation to select parts of a Q table, you may have noticed that key information, like the name of the statistic, did not show up on the new table. For example, selecting some rows from this table removes the Row % label from the top left corner:
This happens with any operations that involve the subscript operator (or the "square brackets"), including selecting rows and columns or simply using this approach to re-order the rows and columns. It will also happen when transposing a table in R using the t() function.
Such information will now be retained, with the key result being that tables created in this way will now display their statistic (other information is available in the attributes as well). The example above will now more closely match the original table:
If you don't want this information to show up in the table, you can strip it off using code like the following:
my.table = table.Q6.Brand.preference[1:3, ]
attr(my.table, "Statistic") = NULL
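If you are not sure which pieces of information are attached to a table, you can list its attributes before removing anything. The sketch below uses a small hand-made matrix as a stand-in for a Q table, with made-up row and column labels and values; in Q, you would use your own table instead.

```r
# Stand-in for a Q table: a plain matrix with a hand-made "Statistic" attribute.
# The labels and numbers here are invented purely for illustration.
my.table = matrix(c(10, 20, 30, 40), nrow = 2,
                  dimnames = list(c("Brand A", "Brand B"), c("Male", "Female")))
attr(my.table, "Statistic") = "Row %"

# List the names of everything currently attached to the table
attribute.names = names(attributes(my.table))
attribute.names

# Remove the attribute you do not want displayed
attr(my.table, "Statistic") = NULL
```

Attribute names in R are case-sensitive, so it is worth checking the exact name in the list before setting it to NULL.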
Small numerical changes
R version 4.3 has updated some of the libraries that it leverages to do mathematical calculations (linear algebra libraries). Changes of this kind are typically found at the margins, by which I mean that the developers have found a way to do something faster or with less memory rather than changing something major, like 1 + 1 = 3. Whenever an update like this happens, it inevitably comes with small changes to the results of calculations for things like regression and other statistical calculations. Usually, the changes are in tiny decimal places and don't even appear unless you look. We thoroughly test all of our statistical tools and have not identified any major changes to results, although we have seen a few small changes in the margins that do not change the meaning of the results.
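One place where tiny differences can matter is custom code that tests whether two numbers are exactly equal. As a general R illustration (it is not specific to the 4.3 update, and the variable names are made up), comparing with a tolerance is usually more robust than using ==:

```r
# Two values that are "the same" on paper but differ in the last decimal places
old.result = 0.1 + 0.2
new.result = 0.3

# An exact comparison is sensitive to those tiny differences
exact.match = old.result == new.result

# all.equal() allows for a small numerical tolerance;
# wrap it in isTRUE() to get a single TRUE or FALSE
tolerant.match = isTRUE(all.equal(old.result, new.result))
```

Here exact.match is FALSE while tolerant.match is TRUE.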
Outputs from R's linear models
R has several functions for regression models built in. If you are writing your own model code in Q using glm(), then you may notice some small changes to the content of the outputs from these models.
Firstly, if you are printing a summary of a model created using glm(), then the section called Deviance Residuals is no longer shown:
If you do want to see this information, then you can explicitly ask for it when printing the summary of the glm:
print(summary(my.glm), show.residuals = TRUE)
Secondly, the plot function for glm() results now produces a half-normal Q-Q plot rather than a normal Q-Q plot.
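If you would rather not depend on how the summary is printed, one option is to compute the deviance residuals yourself. This is a minimal sketch that fits a model to R's built-in mtcars data purely as a stand-in; my.glm would be your own model in practice.

```r
# Example model on built-in data (a stand-in for your own glm)
my.glm = glm(am ~ wt, data = mtcars, family = binomial)

# Extract the deviance residuals directly from the fitted model
deviance.residuals = residuals(my.glm, type = "deviance")

# Minimum, quartiles and maximum of the deviance residuals
quantile(deviance.residuals)
```

This gives the same kind of five-number overview of the deviance residuals, and it does not rely on the printing defaults of the R version in use.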
Part 3 - Other changes
There are a host of other changes in R 4.3 that will not affect Q users. Many of them relate to installing and building packages, which Q takes care of in the background. There are also a large number of changes that are quite niche and involve adding new functionality rather than changing existing functionality. If you are an avid R user and keen to stay up to date with all the changes, please see: https://cran.r-project.org/doc/manuals/r-release/NEWS.pdf.