Code Optimisation¶

Timing R commands¶

The function system.time returns the computing time of R commands.

system.time({x = qt(runif(1e4), df = 0.5)}) # Compute quantiles of the t dist'n

   user  system elapsed 
   0.14    0.00    0.14

Array pre-allocation¶

In the above example, the variable x is a vector of 1e4 elements. When calling the function qt, some space in the computer memory is pre-allocated to hold these 1e4 elements. This takes time as can be seen in the example below.

n = 3e4
u = runif(n)

Pre-allocate memory¶

system.time({
  x1 = numeric(n) # Here we pre-allocate memory
  for (i in 1:n) x1[i] = qt(u[i], df = 0.5)
})

   user  system elapsed 
   0.49    0.01    0.50

Without memory pre-allocation¶

system.time({
  x2 = c() # x2 is initially empty
  for (i in 1:n) x2 = c(x2, qt(u[i], df = 0.5)) # Join x2 with the most recent calculation
})

   user  system elapsed 
   1.73    0.00    1.73

In the case for x2, the memory is allocated within the for loop so there are n such operations. This slowed down the code.

Vectorising code¶

The `for` loop¶

Preferably loops should be avoided when possible. Consider the following example.

n = 5e4
u = runif(n)
qf = qp = numeric(n) # Placeholder for storing the output of the commands below

## Loop version
system.time({
  for (i in 1:n) {
    qf[i] = qt(u[i], df = 0.5)
  }
})

## vectorised version
system.time({
  qp = qt(u, df = 0.5)
})

   user  system elapsed 
   0.81    0.00    0.81

   user  system elapsed 
    0.7     0.0     0.7

The reason for the slower code in the loop is the following. Every line of code must first be translated into native computer language. In the parallel version, the translation only occurs once. In the the loopy version it occurs n times. This costs time.

Using `apply` and `lapply`¶

One benefit of for loops is that the code is sometimes easier to read. In these cases it is not obvious how to vectorise the code. Consider for example the following:

# Code to compute the average length for each variable and species using a loop.

data(iris3) # An array of dimension 50 by 4 by 3
avglen_f = matrix(0, 4, 3)
for (i in 1:4) {
  for (j in 1:3) {
    avglen_f[i, j] = mean(iris3[, i, j])
  }
}

The above code can be vectorised (and simplified) with the aid of the function apply as follows

### Code to compute the average length for each variable and species using apply.
avglen_v = apply(iris3, c(2, 3), mean)

In words the above code says "apply the function mean cycling through the dimensions 2 and 3 of the array iris3." The function mean will be called 6 times without using R loops. The input to the function each time will be the vector iris3[, 1, 1], iris3[, 1, 2], etc. Note that the function to be applied need not have a name, e.g., here is the same result using an anonymous function

### Code to compute the average length for each variable and species by applying an anonymous function.
avglen_p = apply(iris3, c(2, 3), function(x) {sum(x)/length(x)})

In the above x denotes the vector input to the anonymous function. The anonymous function is quite flexible in the sense that the variables used in the body of the function can come from the current environment. In the following example the variable p is not inputted to the function but its value is taken from the current environment.

n = 15
k = 4
x = matrix(rnorm(n*k), n, k) # A matrix of k different standard normal samples of size n
p = c(.10, .50, .90) # A vector of probabilities
z = apply(x, 2, function(xi) quantile(xi, probs = p)) # Compute the quantiles for each sample. Note that p is not an argument to the function.

The function apply operates on arrays, i.e., objects whose dimension is not NULL and returns an array or a vector. The function lapply (and derivatives) operate on lists, or objects which can be coerced to lists. The loop cycles through each component of the list.

### Return the type of each variable in the data frame
lapply(ToothGrowth, class)
n = 15
k = 4
x = matrix(rnorm(n*k), n, k) # A matrix of k different standard normal samples of size n
p = c(.10, .50, .90) # A vector of probabilities
apply(x, 2, function(xi) quantile(xi, probs = p))
as.list(1:k)
lapply(1:k, function(i) quantile(x[, i], probs = p))
sapply(1:k, function(i) quantile(x[, i], probs = p)) # a neater version of lapply that displays the output as a single array

1

Consider the ToothGrowth data.

Compute using apply, lapply etc the average growth (variable len) for the two different supplements (variable supp).
Compute using apply, lapply etc the average growth (variable len) for each combination of supplement (variable supp) and dose (variable dose).

Solution

2

Consider the topo data provided by the package MASS containing the altitude z at 52 coordinates x and y.

Use for loops efficiently to compute the matrix of pairwise distances between the locations given by the x, y coordinates. Compute only the lower triangular part of the matrix.
The spherical correlation structure is given by the function $$r(d; \theta) = \begin{cases} 1 - 1.5 \dfrac{d}{\theta} + 0.5 \biggl( \dfrac{d}{\theta} \biggr)^3 & \text{if $d < \theta$,} \\ 0 & \text{otherwise.} \end{cases}$$ where $d$ is the distance between two coordinates and $\theta$ is a parameter that controls the strength of the correlation. Write efficient parallel code to compute the correlation matrix for the sampling coordinates of the topo data set at $\theta = 3$.

Solution

10%	-0.8102628	-0.5512272	-1.7612170	-1.277195
50%	0.4979250	0.1577357	0.1119588	-0.369030
90%	1.4027805	0.8512106	1.0646683	1.936443

10%	-0.8102628	-0.5512272	-1.7612170	-1.277195
50%	0.4979250	0.1577357	0.1119588	-0.369030
90%	1.4027805	0.8512106	1.0646683	1.936443

Code Optimisation¶

Timing R commands¶

Array pre-allocation¶

Pre-allocate memory¶

Without memory pre-allocation¶

Vectorising code¶

The `for` loop¶

Using `apply` and `lapply`¶

1

2

Further reading

Previous

Schedule

Next

Code Optimisation¶

Timing R commands¶

Array pre-allocation¶

Pre-allocate memory¶

Without memory pre-allocation¶

Vectorising code¶

The for loop¶

Using apply and lapply¶

1

2

Further reading

Previous

Schedule

Next

The `for` loop¶

Using `apply` and `lapply`¶