The function system.time returns the computing time of R commands.
system.time({x = qt(runif(1e4), df = 0.5)}) # Compute quantiles of the t dist'n
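The value returned by system.time can also be stored and inspected rather than just printed. The following is a minimal sketch; the name timing is an assumption introduced here for illustration.

```r
# Store the timing instead of only printing it.
# `timing` is an illustrative name, not part of the original example.
timing = system.time({x = qt(runif(1e4), df = 0.5)})

class(timing)        # an object of class "proc_time"
timing["elapsed"]    # wall-clock time in seconds
timing["user.self"]  # CPU time used by this R process for the computation
```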
In the above example, the variable x is a vector of 1e4 elements. When the function qt is called, space in the computer's memory is allocated once to hold all 1e4 elements. Allocating memory takes time, so it is better to do it once up front than to repeat it as a vector grows, as can be seen in the example below.
n = 3e4
u = runif(n)
system.time({
  x1 = numeric(n) # Here we pre-allocate memory
  for (i in 1:n) x1[i] = qt(u[i], df = 0.5)
})
system.time({
  x2 = c() # x2 is initially empty
  for (i in 1:n) x2 = c(x2, qt(u[i], df = 0.5)) # Join x2 with the most recent calculation
})
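The difference arises because c(x2, ...) builds a new vector on every iteration, copying the elements accumulated so far. The sketch below wraps the two strategies in functions so that they can be compared on the same input. The function names fill_preallocated and fill_by_growing, the fixed seed, and the smaller sample size are assumptions made for illustration.

```r
# Hypothetical helpers, not part of the original example.
fill_preallocated = function(u) {
  x = numeric(length(u))                  # allocate once
  for (i in seq_along(u)) x[i] = qt(u[i], df = 0.5)
  x
}

fill_by_growing = function(u) {
  x = c()                                 # start empty
  for (i in seq_along(u)) x = c(x, qt(u[i], df = 0.5)) # new vector each time
  x
}

set.seed(1)
u_small = runif(2000)

# Both strategies return the same values; only the cost differs.
same_result = identical(fill_preallocated(u_small), fill_by_growing(u_small))
same_result

system.time(fill_preallocated(u_small))
system.time(fill_by_growing(u_small))
```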
n = 5e4
u = runif(n)
qf = qp = numeric(n) # Placeholder for storing the output of the commands below
## Loop version
system.time({
  for (i in 1:n) {
    qf[i] = qt(u[i], df = 0.5)
  }
})
## Vectorised version
system.time({
  qp = qt(u, df = 0.5)
})
The loop is slower for the following reason. R is an interpreted language, so every call must be interpreted before it is executed. In the vectorised version, the call to qt is interpreted only once and the iteration over the elements of u happens inside compiled code. In the loop version, the call is interpreted n times. This costs time.
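Before relying on the faster version, it is worth confirming that the two versions give the same answer. A minimal sketch, using a smaller sample, a fixed seed, and illustrative variable names (u_chk, q_loop, q_vec, results_agree) that are not part of the original example:

```r
set.seed(1)
u_chk = runif(1000)

# Loop version: one call to qt per element
q_loop = numeric(length(u_chk))
for (i in seq_along(u_chk)) q_loop[i] = qt(u_chk[i], df = 0.5)

# Vectorised version: a single call to qt on the whole vector
q_vec = qt(u_chk, df = 0.5)

# The two versions should agree up to numerical tolerance
results_agree = isTRUE(all.equal(q_loop, q_vec))
results_agree
```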
apply and lapply
One benefit of for loops is that the code is sometimes easier to read, and in some cases it is not obvious how to vectorise the code. Consider, for example, the following:
# Code to compute the average length for each variable and species using a loop.
data(iris3) # An array of dimension 50 by 4 by 3
avglen_f = matrix(0, 4, 3)
for (i in 1:4) {
  for (j in 1:3) {
    avglen_f[i, j] = mean(iris3[, i, j])
  }
}
The above code can be vectorised (and simplified) with the aid of the function apply as follows:
### Code to compute the average length for each variable and species using apply.
avglen_v = apply(iris3, c(2, 3), mean)
In words, the above code says "apply the function mean, cycling through dimensions 2 and 3 of the array iris3." The function mean will be called 12 times (once for each of the 4 by 3 combinations of variable and species) without an explicit loop in your own code. The input to the function each time will be the vector iris3[, 1, 1], iris3[, 1, 2], etc. Note that the function to be applied need not have a name; e.g., here is the same result using an anonymous function:
### Code to compute the average length for each variable and species by applying an anonymous function.
avglen_p = apply(iris3, c(2, 3), function(x) {sum(x)/length(x)})
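One way to see how often apply invokes the function is to count the calls. The following is a sketch only; the counter n_calls and the result name avglen_c are introduced here for illustration.

```r
data(iris3)  # An array of dimension 50 by 4 by 3

n_calls = 0
avglen_c = apply(iris3, c(2, 3), function(x) {
  n_calls <<- n_calls + 1  # update the counter defined outside the function
  mean(x)
})

n_calls        # one call per (variable, species) pair
dim(avglen_c)  # the result has one entry per pair
all.equal(avglen_c, apply(iris3, c(2, 3), mean))  # same values as applying mean directly
```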
In the above, x denotes the vector input to the anonymous function. An anonymous function is quite flexible in the sense that the variables used in its body can come from the environment in which the function is defined. In the following example, the variable p is not passed as an argument to the function; its value is taken from that environment.
n = 15
k = 4
x = matrix(rnorm(n*k), n, k) # A matrix of k different standard normal samples of size n
p = c(.10, .50, .90) # A vector of probabilities
z = apply(x, 2, function(xi) quantile(xi, probs = p)) # Compute the quantiles for each sample. Note that p is not an argument to the function.
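An alternative to relying on the surrounding environment is to pass the extra argument explicitly, since apply forwards any further arguments to the function being applied. A minimal sketch, with a fixed seed and the illustrative names z_env and z_arg (assumptions, not from the original):

```r
set.seed(1)
n = 15
k = 4
x = matrix(rnorm(n*k), n, k)  # k standard normal samples of size n
p = c(.10, .50, .90)          # probabilities

# p is found in the environment where the anonymous function is defined
z_env = apply(x, 2, function(xi) quantile(xi, probs = p))

# p is passed explicitly; apply hands `probs = p` on to quantile
z_arg = apply(x, 2, quantile, probs = p)

all.equal(z_env, z_arg)
dim(z_arg)  # one row per probability, one column per sample
```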
The function apply operates on arrays, i.e., objects whose dimension is not NULL, and returns a vector, an array or, when the results have different lengths, a list. The function lapply (and its derivatives, such as sapply) operates on lists, or on objects that can be coerced to lists, and cycles through each component of the list.
### Return the type of each variable in the data frame
lapply(ToothGrowth, class)
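The same pattern works on any list, not only on data frames. A small sketch with a made-up list (the names samples and lens, and the list contents, are illustrative assumptions):

```r
# A plain list whose components have different lengths
samples = list(a = 1:5, b = c(2.5, 3.5), c = numeric(0))

lens = lapply(samples, length)  # a list with the same names as `samples`
str(lens)
unlist(lens)                    # flatten to a named integer vector
```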
n = 15
k = 4
x = matrix(rnorm(n*k), n, k) # A matrix of k different standard normal samples of size n
p = c(.10, .50, .90) # A vector of probabilities
apply(x, 2, function(xi) quantile(xi, probs = p))
as.list(1:k)
lapply(1:k, function(i) quantile(x[, i], probs = p))
sapply(1:k, function(i) quantile(x[, i], probs = p)) # Like lapply, but simplifies the output to a single array where possible
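The approaches above compute the same numbers and differ only in how the result is packaged. The sketch below checks this on a fixed seed and also uses vapply, a relative of sapply that requires the type and length of each result to be declared. The names res_apply, res_lapply, res_sapply and res_vapply are assumptions introduced for illustration.

```r
set.seed(1)
n = 15
k = 4
x = matrix(rnorm(n*k), n, k)
p = c(.10, .50, .90)

res_apply  = apply(x, 2, function(xi) quantile(xi, probs = p))
res_lapply = lapply(1:k, function(i) quantile(x[, i], probs = p))
res_sapply = sapply(1:k, function(i) quantile(x[, i], probs = p))
res_vapply = vapply(1:k, function(i) quantile(x[, i], probs = p),
                    FUN.VALUE = numeric(length(p)))

is.list(res_lapply)  # lapply always returns a list

# Compare values only, ignoring row and column names
all.equal(unname(res_apply), unname(res_sapply))
all.equal(unname(res_sapply), unname(res_vapply))
all.equal(unname(res_sapply), unname(do.call(cbind, res_lapply)))  # bind the list by column
```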