Solutions

1

Use the ToothGrowth data. In the same plotting window, create two separate graphs side by side.

On the first one, plot length against dose as above, but use red squares as plotting symbols instead of black circles. Label the x and y axes "Dose" and "Length", respectively. Also add a title to the plot.
On the second graph, plot length against supplement as a boxplot (plot() draws a boxplot by default when x is a factor, as supp is). Label the x and y axes and give the graph a title.

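par(mfrow = c(1, 2)) # 1 row, 2 columns: two plots side by side
plot(dose, len, type = "p", pch = 0, col = "red", xlab = "Dose", ylab = "Length", main = "Length vs Dose") # pch = 0 is an open square
plot(supp, len, xlab = "Supplement", ylab = "Length", main = "Length vs Supplement") # supp is a factor, so this is a boxplot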
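
The call par(mfrow = c(1, 2)) changes a graphics setting that persists for every later plot on the device. A common idiom, sketched below on the assumption that single-panel plots are wanted afterwards, is to keep the old settings that par() returns and restore them when done. The sketch wraps the plots in with(ToothGrowth, ...) so it does not rely on the data having been attached.

```r
# par() invisibly returns the previous values of the settings it changes
op <- par(mfrow = c(1, 2))
with(ToothGrowth, {
  plot(dose, len, pch = 0, col = "red", xlab = "Dose", ylab = "Length",
       main = "Length vs Dose")
  plot(supp, len, xlab = "Supplement", ylab = "Length",
       main = "Length vs Supplement")
})
# Restore the previous layout (a single plotting region)
par(op)
```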

2

Recreate the plot of length against dose, incorporating the following changes:

Change the axis labels to "Dose" and "Length".
Use different colours and plotting symbols for the two supplements. Hint: use dose[supp == "OJ"] to select the elements of dose that correspond to supp == "OJ".
Draw the regression line of length against dose. The coefficients of the regression line $y = \alpha + \beta x$ can be obtained from the formulae $\hat{\beta} = \operatorname{cor}(x,y) \times \operatorname{sd}(y)/\operatorname{sd}(x)$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$.
Add a legend to your plot explaining each component of the graph.

plot(dose, len, type = "n", xlab = "Dose", ylab = "Length") # Draw the axes only
points(dose[supp == "OJ"], len[supp == "OJ"], col = 2, pch = 0)
points(dose[supp == "VC"], len[supp == "VC"], col = 3, pch = 5)
bhat = cor(dose,len)*sd(len)/sd(dose) # slope estimate
ahat = mean(len) - bhat*mean(dose) # intercept estimate
abline(ahat, bhat) # add the fitted line y = ahat + bhat*x
legend("bottomright", legend = c("OJ", "VC", "LS fit"), pch = c(0, 5, NA),
       lty = c(NA, NA, 1), col = c(2, 3, 1))
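
The closed-form formulae in the hint give the ordinary least-squares coefficients, so they should agree with what lm() reports. A short check, written against ToothGrowth$dose and ToothGrowth$len directly so it runs without attach():

```r
x <- ToothGrowth$dose
y <- ToothGrowth$len
# Closed-form least-squares estimates from the hint
bhat <- cor(x, y) * sd(y) / sd(x)
ahat <- mean(y) - bhat * mean(x)
# The same fit via lm(); coef() returns (Intercept) then slope
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(ahat, bhat))
```

Since abline() also accepts a fitted lm object, abline(fit) would draw the same line.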

Plotting on the 2D plane

A number of functions draw on the 2D plane. They share the general call fcn(x, y, z, opt1 = val1, opt2 = val2, ...), where x and y are numeric vectors of grid coordinates, each with elements in increasing order, and z is a matrix with as many rows as x has elements and as many columns as y has elements. The element z[i, j] holds the value of the surface at the point (x[i], y[j]).
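
In practice z is often built with outer(), which evaluates a vectorised function at every pair (x[i], y[j]) and returns exactly this layout: one row per element of x and one column per element of y. The function f(x, y) = x + 10*y below is a made-up example chosen so the entries are easy to read.

```r
x <- c(1, 2, 3)
y <- c(0, 1)
# outer() calls the function on all pairs, so z[i, j] = x[i] + 10 * y[j]
z <- outer(x, y, function(x, y) x + 10 * y)
dim(z)   # 3 rows (length of x), 2 columns (length of y)
z[2, 1]  # f(x[2], y[1]) = 2 + 10 * 0 = 2
z[3, 2]  # f(x[3], y[2]) = 3 + 10 * 1 = 13
```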

These functions are

image(x, y, z, ...) and contour(x, y, z, ...) create an image plot and a contour plot, respectively. Setting the optional argument add = TRUE draws on top of the existing plot instead of starting a new one.
persp(x, y, z, ...) draws a perspective plot of the surface over the 2D plane.
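
A minimal sketch combining the three functions on one grid, using a hypothetical Gaussian bump f(x, y) = exp(-(x^2 + y^2)) as the surface. The contour lines are overlaid on the image with add = TRUE; persp() then starts a new plot of the same surface.

```r
x <- seq(-2, 2, length.out = 41)
y <- seq(-2, 2, length.out = 41)
z <- outer(x, y, function(x, y) exp(-(x^2 + y^2)))  # z[i, j] = f(x[i], y[j])
image(x, y, z, col = heat.colors(20), xlab = "x", ylab = "y")  # coloured grid
contour(x, y, z, add = TRUE)  # contour lines drawn over the image
persp(x, y, z, theta = 30, phi = 25, col = "lightblue")  # same surface in 3D
```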

## Plot of the bivariate standard normal distribution density
dbsnorm = function(x, y, rho) {
  nx = length(x)
  ny = length(y)
  xmat = matrix(x, nx, ny)
  ymat = t(matrix(y, ny, nx)) # t() gives the transpose of a matrix
  (1/(2*pi*sqrt(1-rho^2))) * exp(-0.5 * (xmat^2 + ymat^2 - 2*rho*xmat*ymat) / (1 - rho^2))
}
sval = seq(-4, 4, length.out = 101) # Range of standard normal values
xval = sval
yval = sval
zval = dbsnorm(sval, sval, rho = 0.8)
persp(xval, yval, zval, theta = 30, col = "yellow")
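
As a sanity check on the density formula, the sketch below evaluates a standalone copy of the standard bivariate normal density (the name dens is only for this example, and it is written with outer() rather than matrix()) on the same grid and sums it. The Riemann sum should be very close to 1, and at rho = 0 the density should factor into dnorm(x) * dnorm(y). The quadratic form in the exponent must be divided by 1 - rho^2; without that factor the function would integrate to 1/(1 - rho^2) over the plane instead of 1.

```r
# Standard bivariate normal density on a grid: result[i, j] = f(x[i], y[j]; rho)
dens <- function(x, y, rho) {
  q <- outer(x, y, function(x, y) x^2 + y^2 - 2 * rho * x * y)
  exp(-0.5 * q / (1 - rho^2)) / (2 * pi * sqrt(1 - rho^2))
}
s <- seq(-4, 4, length.out = 101)
h <- s[2] - s[1]                          # grid spacing (0.08)
mass <- sum(dens(s, s, rho = 0.8)) * h^2  # Riemann approximation of the integral
mass                                      # close to 1
# With rho = 0 the density factorises into two univariate normal densities
all.equal(dens(s, s, rho = 0), outer(dnorm(s), dnorm(s)))
```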

3

Consider the data generated by the following R commands:

n <- 500
x <- rnorm(n, mean = 1, sd = 2)
y <- 2*x + rnorm(n, mean = 0, sd = 3)

Draw a scatterplot of y against x.
Overlay the scatterplot with a red contour plot of the true joint density of x and y.

Hint. If $(Z_1, Z_2)$ has a standard bivariate normal distribution with correlation $\rho$, then $(X_1, X_2)$, where $X_1 = \mu_1 + \sigma_1 Z_1$ and $X_2 = \mu_2 + \sigma_2 Z_2$ with $\sigma_1, \sigma_2 > 0$, has a bivariate normal distribution with correlation $\rho$, mean $(\mu_1, \mu_2)$ and standard deviations $\sigma_1$ and $\sigma_2$. The density of $(X_1, X_2)$ is then $$\frac{1}{\sigma_1 \sigma_2} f_{(Z_1,Z_2)}\biggl( \frac{x_1 - \mu_1}{\sigma_1}, \frac{x_2 - \mu_2}{\sigma_2} ; \rho \biggr)$$ where $f_{(Z_1,Z_2)}(z_1,z_2;\rho)$ is the density of $(Z_1,Z_2)$.
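
For the data in this exercise, the parameters of the true joint distribution follow directly from how x and y are generated. Since $(X, Y)$ is a linear transformation of the independent normals $X$ and $\varepsilon$, it is bivariate normal, with

$$\begin{aligned}
X &\sim N(1, 2^2), \qquad Y = 2X + \varepsilon, \quad \varepsilon \sim N(0, 3^2) \text{ independent of } X,\\
\mu_1 &= 1, \qquad \sigma_1 = 2,\\
\mu_2 &= E(Y) = 2E(X) = 2,\\
\sigma_2^2 &= \operatorname{Var}(Y) = 2^2 \operatorname{Var}(X) + \operatorname{Var}(\varepsilon) = 4 \cdot 4 + 9 = 25, \qquad \sigma_2 = 5,\\
\operatorname{Cov}(X, Y) &= \operatorname{Cov}(X, 2X + \varepsilon) = 2\operatorname{Var}(X) = 8,\\
\rho &= \frac{\operatorname{Cov}(X, Y)}{\sigma_1 \sigma_2} = \frac{8}{2 \cdot 5} = 0.8.
\end{aligned}$$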

n = 500
x = rnorm(n, mean = 1, sd = 2)
y = 2*x + rnorm(n, mean = 0, sd = 3)
plot(x, y, xlim = c(-7, 9), ylim = c(-18, 22))
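sval = seq(-4, 4, length.out = 101) # Grid of standard normal values
xval = 1 + 2*sval # x grid: mu1 + sigma1*sval, with mu1 = 1, sigma1 = 2
yval = 2 + 5*sval # y grid: mu2 + sigma2*sval, with mu2 = 2, sigma2 = 5
zval = dbsnorm(sval, sval, rho = 8/(2*5))/(2*5) # rho = cov(x, y)/(sigma1*sigma2); divide by sigma1*sigma2 as in the hint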
contour(xval, yval, zval, add = TRUE, col = "red")
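
As an informal check of the parameters used for the contours, a much larger simulated sample should give sample means, standard deviations and correlation close to $(\mu_1, \mu_2) = (1, 2)$, $(\sigma_1, \sigma_2) = (2, 5)$ and $\rho = 0.8$. The sample size of 100000 and the seed are arbitrary choices for this sketch.

```r
set.seed(1)                      # arbitrary seed, for reproducibility
n_big <- 1e5
xs <- rnorm(n_big, mean = 1, sd = 2)
ys <- 2 * xs + rnorm(n_big, mean = 0, sd = 3)
c(mean_x = mean(xs), mean_y = mean(ys))  # near 1 and 2
c(sd_x = sd(xs), sd_y = sd(ys))          # near 2 and 5
cor(xs, ys)                              # near 0.8
```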