Stat 470/670 Lecture 12: More interactions, coplots, and modeling

Julia Fukuyama

Today: building and checking models with trivariate data

Modifying LOESS for two predictor variables

Now we have a response variable \(y\), a predictor variable \(x = (u, v)\), and \(n\) samples.

The parameters are still \(\alpha\), the span (the fraction of the data used in each local fit), and \(\lambda\), the degree of the locally fitted polynomial (1 or 2).

Suppose \(\lambda = 1\).

To find the value of the LOESS smoother at a point \(x_0 = (u_0, v_0)\), we solve for the coefficients in the weighted regression problem \[ (\hat \beta_0, \hat \beta_1, \hat \beta_2) = \text{argmin}_{ \beta_0, \beta_1, \beta_2} \sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i ))^2, \]

The value of the LOESS smoother at \(x_0\) is then \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0\).
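
To make this concrete, here is a minimal sketch of the \(\lambda = 1\) computation at a single point, on simulated data with a span of \(1/3\). It is purely illustrative (this is not how loess() is implemented, and all of the names here are made up); the weights follow the formula given in the next section, with \(T\) taken to be the tricube function as in the one-predictor case.

# Sketch: evaluate a two-predictor, locally linear LOESS fit at one point x0
set.seed(1)
n = 100
u = runif(n); v = runif(n)
y = sin(2 * pi * u) + v^2 + rnorm(n, sd = 0.1)
u0 = 0.5; v0 = 0.5
# distances from each observation to x0 (u and v assumed to be on comparable scales)
delta = sqrt((u - u0)^2 + (v - v0)^2)
q = ceiling(n / 3)                         # neighborhood size for span = 1/3
tricube = function(t) ifelse(t < 1, (1 - t^3)^3, 0)
w = tricube(delta / sort(delta)[q])        # the weights w_i(x0)
# weighted linear regression in u and v, then evaluate the fitted plane at x0
fit = lm(y ~ u + v, weights = w)
predict(fit, newdata = data.frame(u = u0, v = v0))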

If \(\lambda = 2\), to find the value of the LOESS smoother at a point \(x_0 = (u_0, v_0)\), we solve for the coefficients in the weighted regression problem \[ (\hat \beta_0, \hat \beta_1, \hat \beta_2, \hat \beta_3, \hat \beta_4, \hat \beta_5) = \text{argmin}_{ \beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5} \sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i + \beta_3 u_i v_i + \beta_4 u_i^2 + \beta_5 v_i^2))^2, \]

The value of the LOESS smoother at \(x_0\) is then \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0 + \hat \beta_3 u_0 v_0 + \hat \beta_4 u_0^2 + \hat \beta_5 v_0^2\).

Weights for LOESS with two predictors

The weights are: \[ w_i(x_0) = T(\Delta_i(x_0) / \Delta_{(q)}(x_0)) \] with \(\Delta_i(x_0)\) the distance from \(x_i\) to \(x_0\), \(\Delta_{(q)}(x_0)\) the \(q\)th smallest of those distances (\(q \approx \alpha n\)), and \(T\) the tricube weight function.

Since the two predictor variables are potentially on different scales, they are usually normalized by a robust estimate of the spread before the distances are computed. There are several options for the spread estimate.

Cleveland suggests using a 10% trimmed standard deviation as the measure of spread for normalization.
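
A sketch of what that normalization might look like in code, on toy data. The exact trimming convention (dropping 5% of the values in each tail) is an assumption, and trimmed_sd is a made-up helper, not a function from a package.

# Hypothetical 10% trimmed standard deviation: drop the most extreme 10% of
# values (5% in each tail), then take the sd of what remains
trimmed_sd = function(x, trim = 0.1) {
    lo = quantile(x, trim / 2)
    hi = quantile(x, 1 - trim / 2)
    sd(x[x >= lo & x <= hi])
}
# normalize each predictor by its trimmed sd before computing distances
u = rnorm(50, mean = 10, sd = 2)    # toy predictors on very different scales
v = rnorm(50, mean = 0, sd = 0.1)
u_scaled = u / trimmed_sd(u)
v_scaled = v / trimmed_sd(v)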

A useful modification of LOESS

What if we think some of the conditional relations come from a parametric family? For example, the dependence of NOx on C always seems to be linear, no matter what value of E we look at.

We can modify LOESS so that it fits some of the conditional relations globally instead of locally.

Let \(\hat g(u,v)\) be our fitted LOESS surface, and suppose we want \(\hat g(u, v)\), seen as a function of \(u\), to be from a parametric family (e.g. linear or quadratic).

To do this, we simply drop the \(u_i\)’s from our distances when computing the weights.

Suppose we want to modify locally linear LOESS in this way. To find the value of the LOESS smoother at a point \(x_0\), we find \(\hat \beta_0, \hat \beta_1, \hat \beta_2, \hat \beta_3\) as \[ (\hat \beta_0, \hat \beta_1, \hat \beta_2, \hat \beta_3) = \text{argmin}_{\beta_0, \beta_1, \beta_2, \beta_3} \sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i + \beta_3 u_i v_i))^2 \] where the weights \(w_i(x_0)\) only take into account the \(v\) coordinates.

The fitted value of the LOESS smoother at \(x_0\), \(\hat g(x_0) = \hat g(u_0, v_0)\), is then equal to \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0 + \hat \beta_3 u_0 v_0\).

This leads to a fit that is locally linear in \(v\) and globally linear in \(u\), with different slopes in \(u\) conditional on different values of \(v\).
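
Here is a sketch of this kind of fit at a single point, again on simulated data. It is illustrative only: the weights use the \(v\) coordinates alone, and the local regression includes the \(u_i v_i\) term so the slope in \(u\) can change with \(v\). (In loess() itself, this is roughly what the parametric and drop.square arguments produce; see the ethanol example below.)

# Sketch: globally linear in u, locally linear in v, evaluated at x0 = (u0, v0)
set.seed(2)
n = 100
u = runif(n); v = runif(n)
y = 2 * u * (1 + v) + sin(2 * pi * v) + rnorm(n, sd = 0.1)
u0 = 0.5; v0 = 0.5
tricube = function(t) ifelse(t < 1, (1 - t^3)^3, 0)
delta_v = abs(v - v0)                      # distances use only the v coordinate
q = ceiling(n / 3)
w = tricube(delta_v / sort(delta_v)[q])
# beta0 + beta1 u + beta2 v + beta3 u v, fitted by weighted least squares
fit = lm(y ~ u + v + u:v, weights = w)
predict(fit, newdata = data.frame(u = u0, v = v0))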

To check your understanding: how would the fit change if you didn’t include the \(u_iv_i\) term?

Locally quadratic fit in \(v\) with a globally quadratic fit in \(u\):

To find the value of the LOESS smoother at a point \(x_0\), we find \(\hat \beta_0, \ldots, \hat \beta_5\) as \[ (\hat \beta_0, \ldots, \hat \beta_5) = \text{argmin}_{\beta_0, \ldots, \beta_5}\sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i + \beta_3 u_i v_i + \beta_4 u_i^2 + \beta_5 v_i^2))^2 \] where the weights \(w_i(x_0)\) only take into account the \(v\) coordinates.

The fitted value of the LOESS smoother at \(x_0\), \(\hat g(x_0) = \hat g(u_0, v_0)\), is then equal to \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0 + \hat \beta_3 u_0 v_0 + \hat \beta_4 u_0^2 + \hat \beta_5 v_0^2\).

Locally quadratic fit in \(v\) with a globally linear fit in \(u\):

To find the value of the LOESS smoother at a point \(x_0\), we find \(\hat \beta_0, \ldots, \hat \beta_4\) as \[ (\hat \beta_0, \ldots, \hat \beta_4) = \text{argmin}_{\beta_0, \ldots, \beta_4}\sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i + \beta_3 u_i v_i + \beta_4 v_i^2))^2 \] where the weights \(w_i(x_0)\) only take into account the \(v\) coordinates.

The fitted value of the LOESS smoother at \(x_0\), \(\hat g(x_0) = \hat g(u_0, v_0)\), is then equal to \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0 + \hat \beta_3 u_0 v_0 + \hat \beta_4 v_0^2\).

LOESS on ethanol data

load("../../datasets/lattice.RData")
head(ethanol)
##     NOx  C     E
## 1 3.741 12 0.907
## 2 2.295 12 0.761
## 3 1.498 12 1.108
## 4 2.881 12 1.016
## 5 0.760 12 1.189
## 6 3.120  9 1.001
ethanol_lo = loess(NOx ~ C * E, data = ethanol, span = 1/3, parametric = "C", 
    drop.square = "C", family = "symmetric", degree = 2)

Arguments to the loess function: the formula NOx ~ C * E asks for a surface in both predictors; span = 1/3 sets the smoothing parameter \(\alpha\); degree = 2 asks for locally quadratic fitting; parametric = "C" fits the surface globally (parametrically) in C; drop.square = "C" drops the \(C^2\) term for that parametric part, so the fit is globally linear in C; and family = "symmetric" gives the robust version of the fit.

What do the fitted values look like? First let’s make a coplot of the fitted value given E.

library(broom)    # augment comes from broom
prediction_grid = data.frame(expand.grid(C = c(7.5, 9, 12, 15, 18), E = seq(0.6, 1.2, length.out = 11)))
ethanol_preds = augment(ethanol_lo, newdata = prediction_grid)
ggplot(ethanol_preds) +
    geom_line(aes(x = C, y = .fitted)) +
    facet_wrap(~ E, ncol = 11)  +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
    scale_x_continuous(breaks = c(8, 12, 16)) +
    scale_y_continuous("Fitted value of NOx")

Note: The idea behind coplots of fitted values is the same as the idea behind coplots of data. We still have a response, a predictor on the horizontal axis, and a given (conditioning) variable, and we still make one panel for each value of the given variable.

The difference: instead of the raw data, we plot fitted values computed on a grid, so each panel conditions on a single exact value of the given variable rather than on an interval.

Then a coplot of the fitted values given C.

ggplot(ethanol_preds) +
    geom_line(aes(x = E, y = .fitted)) +
    facet_wrap(~ C, ncol = 3) +
    scale_y_continuous("Fitted value of NOx")

More useful without the faceting:

ggplot(ethanol_preds) +
    geom_line(aes(x = E, y = .fitted, color = factor(C, levels = unique(C), ordered = TRUE))) +
    guides(color = guide_legend(title = "C")) +
    scale_y_continuous("Fitted value of NOx")

Plotting residuals

Remember augment from broom: if you ask it to augment the output of a fitted model without supplying newdata, it gives back a data frame with the predictor variables used in the model along with residuals and fitted values.

Here we’re looking for structure: systematic relationships between the residuals and the predictor variables. If we see a relationship between the predictors and the residuals, it indicates that the form of the model doesn’t fit well and we need something more flexible.

We first look at the residuals by E and C:

ethanol_resid = augment(ethanol_lo)
ggplot(ethanol_resid, aes(x = E, y = .resid)) +
    geom_point() +
    stat_smooth(method = "loess", method.args = list(degree = 1, family = "symmetric")) +
    scale_y_continuous("NOx Residual")
## `geom_smooth()` using formula = 'y ~ x'

ggplot(ethanol_resid, aes(x = C, y = .resid)) +
    geom_point() +
    stat_smooth(method = "loess", method.args = list(degree = 1, family = "symmetric"))  +
    scale_y_continuous("NOx Residual")
## `geom_smooth()` using formula = 'y ~ x'

Note: using the robust version of loess (family = "symmetric") helps a lot here. If we use the non-robust version, the loess curve is pulled away from zero by the outliers.

ethanol_resid = augment(ethanol_lo)
ggplot(ethanol_resid, aes(x = E, y = .resid)) +
    geom_point() +
    stat_smooth(method = "loess", method.args = list(degree = 1)) +
    scale_y_continuous("NOx Residual")
## `geom_smooth()` using formula = 'y ~ x'

ggplot(ethanol_resid, aes(x = C, y = .resid)) +
    geom_point() +
    stat_smooth(method = "loess", method.args = list(degree = 1)) +
    scale_y_continuous("NOx Residual")
## `geom_smooth()` using formula = 'y ~ x'

It’s also useful to look at coplots:

ggplot(ethanol_resid, aes(x = E, y = .resid)) +
    geom_point() +
    facet_grid(~ C) +
    stat_smooth(method = "loess", method.args = list(degree = 1, family = "symmetric")) +
    scale_y_continuous("NOx Residual")
## `geom_smooth()` using formula = 'y ~ x'

In all of the residual plots, we see outliers, but not any major dependence of the residuals on the predictors.

Residuals to check model assumptions

It’s also a good idea to check on homoscedasticity and normality of the residuals.

To check for homoscedasticity, we plot the absolute value of the residuals by the fitted value:

ggplot(ethanol_resid, aes(x = .fitted, y = abs(.resid))) +
    geom_point() + 
    stat_smooth(method = "loess", method.args = list(degree = 1, family = "symmetric")) +
    scale_y_continuous("Absolute value of NOx residual") +
    scale_x_continuous("NOx fitted value")
## `geom_smooth()` using formula = 'y ~ x'

To check for normality, we make a QQ plot:

ggplot(ethanol_resid) + stat_qq(aes(sample = .resid))

We see that the residuals are quite heavy-tailed. This means that inference based on normality of the residuals would be suspect, and it supports our use of the robust (family = "symmetric") version of LOESS, which down-weights the outlying points.
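
One quick numeric check (a rough heuristic, not from the original notes): compare the standard deviation of the residuals to a robust spread estimate like the MAD. For roughly normal residuals the two are close; heavy tails inflate the standard deviation.

# sd much larger than mad suggests heavy tails / outliers
sd(ethanol_resid$.resid)
mad(ethanol_resid$.resid)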

What we’ve learned

Rubber data

Reading: Cleveland pp. 180-187, 200-213

The data frame rubber in lattice.RData contains three measurements on 30 specimens of tire rubber.

The variables are hardness (the hardness of the rubber, in degrees Shore), tensile.strength (in kilograms per square centimeter), and abrasion.loss (the amount of rubber lost to abrasion, in grams per horsepower-hour).

Pairs plot of the three variables

library(GGally)
ggpairs(rubber[,c("hardness", "tensile.strength", "abrasion.loss")])

Coplot of abrasion loss and tensile strength given hardness

coplot_hardness = make_coplot_df(rubber, "hardness", number_bins = 8)
ggplot(coplot_hardness, aes(x = tensile.strength, y = abrasion.loss)) +
    geom_point() +
    facet_wrap(~ interval, ncol = 4) +
    stat_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'

ggplot(coplot_hardness, aes(x = tensile.strength, y = abrasion.loss)) +
    geom_point() +
    facet_wrap(~ interval, ncol = 4) +
    stat_smooth(method = "loess", method.args = list(degree = 1), se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
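
The make_coplot_df helper used above comes from an earlier lecture and isn’t defined here. A minimal sketch of what such a helper might look like, using co.intervals from base graphics to construct overlapping intervals; the real implementation may differ, and the interval label format here is an assumption.

# Hypothetical sketch of make_coplot_df: assign each row of the data frame to
# every overlapping interval of the faceting variable that it falls in
make_coplot_df = function(data_frame, faceting_variable, number_bins = 6) {
    intervals = co.intervals(data_frame[[faceting_variable]], number = number_bins)
    interval_labels = apply(round(intervals, 2), 1, paste, collapse = " - ")
    pieces = lapply(seq_len(nrow(intervals)), function(i) {
        in_interval = data_frame[[faceting_variable]] >= intervals[i, 1] &
            data_frame[[faceting_variable]] <= intervals[i, 2]
        cbind(data_frame[in_interval, , drop = FALSE], interval = interval_labels[i])
    })
    out = do.call(rbind, pieces)
    out$interval = factor(out$interval, levels = unique(interval_labels), ordered = TRUE)
    out
}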

An easier way to make a coplot with ggplot is to use the cut_number function:

ggplot(rubber, aes(x = tensile.strength, y = abrasion.loss)) +
    geom_point() +
    facet_grid(. ~ cut_number(hardness, n = 4)) +
    stat_smooth(method = "loess", method.args = list(degree = 1, span = .9), se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'

Coplot of abrasion loss and hardness given tensile strength

coplot_ts = make_coplot_df(rubber, "tensile.strength", number_bins = 8)
ggplot(coplot_ts, aes(x = hardness, y = abrasion.loss)) +
    geom_point() +
    facet_wrap(~ interval, ncol = 4) +
    stat_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'

ggplot(coplot_ts, aes(x = hardness, y = abrasion.loss)) +
    geom_point() +
    facet_wrap(~ interval, ncol = 4) +
    stat_smooth(method = "loess", method.args = list(span = .5, degree = 1), se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'

ggplot(coplot_ts, aes(x = hardness, y = abrasion.loss, color = interval)) +
    stat_smooth(method = "loess", method.args = list(span = .5, degree = 1), se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'

Exercises

Try making the following additional plots:

Questions:

Building a model

Let’s start off building a model with no interaction but with a non-linear function of tensile.strength.

To do this, we need to decide on the non-linear function, write that function in R, apply it to tensile.strength to create a new variable, and then fit and visualize the model.

Deciding on a function

We want our non-linear function to be linear for values of tensile.strength below 180, flat for values above 180, and continuous. One such function is \[ f(x) = \begin{cases} x - 180 & x \le 180\\ 0 & x > 180 \end{cases} \]

Writing the function in R

The way we would write this in R would be

tslow = function(x) {
    # linear (x - 180) below 180, exactly zero at and above 180
    return((x - 180) * (x < 180))
}
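
A quick sanity check at a few values (these are the tensile strengths that show up in the prediction grid below):

tslow(c(144, 162, 180, 198))
## [1] -36 -18   0   0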

Applying the function to tensile.strength

And to create a variable corresponding to this transformation of tensile strength, we could use

rubber = rubber %>% mutate(ts.low = tslow(tensile.strength))

However, we don’t need to do that because Cleveland has already done it for us (the variable ts.low already exists in the data set and is exactly this function of tensile.strength).

Fitting and visualizing the model

To fit the model:

library(MASS)
rubber.rlm = rlm(abrasion.loss ~ hardness + ts.low, data = rubber, 
    psi = psi.bisquare)

To visualize the fitted model, we need to get fitted values from the model on a grid of values of the two predictors.

library(broom)
rubber.grid = expand.grid(hardness = c(54, 64, 74, 84),
                          tensile.strength = c(144, 162, 180, 198)) %>% data.frame
rubber.grid = rubber.grid %>% mutate(ts.low = tslow(tensile.strength))
rubber.predict = augment(rubber.rlm, newdata = rubber.grid)
rubber.predict
## # A tibble: 16 × 4
##    hardness tensile.strength ts.low .fitted
##       <dbl>            <dbl>  <dbl>   <dbl>
##  1       54              144    -36   357. 
##  2       64              144    -36   289. 
##  3       74              144    -36   222. 
##  4       84              144    -36   154. 
##  5       54              162    -18   298. 
##  6       64              162    -18   230. 
##  7       74              162    -18   162. 
##  8       84              162    -18    94.8
##  9       54              180      0   238. 
## 10       64              180      0   171. 
## 11       74              180      0   103. 
## 12       84              180      0    35.3
## 13       54              198      0   238. 
## 14       64              198      0   171. 
## 15       74              198      0   103. 
## 16       84              198      0    35.3

Once we have the fitted values, we can make a coplot of the fitted model. We’ll start with hardness as the given variable:

ggplot(rubber.predict) +
    geom_line(aes(x = tensile.strength, y = .fitted)) +
    facet_grid(~ hardness) +
    scale_y_continuous("Fitted abrasion loss") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

ggplot(rubber.predict) +
    geom_line(aes(x = tensile.strength, y = .fitted, color = factor(hardness, ordered = TRUE))) +
    guides(color = guide_legend(title = "Hardness")) +
    scale_y_continuous("Fitted abrasion loss") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

Note that the first plot is a coplot; the second doesn’t have a standard name, but it reports the same information in a different way.

Then a coplot with tensile.strength as the given variable:

ggplot(rubber.predict) +
    geom_line(aes(x = hardness, y = .fitted)) +
    facet_grid(~ tensile.strength) +
    scale_y_continuous("Fitted abrasion loss") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

ggplot(rubber.predict) +
    geom_line(aes(x = hardness, y = .fitted, color = factor(tensile.strength, ordered = TRUE))) +
    guides(color = guide_legend(title = "Tensile strength")) +
    scale_y_continuous("Fitted abrasion loss") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

Residuals

rubber.resid = data.frame(rubber, .resid = residuals(rubber.rlm))
ggplot(rubber.resid, aes(x = tensile.strength, y = .resid)) + geom_point() + 
    stat_smooth(method = "loess", span = 1, method.args = list(degree = 1, family = "symmetric")) + 
    geom_abline(slope = 0, intercept = 0) +
    scale_y_continuous("Abrasion loss residuals")
## `geom_smooth()` using formula = 'y ~ x'

ggplot(rubber.resid, aes(x = hardness, y = .resid)) + geom_point() + 
    stat_smooth(method = "loess", span = 1, method.args = list(degree = 1, family = "symmetric")) + 
    geom_abline(slope = 0, intercept = 0) +
    scale_y_continuous("Abrasion loss residuals")
## `geom_smooth()` using formula = 'y ~ x'

Coplots of the residuals

resid_co_hardness = make_coplot_df(rubber.resid, faceting_variable = "hardness", number_bins = 4)
ggplot(resid_co_hardness, aes(x = tensile.strength, y = .resid)) +
    geom_point() +
    facet_grid(~ interval) +
    stat_smooth(method = "loess", method.args = list(degree = 1, family = "symmetric")) +
    scale_y_continuous("Abrasion loss residuals")
## `geom_smooth()` using formula = 'y ~ x'

resid_co_ts = make_coplot_df(rubber.resid, faceting_variable = "tensile.strength", number_bins = 4)
ggplot(resid_co_ts, aes(x = hardness, y = .resid)) +
    geom_point() +
    facet_grid(~ interval) +
    stat_smooth(method = "loess", method.args = list(degree = 1, family = "symmetric")) +
    scale_y_continuous("Abrasion loss residuals")
## `geom_smooth()` using formula = 'y ~ x'

Second-round model

From the residual plots, it looks like we might actually do better by fitting an interaction between tensile.strength and hardness.
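
For reference, one way such a model could be specified (a sketch only, here interacting hardness with the transformed variable ts.low; whether it actually improves the fit is left to the exercises):

# a possible second-round model with a hardness-by-ts.low interaction
rubber.rlm2 = rlm(abrasion.loss ~ hardness * ts.low, data = rubber,
    psi = psi.bisquare)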

Exercises:

Final plots

coplot_hardness = coplot_hardness %>% mutate(interval_mean = get_interval_means(interval))
rubber.grid.final = expand.grid(hardness = unique(coplot_hardness$interval_mean),
                          tensile.strength = c(125, 180, 240)) %>% data.frame
rubber.grid.final = rubber.grid.final %>% mutate(ts.low = tslow(tensile.strength))
rubber.predict.final = augment(rubber.rlm, newdata = rubber.grid.final)
rubber.predict.final = merge(rubber.predict.final,
                             unique(coplot_hardness[,c("interval", "interval_mean")]),
                             by.x = "hardness", by.y = "interval_mean")
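
Like make_coplot_df, the get_interval_means helper comes from an earlier lecture and isn’t defined here. A hypothetical sketch, under the assumption that the interval labels have the form "low - high":

# Hypothetical sketch of get_interval_means: return the midpoint of each
# interval, parsed from a "low - high" label (the label format is an assumption)
get_interval_means = function(interval) {
    endpoints = strsplit(as.character(interval), " - ")
    sapply(endpoints, function(e) mean(as.numeric(e)))
}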

ggplot(coplot_hardness, aes(x = tensile.strength, y = abrasion.loss)) +
    geom_point() +
    geom_line(aes(x = tensile.strength, y = .fitted), data = rubber.predict.final) +
    facet_wrap(~ interval, ncol = 4) +
    scale_y_continuous("Abrasion loss") +
    scale_x_continuous("Tensile strength")