Stat 470/670 Lecture 11: Interactions and coplots for trivariate data

Julia Fukuyama

Trivariate data

Reading: Cleveland pp. 184-190, 194-199, 204-205 (LOESS with more than one predictor variable)

Today: Trivariate data; coplots and interactions

The central question we will want to answer: how does the relationship between two variables change given the value of a third variable?

Recall interactions in a linear model

Suppose we have a response variable, \(y\), and two predictor variables, \(u\) and \(v\).

The interaction model is \[ y = \beta_0 + \beta_1 u + \beta_2 v + \beta_3 uv + \varepsilon \] with \[ \varepsilon \sim N(0, \sigma^2) \]

What does this mean when one of the variables is binary? If both are continuous?
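As a quick illustration (a sketch with simulated data, not from the lecture), when \(v\) is binary the interaction term lets the slope in \(u\) differ between the two groups:

```r
## Simulated data (hypothetical, for illustration): when v is binary,
## the interaction model fits two lines with different slopes in u.
set.seed(1)
n <- 200
u <- runif(n)
v <- rbinom(n, size = 1, prob = 0.5)
## True slope in u: 1 when v = 0, and 1 + 2 = 3 when v = 1
y <- 0.5 + 1 * u + 0.25 * v + 2 * u * v + rnorm(n, sd = 0.1)
fit <- lm(y ~ u * v)
coef(fit)
## The slope in u for the v = 1 group is coef(fit)["u"] + coef(fit)["u:v"]
```

When both variables are continuous, \(\beta_3\) instead means that the slope in \(u\) changes linearly with the value of \(v\) (and vice versa).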

Coplots

Coplots help us answer the question “How does the relationship between two variables change given the value of a third variable?”

A coplot is defined by three variables: the variable on the x axis, the variable on the y axis, and the conditioning ("given") variable that defines the facets.

Coplots with categorical variables

We’ve seen coplots for categorical variables before. Remember our diamonds example from last time, where we were interested in the three variables carat, price, and clarity:

ggplot(diamonds, aes(x = carat, y = sqrt(price))) +
    geom_point(size = .1) +
    stat_smooth(method = "lm", se = FALSE) +
    ylab("Square root of price") +
    facet_wrap(~ clarity)

This allows us to answer the question: “How does the relationship between carat and price change with clarity?”

Coplots with continuous variables

We saw this example last time, in the ethanol dataset. We had the variables NOx (the concentration of nitrogen oxides in the engine exhaust), C (the compression ratio of the engine), and E (the equivalence ratio, a measure of the richness of the air/fuel mixture).

library(lattice) ## for the ethanol data set
library(GGally)
ggpairs(ethanol)

Important note: The relationship between NOx and C is very weak, but as we will see, C is still important for modeling NOx. Small marginal correlations don’t mean that a variable can be excluded from the model!
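This point is easy to demonstrate with simulated data (a hypothetical construction, not the ethanol data): a predictor can have essentially zero marginal correlation with the response and still be essential in the joint model.

```r
## Simulated illustration: v is constructed so that y is uncorrelated
## with v marginally, yet v has coefficient -1 in the joint model.
set.seed(4)
n <- 1000
v <- rnorm(n)
u <- v + rnorm(n)              # u and v are correlated
y <- u - v + rnorm(n, sd = 0.1)
cor(y, v)                      # near zero
coef(lm(y ~ u + v))            # but the coefficient on v is close to -1
```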

Here we could condition easily on C because although it is technically continuous, it only took five distinct values.

We made this plot:

ethanol = ethanol %>% mutate(Cfac = factor(C, levels = sort(unique(C)), ordered = TRUE))
ggplot(ethanol, aes(x = E, y = NOx, color = Cfac)) +
    geom_point() + facet_wrap(~ Cfac) +
    guides(color = guide_legend(title = "C")) +
    stat_smooth(method = "loess", se = FALSE)

in an attempt to answer the question of how the relationship between NOx and E changes given C.

Coplots with truly continuous variables

With continuous variables, instead of taking each facet to represent all the points with a single value of the given variable, we take each facet to represent all the points for which the value of the given variable lies in a certain interval.

Concretely: for each interval, the corresponding facet shows all of the points whose value of the given variable falls in that interval. The intervals are usually chosen to contain roughly equal numbers of points, and consecutive intervals typically overlap.

How to define the intervals is up to you; there are defaults in R, or you can choose yourself.
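For the default choice, base R's co.intervals computes the conditioning intervals; a minimal sketch:

```r
## co.intervals (base R, graphics package) picks conditioning intervals so
## that each holds roughly the same number of points; by default,
## consecutive intervals overlap by about half their points.
x <- seq(0, 1, length.out = 100)
intervals <- co.intervals(x, number = 4, overlap = 0.5)
intervals  # a matrix with one row per interval: lower and upper limits
```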


ggplot2 doesn’t have coplots built in, but we can make them if we work hard enough.

make_coplot_df = function(data_frame, faceting_variable, number_bins = 6) {
    ## co.intervals gets the limits used for the conditioning intervals
    intervals = co.intervals(data_frame[[faceting_variable]], number = number_bins)
    ## indices is a list, with the ith element containing the indices of the
    ## observations falling into the ith interval
    indices = apply(intervals, 1, function(x)
        which(data_frame[[faceting_variable]] <= x[2] & data_frame[[faceting_variable]] >= x[1]))
    ## interval_descriptions is formatted like indices, but contains interval
    ## names instead of the indices of the samples falling in the interval
    interval_descriptions = apply(intervals, 1, function(x) {
        num_in_interval = sum(data_frame[[faceting_variable]] <= x[2] & data_frame[[faceting_variable]] >= x[1])
        interval_description = sprintf("(%.2f, %.2f)", x[1], x[2])
        return(rep(interval_description, num_in_interval))
    })
    ## df_expanded has all the points we need for each interval, and the
    ## 'interval' column tells us which part of the coplot the point should
    ## be plotted in
    df_expanded = data_frame[unlist(indices),]
    df_expanded$interval = factor(unlist(interval_descriptions),
        levels = unique(unlist(interval_descriptions)), ordered = TRUE)
    return(df_expanded)
}

Once we have defined the function to make the expanded data frame, the coplot is simply a faceted plot where we facet by interval.

ethanol_expanded = make_coplot_df(ethanol, "E", 10)
ggplot(ethanol_expanded, aes(y = NOx, x = C)) +
    geom_point() +
    facet_wrap(~ interval, ncol = 6) +
    geom_smooth(method = "loess", se = FALSE, span = 1, method.args = list(degree = 1)) +
    scale_x_continuous(breaks = seq(7, 19, by=3)) +
    ggtitle("Coplot of NOx ~ C given E")

Alternately, with the coplot function:

coplot(NOx ~ C | E, data = ethanol, number = 10)

Building a model

Now suppose we want to build a model that describes NOx as a function of E and C.

Our coplots have told us a couple of things about the relationship: the dependence of NOx on C looks linear at every value of E, while the dependence of NOx on E is strongly nonlinear, rising to a peak and falling back off.

Recall: LOESS with one predictor variable

We have a response variable \(y\), a predictor variable \(x\), and \(n\) samples.

To find the value of the LOESS smoother with \(\lambda = 1\) (locally linear fit) at a point \(x_0\), we solve for the coefficients in the weighted regression problem \[ (\hat \beta_0, \hat \beta_1) = \text{argmin}_{ \beta_0, \beta_1} \sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 x_i))^2, \]

To find the value of the LOESS smoother with \(\lambda = 2\) (locally quadratic fit) at a point \(x_0\), we solve for the coefficients in the weighted regression problem \[ ( \hat \beta_0, \hat \beta_1, \hat \beta_2 ) = \text{argmin}_{ \beta_0, \beta_1, \beta_2} \sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 x_i + \beta_2 x_i^2))^2 \]

The weights are: \[ w_i(x_0) = T(\Delta_i(x_0) / \Delta_{(q)}(x_0)) \] where \(\Delta_i(x_0) = |x_i - x_0|\), \(\Delta_{(i)}(x_0)\) are the ordered values of \(\Delta_{i}(x_0)\), and \(q = \alpha n\), rounded to the nearest integer.

\(T\) is the tricube weight function: \[ T(u) = \begin{cases} (1 - |u|^3)^3 & |u| \le 1 \\ 0 & |u| > 1 \end{cases} \]

The value of the LOESS smoother at \(x_0\) is the fitted value of the weighted regression defined above evaluated at \(x_0\).
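The weight computation above can be sketched directly in R (a hand-rolled illustration of the definitions, not the internals of loess):

```r
## Tricube weight function T(u)
tricube <- function(t) ifelse(abs(t) <= 1, (1 - abs(t)^3)^3, 0)

## LOESS weights at a point x0, for span alpha
loess_weights <- function(x, x0, alpha) {
  delta <- abs(x - x0)            # distances Delta_i(x0)
  q <- round(alpha * length(x))   # number of points with nonzero weight
  delta_q <- sort(delta)[q]       # q-th smallest distance, Delta_(q)(x0)
  tricube(delta / delta_q)
}

x <- seq(0, 1, length.out = 50)
w <- loess_weights(x, x0 = 0.5, alpha = 0.3)
## Only roughly alpha * n of the weights are nonzero, and the points
## closest to x0 get the largest weights.
```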

Modifying LOESS for two predictor variables

Now we have a response variable \(y\), a predictor variable \(x = (u, v)\), and \(n\) samples.

The parameters are still: \(\alpha\), the span (the fraction of the points that receive positive weight in each local fit), and \(\lambda\), the degree of the locally fitted polynomial.

Suppose \(\lambda = 1\)

To find the value of the LOESS smoother at a point \(x_0 = (u_0, v_0)\), we solve for the coefficients in the weighted regression problem \[ (\hat \beta_0, \hat \beta_1, \hat \beta_2) = \text{argmin}_{ \beta_0, \beta_1, \beta_2} \sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i ))^2, \]

The value of the LOESS smoother at \(x_0\) is then \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0\).

If \(\lambda = 2\), to find the value of the LOESS smoother at a point \(x_0 = (u_0, v_0)\), we solve for the coefficients in the weighted regression problem \[ (\hat \beta_0, \hat \beta_1, \hat \beta_2, \hat \beta_3, \hat \beta_4, \hat \beta_5) = \text{argmin}_{ \beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5} \sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i + \beta_3 u_i v_i + \beta_4 u_i^2 + \beta_5 v_i^2))^2, \]

The value of the LOESS smoother at \(x_0\) is then \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0 + \hat \beta_3 u_0 v_0 + \hat \beta_4 u_0^2 + \hat \beta_5 v_0^2\).
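The \(\lambda = 1\) case can be sketched as a single weighted least squares fit at one point (an illustration of the definition; normalization of the predictors is omitted for simplicity, and loess itself handles all of this internally):

```r
tricube <- function(t) ifelse(abs(t) <= 1, (1 - abs(t)^3)^3, 0)

## Locally linear LOESS fit at a single point x0 = (u0, v0), assuming
## u and v are already on comparable scales.
loess_fit_at <- function(u, v, y, u0, v0, alpha) {
  delta <- sqrt((u - u0)^2 + (v - v0)^2)  # Euclidean distances to x0
  q <- round(alpha * length(y))
  w <- tricube(delta / sort(delta)[q])    # tricube weights
  fit <- lm(y ~ u + v, weights = w)       # weighted linear regression
  predict(fit, newdata = data.frame(u = u0, v = v0))
}

set.seed(2)
u <- runif(100); v <- runif(100)
y <- u + 2 * v + rnorm(100, sd = 0.1)
fhat <- loess_fit_at(u, v, y, u0 = 0.5, v0 = 0.5, alpha = 0.5)
```

Since the simulated surface is truly linear, the local fit at \((0.5, 0.5)\) lands near the true value \(0.5 + 2 \cdot 0.5 = 1.5\).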

Weights for LOESS with two predictors

The weights are: \[ w_i(x_0) = T(\Delta_i(x_0) / \Delta_{(q)}(x_0)) \] with \(\Delta_i(x_0) = \|x_i - x_0\|_2\), the Euclidean distance from \(x_i\) to \(x_0\) in the predictor space, and \(\Delta_{(q)}(x_0)\) and \(q\) defined as before.

Since the two predictor variables are potentially on different scales, they are usually normalized using a robust estimate of the spread before the distances are computed. Some options are the median absolute deviation, the interquartile range, or a trimmed standard deviation.

Cleveland suggests using a 10% trimmed standard deviation as the measure of spread for normalization.
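A sketch of such a normalization (a hypothetical helper for illustration, not the exact code loess uses):

```r
## 10% trimmed standard deviation: drop observations below the 10th and
## above the 90th percentile, then take the sd of what remains.
trimmed_sd <- function(x, trim = 0.1) {
  qs <- quantile(x, c(trim, 1 - trim))
  sd(x[x >= qs[1] & x <= qs[2]])
}

set.seed(3)
u <- rnorm(100, sd = 5)    # predictor on a large scale
v <- rnorm(100, sd = 0.1)  # predictor on a small scale
u_norm <- u / trimmed_sd(u)
v_norm <- v / trimmed_sd(v)
## After normalization both predictors contribute comparably to the
## Euclidean distances used in the weights.
```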

A useful modification of LOESS

What if we think some of the conditional relations are from a parametric family? For example, the dependence of NOx on C seems to always be linear, no matter what value of E we look at.

We can modify LOESS so that it fits some of the conditional relations globally instead of locally.

Let \(\hat g(u,v)\) be our fitted LOESS surface, and suppose we want \(\hat g(u, v)\), seen as a function of \(u\), to be from a parametric family (e.g. linear or quadratic).

To do this, we simply drop the \(u_i\)’s from our distances when computing the weights.

Suppose we want to modify locally linear LOESS in this way. To find the value of the LOESS smoother at a point \(x_0\), we find \(\hat \beta_0, \hat \beta_1, \hat \beta_2, \hat \beta_3\) as \[ (\hat \beta_0, \hat \beta_1, \hat \beta_2, \hat \beta_3) = \text{argmin}_{\beta_0, \beta_1, \beta_2, \beta_3} \sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i + \beta_3 u_i v_i))^2 \] where the weights \(w_i(x_0)\) only take into account the \(v\) coordinates.

The fitted value of the LOESS smoother at \(x_0\), \(\hat g(x_0) = \hat g(u_0, v_0)\), is then equal to \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0 + \hat \beta_3 u_0 v_0\).

This leads to a fit that is locally linear in \(v\) and globally linear in \(u\), with different slopes in \(u\) conditional on different values of \(v\).

To check your understanding: how would the fit change if you didn’t include the \(u_iv_i\) term?

Locally quadratic fit in \(v\) with a globally quadratic fit in \(u\):

To find the value of the LOESS smoother at a point \(x_0\), we find \(\hat \beta_0, \ldots, \hat \beta_5\) as \[ (\hat \beta_0, \ldots, \hat \beta_5) = \text{argmin}_{\beta_0, \ldots, \beta_5}\sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i + \beta_3 u_i v_i + \beta_4 u_i^2 + \beta_5 v_i^2))^2 \] where the weights \(w_i(x_0)\) only take into account the \(v\) coordinates.

The fitted value of the LOESS smoother at \(x_0\), \(\hat g(x_0) = \hat g(u_0, v_0)\), is then equal to \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0 + \hat \beta_3 u_0 v_0 + \hat \beta_4 u_0^2 + \hat \beta_5 v_0^2\).

Locally quadratic fit in \(v\) with a globally linear fit in \(u\):

To find the value of the LOESS smoother at a point \(x_0\), we find \(\hat \beta_0, \ldots, \hat \beta_4\) as \[ \hat \beta_0, \ldots, \hat \beta_4 = \text{argmin}_{\beta_0, \ldots, \beta_4}\sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i + \beta_3 u_i v_i + \beta_4 v_i^2))^2 \] where the weights \(w_i(x_0)\) only take into account the \(v\) coordinates.

The fitted value of the LOESS smoother at \(x_0\), \(\hat g(x_0) = \hat g(u_0, v_0)\), is then equal to \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0 + \hat \beta_3 u_0 v_0 + \hat \beta_4 v_0^2\).

LOESS function in R

Now we know what the following arguments to the loess function do: span (the parameter \(\alpha\)), degree (the parameter \(\lambda\)), parametric (which predictors get a globally parametric fit), drop.square (drop the squared term for the named predictors, so a parametric fit is linear rather than quadratic), and family ("gaussian" for least squares fitting, "symmetric" for robust fitting).

LOESS on ethanol data

ethanol_lo = loess(NOx ~ C * E, data = ethanol, span = 1/3, parametric = "C", 
    drop.square = "C", family = "symmetric")

What do the parameters mean here?

What do the fitted values look like? First let’s make a coplot of the fitted value given E.

library(broom) ## for augment
prediction_grid = data.frame(expand.grid(C = c(7.5, 9, 12, 15, 18), E = seq(0.6, 1.2, by = .05)))
ethanol_preds = augment(ethanol_lo, newdata = prediction_grid)
ggplot(ethanol_preds) +
    geom_line(aes(x = C, y = .fitted)) +
    facet_wrap(~ E, ncol = 7)

Then a coplot of the fitted values given C.

ggplot(ethanol_preds) +
    geom_line(aes(x = E, y = .fitted)) +
    facet_wrap(~ C, ncol = 3)

More useful without the faceting:

ggplot(ethanol_preds) +
    geom_line(aes(x = E, y = .fitted, color = factor(C, levels = unique(C), ordered = TRUE))) +
        guides(color = guide_legend(title = "C"))

Summing up