Julia Fukuyama
Now we have a response variable \(y\), a predictor variable \(x = (u, v)\), and \(n\) samples.
The parameters are still \(\alpha\), the smoothing parameter giving the fraction of points that receive non-zero weight, and \(\lambda\), the degree of the locally fitted polynomial.
Suppose \(\lambda = 1\).
To find the value of the LOESS smoother at a point \(x_0 = (u_0, v_0)\), we solve for the coefficients in the weighted regression problem \[ (\hat \beta_0, \hat \beta_1, \hat \beta_2) = \text{argmin}_{ \beta_0, \beta_1, \beta_2} \sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i ))^2, \]
The value of the LOESS smoother at \(x_0\) is then \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0\).
If \(\lambda = 2\), to find the value of the LOESS smoother at a point \(x_0 = (u_0, v_0)\), we solve for the coefficients in the weighted regression problem \[ (\hat \beta_0, \hat \beta_1, \hat \beta_2, \hat \beta_3, \hat \beta_4, \hat \beta_5) = \text{argmin}_{ \beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5} \sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i + \beta_3 u_i v_i + \beta_4 u_i^2 + \beta_5 v_i^2))^2, \]
The value of the LOESS smoother at \(x_0\) is then \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0 + \hat \beta_3 u_0 v_0 + \hat \beta_4 u_0^2 + \hat \beta_5 v_0^2\).
The weights are \[ w_i(x_0) = T(\Delta_i(x_0) / \Delta_{(q)}(x_0)), \] where \(T\) is the tricube weight function, \(\Delta_i(x_0)\) is the distance from \(x_i\) to \(x_0\), and \(\Delta_{(q)}(x_0)\) is the \(q\)th smallest of these distances, with \(q\) determined by the smoothing parameter \(\alpha\) (roughly \(q = \alpha n\)).
Since the two predictor variables are potentially on different scales, they are usually normalized by a robust estimate of spread before the distances are computed. There are several options for this spread estimate; Cleveland suggests using a 10% trimmed standard deviation.
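As a concrete sketch of the weight computation for a single evaluation point, assuming the tricube function for \(T\), Euclidean distance on the normalized predictors, and the MAD as a stand-in robust spread estimate:
# tricube weight function
tricube = function(t) ifelse(abs(t) < 1, (1 - abs(t)^3)^3, 0)

# weights w_i(x0) for evaluating the smoother at x0 = (u0, v0)
loess_weights = function(u, v, u0, v0, alpha) {
  du = (u - u0) / mad(u)              # normalize each predictor by a robust spread estimate
  dv = (v - v0) / mad(v)
  delta = sqrt(du^2 + dv^2)           # Delta_i(x0): distance from each observation to x0
  q = ceiling(alpha * length(delta))  # number of observations getting non-zero weight
  delta_q = sort(delta)[q]            # Delta_(q)(x0): the q-th smallest distance
  tricube(delta / delta_q)            # w_i(x0)
}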
What if we think some of the conditional relations are from a parametric family? For example, the dependence of NOx on C seems to always be linear, no matter what value of E we look at.
We can modify LOESS so that it fits some of the conditional relations globally instead of locally.
Let \(\hat g(u,v)\) be our fitted LOESS surface, and suppose we want \(\hat g(u, v)\), seen as a function of \(u\), to be from a parametric family (e.g. linear or quadratic).
To do this, we simply drop the \(u_i\)’s from our distances when computing the weights.
Suppose we want to modify locally linear LOESS in this way. To find the value of the LOESS smoother at a point \(x_0\), we find \(\hat \beta_0, \hat \beta_1, \hat \beta_2, \hat \beta_3\) as \[ (\hat \beta_0, \hat \beta_1, \hat \beta_2, \hat \beta_3) = \text{argmin}_{\beta_0, \beta_1, \beta_2, \beta_3} \sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i + \beta_3 u_i v_i))^2 \] where the weights \(w_i(x_0)\) only take into account the \(v\) coordinates.
The fitted value of the LOESS smoother at \(x_0\), \(\hat g(x_0) = \hat g(u_0, v_0)\), is then equal to \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0 + \hat \beta_3 u_0 v_0\).
This leads to a fit that is locally linear in \(v\) and globally linear in \(u\), with different slopes in \(u\) conditional on different values of \(v\).
To check your understanding: how would the fit change if you didn’t include the \(u_iv_i\) term?
Locally quadratic fit in \(v\) with a globally quadratic fit in \(u\):
To find the value of the LOESS smoother at a point \(x_0\), we find \(\hat \beta_0, \ldots, \hat \beta_5\) as \[ (\hat \beta_0, \ldots, \hat \beta_5) = \text{argmin}_{\beta_0, \ldots, \beta_5}\sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i + \beta_3 u_i v_i + \beta_4 u_i^2 + \beta_5 v_i^2))^2 \] where the weights \(w_i(x_0)\) only take into account the \(v\) coordinates.
The fitted value of the LOESS smoother at \(x_0\), \(\hat g(x_0) = \hat g(u_0, v_0)\), is then equal to \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0 + \hat \beta_3 u_0 v_0 + \hat \beta_4 u_0^2 + \hat \beta_5 v_0^2\).
Locally quadratic fit in \(v\) with a globally linear fit in \(u\):
To find the value of the LOESS smoother at a point \(x_0\), we find \(\hat \beta_0, \ldots, \hat \beta_4\) as \[ (\hat \beta_0, \ldots, \hat \beta_4) = \text{argmin}_{\beta_0, \ldots, \beta_4}\sum_{i=1}^n w_i(x_0) (y_i - ( \beta_0 + \beta_1 u_i + \beta_2 v_i + \beta_3 u_i v_i + \beta_4 v_i^2))^2 \] where the weights \(w_i(x_0)\) only take into account the \(v\) coordinates.
The fitted value of the LOESS smoother at \(x_0\), \(\hat g(x_0) = \hat g(u_0, v_0)\), is then equal to \(\hat \beta_0 + \hat \beta_1 u_0 + \hat \beta_2 v_0 + \hat \beta_3 u_0 v_0 + \hat \beta_4 v_0^2\).
## NOx C E
## 1 3.741 12 0.907
## 2 2.295 12 0.761
## 3 1.498 12 1.108
## 4 2.881 12 1.016
## 5 0.760 12 1.189
## 6 3.120 9 1.001
Arguments to the loess function:

- The formula, NOx ~ C * E: NOx as a function of C and E, with an interaction between the two variables.
- data: gives the data frame the variables come from.
- span: gives the fraction of the observations that have non-zero weight for each of the local regressions, the same as \(\alpha\) in the text.
- degree: gives the degree of the local polynomial. If we have variables \(u\) and \(v\), degree = 1 means the local regressions use \(u\) and \(v\) as predictors, and degree = 2 means the local regressions use \(u\), \(v\), \(uv\), \(u^2\), and \(v^2\) as predictors.
- family: either "symmetric" or "gaussian". "gaussian" means the local regressions are fit by least squares, while "symmetric" means they are fit with robust regression using Tukey’s biweight.
- parametric: the names of variables for which we want to constrain the fit to be parametric. The parametric fit is achieved by excluding these variables from the distance calculations when deciding on observation weights in the local regressions.
- drop.square: by default, if degree = 2 and one of the variables is constrained to be fit parametrically, the parametric fit will be a degree-2 polynomial. drop.square = TRUE changes this so that the parametric fit is linear instead of quadratic.

What do the fitted values look like? First let’s make a coplot of the fitted values given E.
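The prediction code below uses a fitted object called ethanol_lo. As a sketch, it could come from a loess call with the arguments just described; the specific span value, and the choice to make C parametric and linear, are assumptions rather than settings confirmed in the text:
# A sketch of a loess fit using the arguments described above.
# C is fit globally (parametric) and linearly (drop.square); E is fit locally;
# the "symmetric" family uses robust fitting with Tukey's biweight.
ethanol_lo = loess(NOx ~ C * E, data = ethanol,
                   span = 1/3, degree = 2,
                   parametric = "C", drop.square = "C",
                   family = "symmetric")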
prediction_grid = data.frame(expand.grid(C = c(7.5, 9, 12, 15, 18), E = seq(0.6, 1.2, length.out = 11)))
ethanol_preds = augment(ethanol_lo, newdata = prediction_grid)
ggplot(ethanol_preds) +
geom_line(aes(x = C, y = .fitted)) +
facet_wrap(~ E, ncol = 11) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
scale_x_continuous(breaks = c(8, 12, 16)) +
scale_y_continuous("Fitted value of NOx")
Note: The idea behind coplots of fitted values is the same as the idea behind coplots of data. We still condition on one predictor and, within each panel, plot the response against the other predictor. The difference is that we plot fitted values evaluated on a grid of predictor values that we choose, rather than the observed data points.
Then a coplot of the fitted values given C.
More useful without the faceting:
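Sketches of both plots, assuming the same ethanol_preds prediction data frame as above:
# Coplot of fitted values given C: one panel per value of C
ggplot(ethanol_preds) +
  geom_line(aes(x = E, y = .fitted)) +
  facet_wrap(~ C, ncol = 5) +
  scale_y_continuous("Fitted value of NOx")

# The same curves in a single panel, with color indicating the value of C
ggplot(ethanol_preds) +
  geom_line(aes(x = E, y = .fitted, color = factor(C))) +
  guides(color = guide_legend(title = "C")) +
  scale_y_continuous("Fitted value of NOx")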
Remember augment from broom: if you just ask it to augment the output from a linear model, it will give back a data frame with the predictor variables used in the model along with residuals and fitted values.
Here we’re looking for structure: systematic relationships between the residuals and the predictor variables. If we see a relationship between the predictors and the residuals, it indicates that the form of the model doesn’t fit well and we need to use something more flexible.
We first look at the residuals by E and by C:
ethanol_resid = augment(ethanol_lo)
ggplot(ethanol_resid, aes(x = E, y = .resid)) +
geom_point() +
stat_smooth(method = "loess", method.args = list(degree = 1, family = "symmetric")) +
scale_y_continuous("NOx Residual")
## `geom_smooth()` using formula = 'y ~ x'
ggplot(ethanol_resid, aes(x = C, y = .resid)) +
geom_point() +
stat_smooth(method = "loess", method.args = list(degree = 1, family = "symmetric")) +
scale_y_continuous("NOx Residual")
## `geom_smooth()` using formula = 'y ~ x'
Note: using the robust version of loess (family = "symmetric") helps a lot here. If we use the non-robust version, the loess curve is pulled away from zero by the outliers.
ethanol_resid = augment(ethanol_lo)
ggplot(ethanol_resid, aes(x = E, y = .resid)) +
geom_point() +
stat_smooth(method = "loess", method.args = list(degree = 1)) +
scale_y_continuous("NOx Residual")
## `geom_smooth()` using formula = 'y ~ x'
ggplot(ethanol_resid, aes(x = C, y = .resid)) +
geom_point() +
stat_smooth(method = "loess", method.args = list(degree = 1)) +
scale_y_continuous("NOx Residual")
## `geom_smooth()` using formula = 'y ~ x'
It’s also useful to look at coplots:
ggplot(ethanol_resid, aes(x = E, y = .resid)) +
geom_point() +
facet_grid(~ C) +
stat_smooth(method = "loess", method.args = list(degree = 1, family = "symmetric")) +
scale_y_continuous("NOx Residual")
## `geom_smooth()` using formula = 'y ~ x'
In all of the residual plots, we see outliers, but not any major dependence of the residuals on the predictors.
It’s also a good idea to check on homoscedasticity and normality of the residuals.
To check for homoscedasticity, we plot the absolute value of the residuals by the fitted value:
ggplot(ethanol_resid, aes(x = .fitted, y = abs(.resid))) +
geom_point() +
stat_smooth(method = "loess", method.args = list(degree = 1, family = "symmetric")) +
scale_y_continuous("Absolute value of NOx residual") +
scale_x_continuous("NOx fitted value")
## `geom_smooth()` using formula = 'y ~ x'
To check for normality, we make a QQ plot:
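A sketch using ggplot2’s QQ geoms on the residuals from above:
ggplot(ethanol_resid, aes(sample = .resid)) +
  geom_qq() +
  geom_qq_line() +
  scale_y_continuous("NOx residual quantiles")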
We see that the residuals are quite heavy-tailed. This means that least-squares fits are strongly influenced by the extreme points, and it supports our use of the robust (family = "symmetric") version of loess.
Reading: Cleveland pp. 180-187, 200-213
The data frame rubber in lattice.RData contains three measurements on 30 specimens of tire rubber. The variables are:

- hardness: how much the rubber rebounds after being indented.
- tensile.strength: the force per cross-sectional area required to break the rubber (in kg/cm²).
- abrasion.loss: the amount of material lost to abrasion when rubbing it per unit energy (in grams per hp-hour). This gives you an idea how fast the tire will wear away when you drive. If we had to choose a “response” variable, it would be this one.

coplot_hardness = make_coplot_df(rubber, "hardness", number_bins = 8)
ggplot(coplot_hardness, aes(x = tensile.strength, y = abrasion.loss)) +
geom_point() +
facet_wrap(~ interval, ncol = 4) +
stat_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
ggplot(coplot_hardness, aes(x = tensile.strength, y = abrasion.loss)) +
geom_point() +
facet_wrap(~ interval, ncol = 4) +
stat_smooth(method = "loess", method.args = list(degree = 1), se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
An easier way to make a coplot with ggplot is to use the cut_number function:
ggplot(rubber, aes(x = tensile.strength, y = abrasion.loss)) +
geom_point() +
facet_grid(. ~ cut_number(hardness, n = 4)) +
stat_smooth(method = "loess", method.args = list(degree = 1, span = .9), se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
coplot_ts = make_coplot_df(rubber, "tensile.strength", number_bins = 8)
ggplot(coplot_ts, aes(x = hardness, y = abrasion.loss)) +
geom_point() +
facet_wrap(~ interval, ncol = 4) +
stat_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
ggplot(coplot_ts, aes(x = hardness, y = abrasion.loss)) +
geom_point() +
facet_wrap(~ interval, ncol = 4) +
stat_smooth(method = "loess", method.args = list(span = .5, degree = 1), se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
ggplot(coplot_ts, aes(x = hardness, y = abrasion.loss, color = interval)) +
stat_smooth(method = "loess", method.args = list(span = .5, degree = 1), se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
Try making the following additional plots:

- Using the coplot_ts data frame as on the previous slide, make a plot with one smoother per tensile.strength interval, i.e., the same smoothers as in the previous plot, but not faceted out. Let color indicate which tensile.strength interval the smoother corresponds to.

Questions:

- What do the coplots suggest about the joint dependence of abrasion.loss on tensile.strength and hardness? Do you think that we need to fit an interaction?
- Given hardness and given tensile.strength, do you think a linear fit is sufficient for predicting abrasion.loss, or do we need to use a non-linear function?

Let’s start off building a model with no interaction but with a non-linear function of tensile.strength.
To do this, we need to choose a non-linear transformation of tensile.strength to include in the model.
We want our non-linear function to be linear for values of tensile.strength below 180, flat for values above 180, and continuous. One such function is \[
f(x) = \begin{cases}
x - 180 & x \le 180\\
0 & x > 180
\end{cases}
\]
The way we would write this transformation in R, and how we could create a variable corresponding to it, is sketched below.
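A minimal sketch, using the name tslow to match the helper called in the prediction code later on (the exact course implementation may differ):
# f(x) = x - 180 for x <= 180, and 0 for x > 180
tslow = function(x) pmin(x - 180, 0)

# one way to add the transformed variable to the data frame
rubber = rubber %>% mutate(ts.low = tslow(tensile.strength))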
However, we don’t need to do that, because Cleveland has already done it for us: the variable ts.low already exists in the data set and is exactly this function of tensile.strength.
To fit the model:
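The object name rubber.rlm used below suggests a robust linear fit; a sketch, assuming MASS::rlm and the additive formula described above:
library(MASS)
# robust linear model: abrasion.loss as an additive function of ts.low and hardness
rubber.rlm = rlm(abrasion.loss ~ ts.low + hardness, data = rubber)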
To visualize the fitted model, we need to get fitted values from the model on a grid of values of the two predictors.
library(broom)
rubber.grid = expand.grid(hardness = c(54, 64, 74, 84),
tensile.strength = c(144, 162, 180, 198)) %>% data.frame
rubber.grid = rubber.grid %>% mutate(ts.low = tslow(tensile.strength))
rubber.predict = augment(rubber.rlm, newdata = rubber.grid)
rubber.predict
## # A tibble: 16 × 4
## hardness tensile.strength ts.low .fitted
## <dbl> <dbl> <dbl> <dbl>
## 1 54 144 -36 357.
## 2 64 144 -36 289.
## 3 74 144 -36 222.
## 4 84 144 -36 154.
## 5 54 162 -18 298.
## 6 64 162 -18 230.
## 7 74 162 -18 162.
## 8 84 162 -18 94.8
## 9 54 180 0 238.
## 10 64 180 0 171.
## 11 74 180 0 103.
## 12 84 180 0 35.3
## 13 54 198 0 238.
## 14 64 198 0 171.
## 15 74 198 0 103.
## 16 84 198 0 35.3
Once we have the fitted values, we can make a coplot of the fitted
model. We’ll start with hardness
as the given variable:
ggplot(rubber.predict) +
geom_line(aes(x = tensile.strength, y = .fitted)) +
facet_grid(~ hardness) +
scale_y_continuous("Fitted abrasion loss") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
ggplot(rubber.predict) +
geom_line(aes(x = tensile.strength, y = .fitted, color = factor(hardness, ordered = TRUE))) +
guides(color = guide_legend(title = "Hardness")) +
scale_y_continuous("Fitted abrasion loss") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
Note that the first plot is a coplot; the second doesn’t have a name, but it reports the same information in a different way.
Then a coplot with tensile.strength as the given variable:
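A sketch, mirroring the coplot given hardness above:
ggplot(rubber.predict) +
  geom_line(aes(x = hardness, y = .fitted)) +
  facet_grid(~ tensile.strength) +
  scale_y_continuous("Fitted abrasion loss")

Next, let’s look at the residuals from the fit: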
rubber.resid = data.frame(rubber, .resid = residuals(rubber.rlm))
ggplot(rubber.resid, aes(x = tensile.strength, y = .resid)) + geom_point() +
stat_smooth(method = "loess", span = 1, method.args = list(degree = 1, family = "symmetric")) +
geom_abline(slope = 0, intercept = 0) +
scale_y_continuous("Abrasion loss residuals")
## `geom_smooth()` using formula = 'y ~ x'
ggplot(rubber.resid, aes(x = hardness, y = .resid)) + geom_point() +
stat_smooth(method = "loess", span = 1, method.args = list(degree = 1, family = "symmetric")) +
geom_abline(slope = 0, intercept = 0) +
scale_y_continuous("Abrasion loss residuals")
## `geom_smooth()` using formula = 'y ~ x'
resid_co_hardness = make_coplot_df(rubber.resid, faceting_variable = "hardness", number_bins = 4)
ggplot(resid_co_hardness, aes(x = tensile.strength, y = .resid)) +
geom_point() +
facet_grid(~ interval) +
stat_smooth(method = "loess", method.args = list(degree = 1, family = "symmetric")) +
scale_y_continuous("Abrasion loss residuals")
## `geom_smooth()` using formula = 'y ~ x'
resid_co_ts = make_coplot_df(rubber.resid, faceting_variable = "tensile.strength", number_bins = 4)
ggplot(resid_co_ts, aes(x = hardness, y = .resid)) +
geom_point() +
facet_grid(~ interval) +
stat_smooth(method = "loess", method.args = list(degree = 1, family = "symmetric")) +
scale_y_continuous("Abrasion loss residuals")
## `geom_smooth()` using formula = 'y ~ x'
From the residual plots, it looks like we might actually do better
fitting an interaction between tensile.strength
and
hardness.
Exercises:

- Fit a model for abrasion.loss using an interaction between our non-linear transformation of tensile.strength and hardness (i.e., change abrasion.loss ~ ts.low + hardness to abrasion.loss ~ ts.low * hardness).

Finally, we can overlay the fitted values from the model on the coplot of the data given hardness:

coplot_hardness = coplot_hardness %>% mutate(interval_mean = get_interval_means(interval))
rubber.grid.final = expand.grid(hardness = unique(coplot_hardness$interval_mean),
tensile.strength = c(125, 180, 240)) %>% data.frame
rubber.grid.final = rubber.grid.final %>% mutate(ts.low = tslow(tensile.strength))
rubber.predict.final = augment(rubber.rlm, newdata = rubber.grid.final)
rubber.predict.final = merge(rubber.predict.final,
unique(coplot_hardness[,c("interval", "interval_mean")]),
by.x = "hardness", by.y = "interval_mean")
ggplot(coplot_hardness, aes(x = tensile.strength, y = abrasion.loss)) +
geom_point() +
geom_line(aes(x = tensile.strength, y = .fitted), data = rubber.predict.final) +
facet_wrap(~ interval, ncol = 4) +
scale_y_continuous("Abrasion loss") +
scale_x_continuous("Tensile strength")