Reading: Matloff, Chapter 9
Agenda for today:
Objects and classes in R
Continue the bootstrap example
Two main concepts, classes and methods.
A class defines a type of object: what sort of data is associated with it, what are valid values for the data, what methods can be called on it.
Every object is an instance of a class.
A method is a function associated with a particular type of object.
S3: Very informal versions of classes, lists with a “class” attribute, allowing for one-argument method dispatch.
S4: Formal classes, class definitions, methods, considered safer but with more overhead.
Both types support generic functions: functions that have different behavior depending on the class of the arguments passed to them.
Everything in R is an object
Every object has one and only one type
Types describe the basic structure of an object
Atomic types are logical, integer, double (integer and double together are called numeric), complex, character, raw
Non-atomic types: lists, functions, other special objects for the language
Objects can have more than one class
Classes are primarily for function dispatch
By convention, if an object doesn’t have a class set explicitly, the class is derived from the type (usually the same name, although class() reports "numeric" for a double vector)
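The output below can be reproduced along these lines. This is a reconstruction rather than the original chunk: the variable names x, a, l and df are assumptions chosen to match what is printed.

```r
# Assumed variable names; typeof() reports the type, class() the class.
x = 1:4
typeof(x)    # "integer"
class(x)     # "integer": no class attribute, so the class follows the type

a = c("a", "b", "c", "d")
typeof(a)    # "character"
class(a)     # "character"

l = list(x = x, a = a)
typeof(l)    # "list"
class(l)     # "list"

df = data.frame(l)
class(df)    # "data.frame": set explicitly through the class attribute
typeof(df)   # "list": a data frame is a list underneath
```

The data frame is the interesting case: its type is still list, and only the class attribute distinguishes it from l.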
## [1] "integer"
## [1] "integer"
## [1] "character"
## [1] "character"
## $x
## [1] 1 2 3 4
##
## $a
## [1] "a" "b" "c" "d"
## [1] "list"
## [1] "list"
## [1] "data.frame"
## [1] "list"
## x a
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
An S3 object is an object (either an atomic type or a list) with an extra class attribute.
Class instantiation is by creating an object and setting the class attribute.
Object orientation is only through generic functions.
For example:
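A minimal sketch of code that would produce the output below, assuming the object is called joe as in the later examples:

```r
# Build a plain list, then make it an "employee" by setting the class attribute.
joe = list(name = "Joe", salary = 55000, union = TRUE)
class(joe) = "employee"

attributes(joe)   # the names and class attributes
print(joe)        # no print.employee method is defined here, so the default printing is used
```

Nothing checks that the list has the right fields: any object can be given the class employee, which is what makes S3 informal.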
## $names
## [1] "name" "salary" "union"
##
## $class
## [1] "employee"
## $name
## [1] "Joe"
##
## $salary
## [1] 55000
##
## $union
## [1] TRUE
##
## attr(,"class")
## [1] "employee"
A generic function is a function whose behavior depends on what class its arguments are.
We have seen functions like print, plot, summary, all of which are generic.
Generic functions are very simple: usually just a call to UseMethod
## function (x, ...)
## UseMethod("print")
## <bytecode: 0x7f9d65a09db0>
## <environment: namespace:base>
## function (x, y, ...)
## UseMethod("plot")
## <bytecode: 0x7f9d669d4cc8>
## <environment: namespace:graphics>
## function (x, ...)
## UseMethod("mean")
## <bytecode: 0x7f9d664918d8>
## <environment: namespace:base>
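Writing your own generic follows the same pattern. The sketch below uses a made-up generic called describe (it is not part of base R) to show that the generic itself is a one-line call to UseMethod and that the real work lives in the methods.

```r
# describe is a hypothetical generic, used only for illustration.
describe = function(x, ...) UseMethod("describe")

# Method for numeric vectors (integer and double both dispatch here).
describe.numeric = function(x, ...) {
  paste("numeric vector of length", length(x))
}

# Fallback used when no class-specific method is found.
describe.default = function(x, ...) {
  paste("object of class", class(x)[1])
}

describe(c(1.5, 2.5))   # dispatches to describe.numeric
describe("a")           # no describe.character, so describe.default is used
```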
Methods:
A method is the function that is actually called on a specific class.
In S3, a method for a given class and generic function is just a function with name generic.class.
For example: print.lm, plot.lm are the methods for the lm class associated with the print and plot generic functions.
To see that they are really just normal functions, try typing stats:::plot.lm or stats:::print.lm in your R console.
To define a method associated with a class you have created, define a function with name generic.class.
Method dispatch:
Method dispatch refers to how R decides what method to use when a generic function is called.
The UseMethod function is what does this for S3, and it just works by name matching.
Example:
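## the ... argument matches the signature of the print generic
print.employee = function(x, ...) {
  cat(x$name, "\n")
  cat("salary", x$salary, "\n")
  cat("union member", x$union, "\n")
  ## print methods conventionally return their argument invisibly
  invisible(x)
}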
print(joe)
## Joe
## salary 55000
## union member TRUE
## Joe
## salary 55000
## union member TRUE
Inheritance:
We can make new classes that are specialized versions of the old ones.
They inherit all the methods of the old class.
S3 inheritance is implemented as the class attribute taking multiple values.
Classes earlier in the class vector are interpreted as being sub-classes of classes later in the class vector.
k = list(name = "Kate", salary = 68000, union = FALSE, hrsthismonth = 2)
class(k) = c("hrlyemployee", "employee")
print(k)
What happened?
First look for a print.hrlyemployee function.
That doesn’t exist, so look for print.employee.
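The lookup order can be checked directly. In the sketch below, print.hrlyemployee is a hypothetical method added only to show what changes once the subclass has its own method. NextMethod is the base R function that passes control to the next class in the class vector.

```r
print.employee = function(x, ...) {
  cat(x$name, "\n")
  cat("salary", x$salary, "\n")
  cat("union member", x$union, "\n")
  invisible(x)
}

k = list(name = "Kate", salary = 68000, union = FALSE, hrsthismonth = 2)
class(k) = c("hrlyemployee", "employee")

inherits(k, "employee")   # TRUE: "employee" appears in the class vector
print(k)                  # no print.hrlyemployee yet, so print.employee is used

# Hypothetical subclass method: print one extra line, then defer to the parent.
print.hrlyemployee = function(x, ...) {
  cat("hours this month", x$hrsthismonth, "\n")
  NextMethod()
  invisible(x)
}
print(k)                  # print.hrlyemployee now runs first
```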
S4 classes have three properties:
A class name
A representation: A list of slots giving names and classes for the objects associated with the class.
A vector of classes it inherits from
Syntax: setClass(class_name, class_representation, contains)
class_name is the name of the class.
class_representation is a list describing the slots and their types.
contains describes the inheritance.
For example:
Note: This function breaks one of our rules from the beginning: it’s called for its side effect. It assigns an object defining the class, and also returns invisibly a class generation function.
## character(0)
setClass("employee",
  representation(
    name = "character",
    salary = "numeric",
    union = "logical"))
ls(all.names = TRUE)
## [1] ".__C__employee"
## Class "employee" [in ".GlobalEnv"]
##
## Slots:
##
## Name: name salary union
## Class: character numeric logical
Don’t use setClass this way: it’s just to show you that the function returns a class creation function.
class_creation_fn = setClass("employee",
  representation(
    name = "character",
    salary = "numeric",
    union = "logical"))
jane = class_creation_fn(name = "Jane", salary = 55000, union = FALSE)
jane
## An object of class "employee"
## Slot "name":
## [1] "Jane"
##
## Slot "salary":
## [1] 55000
##
## Slot "union":
## [1] FALSE
To make an object of a given S4 class, use new.
Syntax: new(class, slot1 = value1, slot2 = value2, ...)
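A sketch of the call that would produce the object printed below, assuming it is named joe as in the later examples. The second call is an addition for illustration: it shows that, unlike S3, the slot types are checked when the object is created.

```r
setClass("employee",
  representation(
    name = "character",
    salary = "numeric",
    union = "logical"))

# Slots are supplied as named arguments.
joe = new("employee", name = "Joe", salary = 55000, union = TRUE)
joe

# A character salary does not match the declared slot type, so new() fails.
bad = tryCatch(
  new("employee", name = "Joe", salary = "a lot", union = TRUE),
  error = function(e) conditionMessage(e))
bad   # the error message, captured as a string
```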
## An object of class "employee"
## Slot "name":
## [1] "Joe"
##
## Slot "salary":
## [1] 55000
##
## Slot "union":
## [1] TRUE
Slot access is with @, not $: object@slot will give the data associated with slot in object.
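A short sketch of slot access, again assuming the object is called joe. The slot and slotNames functions come from the methods package. The tryCatch wrapper is only there so that the failing $ access does not stop the script.

```r
setClass("employee",
  representation(name = "character", salary = "numeric", union = "logical"))
joe = new("employee", name = "Joe", salary = 55000, union = TRUE)

joe@salary              # 55000
slot(joe, "salary")     # same value, handy when the slot name is held in a string
slotNames("employee")   # "name" "salary" "union"

# $ is not defined for this class, so this is an error rather than NULL.
res = tryCatch(joe$salary, error = function(e) "error")
res
```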
## [1] 55000
## Error in joe$salary: $ operator not defined for this S4 class
Remember:
A generic function is a function whose behavior depends on the class of its arguments.
A method is the function associated with a specific combination of generic function and argument classes.
Syntax for setting a method associated with a generic function: setMethod(generic, signature, fn)
generic is a string specifying the generic function for which we want to specify a class-specific method.
signature describes the classes of the arguments.
fn is the function we want to use for that specified combination of generic function and argument classes.
For example: show is a generic function used to print S4 objects.
We can create a method associated with the show generic function and the employee S4 class as follows:
setMethod("show", signature = signature("employee"), definition = function(object) {
  inorout = ifelse(object@union, "is", "is not")
  cat(object@name, "has a salary of", object@salary, "and", inorout, "in the union", "\n")
})
show(joe)
## Joe has a salary of 55000 and is in the union
## Joe has a salary of 55000 and is in the union
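The contains argument of setClass, described earlier, interacts with methods in the way you would hope: a subclass inherits the methods of its parent. The sketch below assumes a hypothetical hrlyemployee subclass (mirroring the S3 example) and shows the show method for employee being used for it.

```r
setClass("employee",
  representation(name = "character", salary = "numeric", union = "logical"))

# hrlyemployee is a hypothetical subclass with one extra slot.
setClass("hrlyemployee",
  representation(hrsthismonth = "numeric"),
  contains = "employee")

setMethod("show", signature = "employee", definition = function(object) {
  inorout = if (object@union) "is" else "is not"
  cat(object@name, "has a salary of", object@salary, "and", inorout, "in the union", "\n")
})

kate = new("hrlyemployee", name = "Kate", salary = 68000, union = FALSE, hrsthismonth = 2)
is(kate, "employee")   # TRUE
kate                   # printed by the show method inherited from employee
```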
Remember our bootstrap example from last time?
Now, in addition to computing confidence intervals, we want to plot the bootstrap sampling distributions.
Last time we settled on the following set of functions.
bootstrap_ci = function(data, estimator, alpha, B) {
  boot_estimates = get_boot_estimates(data, estimator, B)
  boot_ci = get_ci(boot_estimates, alpha)
  return(boot_ci)
}
get_boot_estimates = function(data, estimator, B) {
  boot_estimates = replicate(B, expr = {
    boot_data = get_bootstrap_sample(data)
    boot_estimate = estimator(boot_data)
    return(boot_estimate)
  })
  return(boot_estimates)
}
get_ci = function(estimates, alpha) {
  ci_lo = alpha / 2
  ci_hi = 1 - (alpha / 2)
  if(!is.null(dim(estimates))) {
    ## if we have multi-dimensional estimates
    cis = plyr::aaply(estimates, 1, function(x) quantile(x, probs = c(ci_lo, ci_hi)))
  } else {
    ## if we have one-dimensional estimates
    cis = quantile(estimates, probs = c(ci_lo, ci_hi))
  }
  return(cis)
}
get_bootstrap_sample = function(data) {
  if(!is.null(dim(data))) {
    ## in this case, data is rectangular, and we want to sample rows
    n = dim(data)[1]
    boot_idx = sample(1:n, size = n, replace = TRUE)
    bootstrap_sample = data[boot_idx,]
  } else {
    ## in this case, data is a vector and we want to sample elements of the vector
    n = length(data)
    boot_idx = sample(1:n, size = n, replace = TRUE)
    bootstrap_sample = data[boot_idx]
  }
  return(bootstrap_sample)
}
Notice that the output from get_boot_estimates could be used for a lot of different tasks.
Last time we used it to get bootstrap confidence intervals.
We could also use it to get bootstrap standard errors, or make plots, or do anything else that depends on the set of bootstrap samples.
Let’s try making it into its own class and setting some methods for common operations.
First step: Modify the function so it returns something with a class attribute.
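The modified function is not shown in these notes. One way it might look is sketched below: the only change from the earlier version is the line that sets the class attribute. The compact get_bootstrap_sample and the simulated data are stand-ins so that the sketch runs on its own.

```r
# Compact stand-in for get_bootstrap_sample: resample rows or elements.
get_bootstrap_sample = function(data) {
  if (!is.null(dim(data))) {
    data[sample(nrow(data), replace = TRUE), , drop = FALSE]
  } else {
    data[sample(length(data), replace = TRUE)]
  }
}

get_boot_estimates = function(data, estimator, B) {
  boot_estimates = replicate(B, estimator(get_bootstrap_sample(data)))
  # the only change: tag the result so that generics can dispatch on it
  class(boot_estimates) = "boot_dist"
  return(boot_estimates)
}

set.seed(1)
bd = get_boot_estimates(rnorm(50), mean, B = 200)   # simulated data, for illustration only
class(bd)    # "boot_dist"
length(bd)   # 200, one estimate per bootstrap sample
```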
Second step: Create methods for the boot_dist class associated with the plot and print generic functions.
library(ggplot2)
## the ... argument matches the signatures of the plot and print generics
plot.boot_dist = function(x, ...) {
  ggplot(data.frame(boot_samples = as.vector(x))) +
    geom_histogram(aes(x = boot_samples)) +
    ggtitle("Bootstrap distribution")
}
print.boot_dist = function(x, ...) {
  n = length(x)
  cat("Bootstrap distribution object,", n, "bootstrap samples\n")
  cat("Bootstrap standard error:", sd(x), "\n")
  invisible(x)
}
Check whether it works:
## Bootstrap distribution object, 10000 bootstrap samples
## Bootstrap standard error: 0.2727863
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The functions above only work if we’re getting bootstrap distributions for one parameter at a time.
data(iris)
iris_coef_estimator = function(d) {
  iris_lm = lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = d)
  iris_coef = coef(iris_lm)
  return(iris_coef)
}
boot_dist = get_boot_estimates(iris, iris_coef_estimator, B = 1000)
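To see concretely what goes wrong, the sketch below builds the same kind of matrix directly with replicate (boot_mat is a stand-in name, and B is reduced to 200 to keep it quick). With three coefficients the result is a matrix with one row per parameter, so length and sd no longer mean what print.boot_dist assumes.

```r
# Stand-in for the matrix that get_boot_estimates returns for a 3-coefficient model.
set.seed(1)
boot_mat = replicate(200, {
  boot_rows = sample(nrow(iris), replace = TRUE)
  coef(lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris[boot_rows, ]))
})

dim(boot_mat)            # 3 x 200: one row per coefficient
length(boot_mat)         # 600, which print.boot_dist would report as the number of samples
sd(boot_mat)             # a single value pooled over all three coefficients
apply(boot_mat, 1, sd)   # what we want instead: one standard error per row
```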
For the next example, we’ll both fix this problem and show how you would use S4 classes instead of S3 classes.
First step: set an S4 class for the bootstrap distribution.
setClass("boot_dist",
  representation = list(boot_samples = "matrix", nparams = "numeric", nboot = "numeric"))
ls(all.names = TRUE)
## [1] ".__C__boot_dist" ".__C__employee" ".__T__show:methods"
## [4] ".Random.seed" "boot_dist" "bootstrap_ci"
## [7] "class_creation_fn" "get_boot_estimates" "get_bootstrap_sample"
## [10] "get_ci" "iris" "iris_coef_estimator"
## [13] "jane" "joe" "plot.boot_dist"
## [16] "print.boot_dist"
Then we modify the get_boot_estimates function to return an object of the boot_dist class.
## takes either a vector or a matrix and creates a boot_dist object
make_bd_object <- function(estimates) {
  if(is.null(dim(estimates))) { ## if estimates is a vector
    nparams = 1
    nboot = length(estimates)
    estimates = matrix(estimates, nrow = 1)
  } else { ## if estimates is a matrix
    nparams = nrow(estimates)
    nboot = ncol(estimates)
  }
  bd = new("boot_dist", boot_samples = estimates, nparams = nparams, nboot = nboot)
  return(bd)
}
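The notes show the helper but not the modified get_boot_estimates itself. A plausible version, under the assumption that it simply passes the raw estimates through make_bd_object, is sketched here. The helper and get_bootstrap_sample are repeated in compact form so that the block runs on its own.

```r
setClass("boot_dist",
  representation = list(boot_samples = "matrix", nparams = "numeric", nboot = "numeric"))

# Compact restatement of make_bd_object: vectors become one-row matrices.
make_bd_object = function(estimates) {
  if (is.null(dim(estimates))) estimates = matrix(estimates, nrow = 1)
  new("boot_dist", boot_samples = estimates,
      nparams = nrow(estimates), nboot = ncol(estimates))
}

# Compact stand-in for get_bootstrap_sample: resample rows or elements.
get_bootstrap_sample = function(data) {
  if (!is.null(dim(data))) {
    data[sample(nrow(data), replace = TRUE), , drop = FALSE]
  } else {
    data[sample(length(data), replace = TRUE)]
  }
}

# Assumed modification: wrap the raw estimates before returning them.
get_boot_estimates = function(data, estimator, B) {
  boot_estimates = replicate(B, estimator(get_bootstrap_sample(data)))
  make_bd_object(boot_estimates)
}

set.seed(1)
iris_coef_estimator = function(d) coef(lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = d))
bd = get_boot_estimates(iris, iris_coef_estimator, B = 100)
bd@nparams   # 3
bd@nboot     # 100
```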
Next step: set method corresponding to the show generic:
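The show method itself is missing from these notes. A sketch that would print lines in the same format as the output further down might look like this. The per-row use of sd is an assumption about how the standard errors were computed, and the small random object exists only to exercise the method.

```r
setClass("boot_dist",
  representation = list(boot_samples = "matrix", nparams = "numeric", nboot = "numeric"))

setMethod("show", signature = "boot_dist", definition = function(object) {
  cat("Bootstrap distribution object,", object@nboot, "bootstrap samples\n")
  cat("Number of parameters:", object@nparams, "\n")
  # assumed: one standard error per parameter, i.e. per row of boot_samples
  cat("Bootstrap estimate of standard error:", apply(object@boot_samples, 1, sd), "\n")
})

# Small made-up object, for illustration only.
set.seed(1)
bd = new("boot_dist", boot_samples = matrix(rnorm(20), nrow = 2), nparams = 2, nboot = 10)
bd
```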
Set method corresponding to the plot generic:
setMethod("plot", signature = "boot_dist", function(x) {
  melted_samples = reshape2::melt(x@boot_samples)
  if(x@nparams == 1) {
    ggplot(melted_samples) +
      geom_histogram(aes(x = value)) +
      ggtitle("Bootstrap distribution")
  } else {
    ggplot(melted_samples) +
      geom_histogram(aes(x = value)) +
      facet_wrap(~ Var1, scales = "free") +
      ggtitle("Bootstrap distributions for each parameter")
  }
})
And finally see whether it works:
## Bootstrap distribution object, 100 bootstrap samples
## Number of parameters: 1
## Bootstrap estimate of standard error: 0.2357503
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
And for multiple parameters:
## Bootstrap distribution object, 1000 bootstrap samples
## Number of parameters: 3
## Bootstrap estimate of standard error: 0.2402435 0.06677442 0.01721356
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.