Neural nets/Deep learning

Agenda today:

Reading: Elements of Statistical Learning, Sections 11.3-11.8

Example: zip code data

Goal: Given images representing digits, classify them correctly.

Input data, \(x_i\), are \(16 \times 16\) grayscale images, represented as vectors in \(\mathbb R^{256}\)

Responses \(y_i\) give the digit in the image.

We encode this as a classification problem and fit neural nets with several different architectures.
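Concretely, the responses are one-hot encoded: digit \(k\) becomes a length-10 indicator vector with a 1 in position \(k + 1\). In R's keras this is what to_categorical does; a small illustration:

library(keras)
## the digit 3 becomes a 1 x 10 indicator matrix with a 1 in the
## fourth column (class labels start at 0)
to_categorical(3, num_classes = 10)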

Some net architectures

In all cases there are 10 output units, one for each of the 10 possible digits, and each output unit is sigmoidal.

Idea behind weight constraints: each unit computes the same function of the previous layer, so the units extract the same feature from different parts of the image. A net with this sort of weight sharing is referred to as a convolutional network.
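To see what weight sharing looks like in code, here is a minimal convolutional layer in R's keras (a sketch, not one of the architectures above; the filter count and kernel size are illustrative choices):

library(keras)
## each of the 8 filters applies the same 3 x 3 weight kernel at every
## position of a 16 x 16 grayscale image, so the same feature is
## extracted from every part of the image
conv_model = keras_model_sequential()
conv_model %>%
  layer_conv_2d(filters = 8, kernel_size = c(3, 3), activation = 'relu',
                input_shape = c(16, 16, 1)) %>%
  layer_flatten() %>%
  layer_dense(units = 10, activation = 'softmax')

Because of the sharing, the convolutional layer has only \((3 \times 3 + 1) \times 8 = 80\) parameters, compared with the \((256 + 1) \times 8 = 2056\) that a dense layer with 8 units would need on these images.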

If you want to play with this in R

Example: the MNIST digit data (similar in spirit to the zip code data, but the images are \(28 \times 28\))

## if you want to do this you'll have to install the Python version of keras first, which requires you to have TensorFlow, CNTK, or Theano installed as well
library(keras)
mnist = dataset_mnist()
x_train = mnist$train$x
y_train = mnist$train$y
y_train_matrix = to_categorical(y_train, num_classes = 10)
x_test = mnist$test$x
y_test = mnist$test$y
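One optional preprocessing step: the pixel intensities are integers between 0 and 255, and it is common to rescale them to \([0, 1]\) before fitting, which tends to make the optimization easier (a sketch of that step, not part of the code above):

## optional: rescale intensities from 0-255 to [0, 1]
x_train = x_train / 255
x_test = x_test / 255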

Let's look at some of the images:

## image() draws a matrix rotated relative to how we read it, so
## reverse the rows and transpose before plotting
flip_image = function(x) {
    n = nrow(x)
    return(t(x[n:1,]))
}
par(mfrow = c(3,3))
for(i in 1:9) {
    image(flip_image(x_train[i,,]), col = topo.colors(100), axes = FALSE,
          main = y_train[i])
}
model = keras_model_sequential()
model %>%
  layer_flatten(input_shape = c(28, 28)) %>%        ## flatten each 28 x 28 image into a length-784 vector
  layer_dense(units = 128, activation = 'relu') %>% ## hidden layer: 128 ReLU units
  layer_dense(units = 10, activation = 'softmax')   ## output layer: 10 class probabilities
model %>% compile(
    optimizer = 'adam',                   ## adaptive stochastic gradient method
    loss = loss_categorical_crossentropy, ## appropriate for one-hot encoded responses
    metrics = 'accuracy'
)
model
## Model
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## flatten (Flatten)                (None, 784)                   0           
## ___________________________________________________________________________
## dense (Dense)                    (None, 128)                   100480      
## ___________________________________________________________________________
## dense_1 (Dense)                  (None, 10)                    1290        
## ===========================================================================
## Total params: 101,770
## Trainable params: 101,770
## Non-trainable params: 0
## ___________________________________________________________________________
## number of parameters for the first layer: each hidden unit has a weight associated with each of the 784 predictor units, plus a bias term
(784 + 1) * 128
## [1] 100480
## number of parameters for the second layer: each output unit has a weight associated with each of the 128 hidden units, plus a bias term
(128 + 1)* 10
## [1] 1290
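The loss being minimized, categorical cross-entropy, compares the softmax output probabilities \(\hat p_{ik}\) with the one-hot encoded responses: with \(y_{ik} = 1\) if observation \(i\) is digit \(k\) and 0 otherwise, the loss is

\[
-\sum_{i=1}^n \sum_{k=0}^{9} y_{ik} \log \hat p_{ik}
\]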

Fit the model, look at the predictions:

model %>% fit(x = x_train, y = y_train_matrix, epochs = 15)
test_predictions = model %>% predict_classes(x_test)
par(mfrow = c(3,3))
for(i in 1:9) {
    image(flip_image(x_test[i,,]), col = topo.colors(100), axes = FALSE,
          main = sprintf("True digit: %i, Prediction: %i", y_test[i], test_predictions[i]))
}
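To quantify performance beyond eyeballing a few test images, we can compute the loss and accuracy over the whole test set; evaluate needs the responses one-hot encoded just as in training:

## one-hot encode the test responses, then compute test-set loss and accuracy
y_test_matrix = to_categorical(y_test, num_classes = 10)
model %>% evaluate(x_test, y_test_matrix)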

Summing up