Neural style transfer with eager execution and Keras

How would your summer vacation’s photos look had Edvard Munch painted them? (Perhaps it’s better not to know.)
Let’s take a more comforting example: How would a nice, summerly river landscape look if painted by Katsushika Hokusai?

Style transfer on images is not new, but it received a boost when Gatys, Ecker, and Bethge (Gatys, Ecker, and Bethge 2015) showed how to do it successfully with deep learning.
The basic idea is straightforward: Create a hybrid that is a tradeoff between the content image we want to manipulate and a style image we want to imitate, by optimizing for maximal resemblance to both at the same time.

If you’ve read the chapter on neural style transfer from Deep Learning with R, you may recognize some of the code snippets that follow.
However, there is an important difference: This post uses TensorFlow eager execution, allowing for an imperative way of coding that makes it easy to map concepts to code.
Just like previous posts on eager execution on this blog, it is a port of a Google Colaboratory notebook that performs the same task in Python.

As usual, please make sure you have the required package versions installed. And no need to copy the snippets – you’ll find the complete code among the Keras examples.

Prerequisites

The code in this post depends on the most recent versions of several of the TensorFlow R packages. You can install these packages as follows:

install.packages(c("tensorflow", "keras", "tfdatasets"))

You should also make sure that you are running the very latest version of TensorFlow (v1.10), which you can install like so:

library(tensorflow)
install_tensorflow()

There are additional requirements for using TensorFlow eager execution. First, we need to call tfe_enable_eager_execution() right at the beginning of the program. Second, we need to use the implementation of Keras included in TensorFlow, rather than the base Keras implementation.
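
A minimal sketch of that setup is shown below. The call to use_implementation() switches to the Keras implementation bundled with TensorFlow; loading purrr, glue, and zeallot is an assumption inferred from the functions (map(), walk(), transpose(), glue(), %<-%) appearing in the later snippets.

library(keras)
# use the Keras implementation included in TensorFlow
use_implementation("tensorflow")

library(tensorflow)
# must be called right at the beginning, before any tensors are created
tfe_enable_eager_execution()

# helper packages assumed from the snippets below
library(purrr)    # map(), walk(), transpose()
library(glue)     # string interpolation
library(zeallot)  # the %<-% destructuring operator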

Prerequisites behind us, let’s get started!

Input images

Here is our content image – replace it by an image of your own:

# If you have enough memory on your GPU, there is no need to load the images
# at such a small size.
# This is the size I found working for a 4G GPU.
img_shape <- c(128, 128, 3)

content_path <- "isar.jpg"

content_image <-  image_load(content_path, target_size = img_shape[1:2])
content_image %>% 
  image_to_array() %>%
  `/`(., 255) %>%
  as.raster() %>%
  plot()

And here’s the style model, Hokusai’s The Great Wave off Kanagawa, which you can download from Wikimedia Commons:

style_path <- "The_Great_Wave_off_Kanagawa.jpg"

style_image <-  image_load(style_path, target_size = img_shape[1:2])
style_image %>% 
  image_to_array() %>%
  `/`(., 255) %>%
  as.raster() %>%
  plot()

We create a wrapper that loads and preprocesses the input images for us.
As we will be working with VGG19, a network that has been trained on ImageNet, we need to transform our input images in the same way that was used training it. Later, we’ll apply the inverse transformation to our combination image before displaying it.

load_and_preprocess_image <- function(path) {
  img <- image_load(path, target_size = img_shape[1:2]) %>%
    image_to_array() %>%
    k_expand_dims(axis = 1) %>%
    imagenet_preprocess_input()
}

deprocess_image <- function(x) {
  x <- x[1, , ,]
  # Remove zero-center by mean pixel
  x[, , 1] <- x[, , 1] + 103.939
  x[, , 2] <- x[, , 2] + 116.779
  x[, , 3] <- x[, , 3] + 123.68
  # 'BGR'->'RGB'
  x <- x[, , c(3, 2, 1)]
  x[x > 255] <- 255
  x[x < 0] <- 0
  x[] <- as.integer(x) / 255
  x
}

Setting the scene

We are going to use a neural network, but we won’t be training it. Neural style transfer is a bit unusual in that we don’t optimize the network’s weights, but back propagate the loss to the input layer (the image), in order to move it in the desired direction.
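
As a toy illustration (not part of the style transfer code itself), this is what that pattern looks like with eager execution: we wrap an input in a Variable, record operations on a GradientTape, and ask for the gradient of the loss with respect to that input rather than any weights.

# a 2x2 "image" we want to optimize
x <- tf$contrib$eager$Variable(matrix(1, 2, 2), dtype = "float32")
with(tf$GradientTape() %as% tape, {
  loss <- k_sum(k_square(x))
})
# gradient of the loss with respect to the input x (not to any weights)
tape$gradient(loss, x)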

We will be interested in two kinds of outputs from the network, corresponding to our two goals.
Firstly, we want to keep the combination image similar to the content image, on a high level. In a convnet, upper layers map to more holistic concepts, so we are picking a layer high up in the graph to compare outputs from the source and the combination.

Secondly, the generated image should “look like” the style image. Style corresponds to lower level features like texture, shapes, strokes… So to compare the combination against the style example, we choose a set of lower level conv blocks for comparison and aggregate the results.

content_layers <- c("block5_conv2")
style_layers <- c("block1_conv1",
                 "block2_conv1",
                 "block3_conv1",
                 "block4_conv1",
                 "block5_conv1")

num_content_layers <- length(content_layers)
num_style_layers <- length(style_layers)
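
If you’d like to experiment with different layers, here is a quick sketch (not in the original code) of how to list the layer names available in VGG19:

# no need to download weights just to inspect the architecture
vgg <- application_vgg19(include_top = FALSE, weights = NULL)
# prints "block1_conv1", "block1_conv2", "block1_pool", ...
map_chr(vgg$layers, function(layer) layer$name)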

get_model <- function() {
  vgg <- application_vgg19(include_top = FALSE, weights = "imagenet")
  vgg$trainable <- FALSE
  style_outputs <- map(style_layers, function(layer) vgg$get_layer(layer)$output)
  content_outputs <- map(content_layers, function(layer) vgg$get_layer(layer)$output)
  model_outputs <- c(style_outputs, content_outputs)
  keras_model(vgg$input, model_outputs)
}

Losses

When optimizing the input image, we will consider three kinds of losses. Firstly, the content loss: How different is the combination image from the source? Here, we’re using the sum of the squared errors for comparison.

content_loss <- function(content_image, target) {
  k_sum(k_square(target - content_image))
}

Our second concern is having the styles match as closely as possible. Style is commonly operationalized as the Gram matrix of flattened feature maps in a layer. We thus assume that style is related to how maps in a layer correlate with each other.

We therefore compute the Gram matrices of the layers we’re interested in (defined above), for the source image as well as the optimization candidate, and compare them, again using the sum of squared errors.

gram_matrix <- function(x) {
  features <- k_batch_flatten(k_permute_dimensions(x, c(3, 1, 2)))
  gram <- k_dot(features, k_transpose(features))
  gram
}
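
As a quick sanity check (a sketch, not part of the original code), we can verify the Gram matrix shape on random data; the dimensions below assume 128 x 128 input images:

# a fake block1_conv1 feature map: 128 x 128 spatial positions, 64 channels
x <- k_random_uniform(c(128, 128, 64))
dim(gram_matrix(x))   # 64 64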

style_loss <- function(gram_target, combination) {
  gram_comb <- gram_matrix(combination)
  k_sum(k_square(gram_target - gram_comb)) /
    (4 * (img_shape[3] ^ 2) * (img_shape[1] * img_shape[2]) ^ 2)
}

Thirdly, we don’t want the combination image to look overly pixelated, so we’re adding in a regularization component, the total variation in the image:

total_variation_loss <- function(image) {
  y_ij  <- image[1:(img_shape[1] - 1L), 1:(img_shape[2] - 1L),]
  y_i1j <- image[2:(img_shape[1]), 1:(img_shape[2] - 1L),]
  y_ij1 <- image[1:(img_shape[1] - 1L), 2:(img_shape[2]),]
  a <- k_square(y_ij - y_i1j)
  b <- k_square(y_ij - y_ij1)
  k_sum(k_pow(a + b, 1.25))
}

The tricky thing is how to combine these losses. We’ve reached acceptable results with the following weightings, but feel free to play around as you see fit:

content_weight <- 100
style_weight <- 0.8
total_variation_weight <- 0.01

Get model outputs for the content and style images

We need the model’s output for the content and style images, but here it suffices to do this just once.
We concatenate both images along the batch dimension, pass that input to the model, and get back a list of outputs, where every element of the list is a 4-d tensor. For the style image, we’re interested in the style outputs at batch position 1, while for the content image, we need the content output at batch position 2.

In the below comments, please note that the sizes of dimensions 2 and 3 will differ if you’re loading images at a different size.

get_feature_representations <-
  function(model, content_path, style_path) {
    
    # dim == (1, 128, 128, 3)
    style_image <-
      load_and_preprocess_image(style_path) %>% k_cast("float32")
    # dim == (1, 128, 128, 3)
    content_image <-
      load_and_preprocess_image(content_path) %>% k_cast("float32")
    # dim == (2, 128, 128, 3)
    stack_images <- k_concatenate(list(style_image, content_image), axis = 1)
    
    # length(model_outputs) == 6
    # dim(model_outputs[[1]]) == (2, 128, 128, 64)
    # dim(model_outputs[[6]]) == (2, 8, 8, 512)
    model_outputs <- model(stack_images)
    
    style_features <- 
      model_outputs[1:num_style_layers] %>%
      map(function(batch) batch[1, , , ])
    content_features <- 
      model_outputs[(num_style_layers + 1):(num_style_layers + num_content_layers)] %>%
      map(function(batch) batch[2, , , ])
    
    list(style_features, content_features)
  }

Computing the losses

On every iteration, we need to pass the combination image through the model, obtain the style and content outputs, and compute the losses. Again, the code is extensively commented with tensor sizes for easy verification, but please keep in mind that the exact numbers presuppose you’re working with 128×128 images.

compute_loss <-
  function(model, loss_weights, init_image, gram_style_features, content_features) {
    
    c(style_weight, content_weight) %<-% loss_weights
    model_outputs <- model(init_image)
    style_output_features <- model_outputs[1:num_style_layers]
    content_output_features <-
      model_outputs[(num_style_layers + 1):(num_style_layers + num_content_layers)]
    
    # style loss
    weight_per_style_layer <- 1 / num_style_layers
    style_score <- 0
    # dim(style_zip[[5]][[1]]) == (512, 512)
    style_zip <- transpose(list(gram_style_features, style_output_features))
    for (l in 1:length(style_zip)) {
      # for l == 1:
      # dim(target_style) == (64, 64)
      # dim(comb_style) == (1, 128, 128, 64)
      c(target_style, comb_style) %<-% style_zip[[l]]
      style_score <- style_score + weight_per_style_layer * 
        style_loss(target_style, comb_style[1, , , ])
    }
    
    # content loss
    weight_per_content_layer <- 1 / num_content_layers
    content_score <- 0
    content_zip <- transpose(list(content_features, content_output_features))
    for (l in 1:length(content_zip)) {
      # dim(comb_content) == (1, 8, 8, 512)
      # dim(target_content) == (8, 8, 512)
      c(target_content, comb_content) %<-% content_zip[[l]]
      content_score <- content_score + weight_per_content_layer *
        content_loss(comb_content[1, , , ], target_content)
    }
    
    # total variation loss
    variation_loss <- total_variation_loss(init_image[1, , ,])
    
    style_score <- style_score * style_weight
    content_score <- content_score * content_weight
    variation_score <- variation_loss * total_variation_weight
    
    loss <- style_score + content_score + variation_score
    list(loss, style_score, content_score, variation_score)
  }

Computing the gradients

As soon as we have the losses, obtaining the gradients of the overall loss with respect to the input image is just a matter of calling tape$gradient on the GradientTape. Note that the nested call to compute_loss, and thus the call of the model on our combination image, happens inside the GradientTape context.

compute_grads <- 
  function(model, loss_weights, init_image, gram_style_features, content_features) {
    with(tf$GradientTape() %as% tape, {
      scores <-
        compute_loss(model,
                     loss_weights,
                     init_image,
                     gram_style_features,
                     content_features)
    })
    total_loss <- scores[[1]]
    list(tape$gradient(total_loss, init_image), scores)
  }

Training phase

Now it’s time to train! While the natural continuation of this sentence would have been “… the model”, the model we’re training here is not VGG19 (that one we’re just using as a tool), but a minimal setup of just:

  • a Variable that holds our to-be-optimized image
  • the loss functions we defined above
  • an optimizer that will apply the calculated gradients to the image variable (tf$train$AdamOptimizer)

Below, we obtain the style features (of the style image) and the content feature (of the content image) just once, then iterate over the optimization process, saving the output every 100 iterations.

In contrast to the original article and the Deep Learning with R book, but following the Google notebook instead, we’re not using L-BFGS for optimization, but Adam, as our goal here is to provide a concise introduction to eager execution.
However, you could plug in another optimization method if you wanted, replacing
optimizer$apply_gradients(list(tuple(grads, init_image)))
by an algorithm of your choice (and of course, assigning the result of the optimization to the Variable holding the image), as in the sketch below.
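
For example, a hand-rolled plain gradient descent step might look like this (a sketch only; the learning rate lr is a made-up value, and in practice you would keep the clipping step that follows in the training loop):

lr <- 1
# subtract the scaled gradients from the current image and write the
# result back into the Variable
init_image$assign(init_image - lr * grads)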

run_style_transfer <- function(content_path, style_path, num_iterations = 1000) {
  model <- get_model()
  walk(model$layers, function(layer) layer$trainable <- FALSE)
  
  c(style_features, content_features) %<-% 
    get_feature_representations(model, content_path, style_path)
  # dim(gram_style_features[[1]]) == (64, 64)
  gram_style_features <- map(style_features, function(feature) gram_matrix(feature))
  
  init_image <- load_and_preprocess_image(content_path)
  init_image <- tf$contrib$eager$Variable(init_image, dtype = "float32")
  
  optimizer <- tf$train$AdamOptimizer(learning_rate = 1,
                                      beta1 = 0.99,
                                      epsilon = 1e-1)
  
  c(best_loss, best_image) %<-% list(Inf, NULL)
  loss_weights <- list(style_weight, content_weight)
  
  start_time <- Sys.time()
  global_start <- Sys.time()
  
  norm_means <- c(103.939, 116.779, 123.68)
  min_vals <- -norm_means
  max_vals <- 255 - norm_means
  
  for (i in seq_len(num_iterations)) {
    # dim(grads) == (1, 128, 128, 3)
    c(grads, all_losses) %<-% compute_grads(model,
                                            loss_weights,
                                            init_image,
                                            gram_style_features,
                                            content_features)
    c(loss, style_score, content_score, variation_score) %<-% all_losses
    optimizer$apply_gradients(list(tuple(grads, init_image)))
    clipped <- tf$clip_by_value(init_image, min_vals, max_vals)
    init_image$assign(clipped)
    
    end_time <- Sys.time()
    
    if (k_cast_to_floatx(loss) < best_loss) {
      best_loss <- k_cast_to_floatx(loss)
      best_image <- init_image
    }
    
    if (i %% 50 == 0) {
      glue("Iteration: {i}") %>% print()
      glue(
        "Total loss: {k_cast_to_floatx(loss)},
        style loss: {k_cast_to_floatx(style_score)},
        content loss: {k_cast_to_floatx(content_score)},
        total variation loss: {k_cast_to_floatx(variation_score)},
        time for 1 iteration: {(Sys.time() - start_time) %>% round(2)}"
      ) %>% print()
      
      if (i %% 100 == 0) {
        png(paste0("style_epoch_", i, ".png"))
        plot_image <- best_image$numpy()
        plot_image <- deprocess_image(plot_image)
        plot(as.raster(plot_image), main = glue("Iteration {i}"))
        dev.off()
      }
    }
  }
  
  glue("Total time: {Sys.time() - global_start} seconds") %>% print()
  list(best_image, best_loss)
}

Ready to run

Now, we’re ready to start the process:

c(best_image, best_loss) %<-% run_style_transfer(content_path, style_path)

In our case, results didn’t change much after ~ iteration 1000, and this is how our river landscape was looking:
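
To render the result yourself, you can deprocess the returned image back to RGB, for example like this (a sketch using the deprocess_image() helper defined above):

best_image$numpy() %>%
  deprocess_image() %>%
  as.raster() %>%
  plot()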

… definitely more inviting than had it been painted by Edvard Munch!

Conclusion

With neural style transfer, some fiddling around may be needed until you get the result you want. But as our example shows, this doesn’t mean the code has to be complicated. In addition to being easy to understand, eager execution also lets you add debugging output, and step through the code line-by-line to check on tensor shapes, as in the little sketch below.
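
For instance, in eager mode you can probe shapes interactively (a quick sketch; the random input is just for shape checking, not a real image):

model <- get_model()
out <- model(k_random_uniform(c(1, 128, 128, 3)))
dim(out[[1]])   # 1 128 128 64, the block1_conv1 output
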
Until next time in our eager execution series!

Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. 2015. “A Neural Algorithm of Artistic Style.” CoRR abs/1508.06576. http://arxiv.org/abs/1508.06576.
