Jay Taylor's notes

back to listing index

Convolutional neural networks for artistic style transfer — Harish Narayanan

[web search]
Original source (harishnarayanan.org)
Tags: image-processing cnn convolutional-neural-networks artistic-style-transfer harishnarayanan.org
Clipped on: 2017-04-22

Convolutional neural networks for artistic style transfer

31 Mar 2017 — 52 min read

There’s an amazing app out right now called Prisma that transforms your photos into works of art using the styles of famous artwork and motifs. The app performs this style transfer with the help of a branch of machine learning called convolutional neural networks. In this article we’re going to take a journey through the world of convolutional neural networks from theory to practice, as we systematically reproduce Prisma’s core visual effect.

So what is Prisma and how might it work?

Prisma is a mobile app that allows you to transfer the style of one image, say a cubist painting, onto the content of another, say a photo of your toddler, to arrive at gorgeous results like these:

Image (Asset 1/51) alt=

The crux of the paper we’re trying to reproduce is that the style transfer problem can be posed as an optimisation problem, where the loss function we want to minimise can be decomposed into three distinct parts: the content loss, the style loss and the total variation loss.

The relative importance of these terms are determined by a set of scalar weights. These are arbitrary, but the following set have been chosen after quite a bit of experimentation to find a set that generates output that’s aesthetically pleasing to me.

content_weight = 0.025
style_weight = 5.0
total_variation_weight = 1.0

We’ll now use the feature spaces provided by specific layers of our model to define these three loss functions. We begin by initialising the total loss to 0 and adding to it in stages.

loss = backend.variable(0.)
The content loss

For the content loss, we follow Johnson et al. (2016) and draw the content feature from block2_conv2, because the original choice in Gatys et al. (2015) (block4_conv2) loses too much structural detail. And at least for faces, I find it more aesthetically pleasing to closely retain the structure of the original content image.

This variation across layers is shown for a couple of examples in the images below (just mentally replace reluX_Y with our Keras notation blockX_convY).

Image (Asset 2/51) alt=