The crux of the paper we’re trying to reproduce is that the style
transfer problem can be posed as an optimisation problem, where the
loss function we want to minimise decomposes into three distinct
parts: the content loss, the style loss and the total variation loss.
The relative importance of these terms is determined by a set of
scalar weights. These are arbitrary, but the following values were
chosen after quite a bit of experimentation to produce output that I
find aesthetically pleasing.
content_weight = 0.025
style_weight = 5.0
total_variation_weight = 1.0
We’ll now use the feature spaces provided by specific layers of our
model to define these three loss functions. We begin by initialising
the total loss to 0 and adding to it in stages.
loss = backend.variable(0.)
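To make the staged accumulation concrete, here is a minimal sketch of the weighting scheme in plain Python. The three raw loss values are hypothetical placeholders; in the notebook itself the terms are Keras tensors that get added to `loss` one by one as we define them below.

```python
content_weight = 0.025
style_weight = 5.0
total_variation_weight = 1.0

# Hypothetical raw loss values, standing in for the Keras tensors
# we'll compute from the network's feature maps.
content_loss = 80.0
style_loss = 12.0
total_variation_loss = 3.0

# Initialise the total loss to zero and add the weighted terms in stages.
loss = 0.0
loss += content_weight * content_loss
loss += style_weight * style_loss
loss += total_variation_weight * total_variation_loss
print(loss)  # 0.025 * 80 + 5 * 12 + 1 * 3 = 65.0
```

Because the style weight dwarfs the content weight here, the optimiser is pushed much harder towards matching style statistics than towards matching content features, which is exactly the aesthetic trade-off the weights encode.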
The content loss
For the content loss, we follow Johnson et al. (2016) and draw the
content feature from block2_conv2, because the original choice in
Gatys et al. (2015) (block4_conv2) loses too much structural detail.
At least for faces, I also find it more aesthetically pleasing to
closely retain the structure of the original content image.
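As a concrete illustration of the content term itself: it is simply the sum of squared differences between the feature maps of the content image and those of the combination image at the chosen layer. Here is a NumPy sketch, with small random arrays standing in for block2_conv2 activations (in the notebook this is computed on Keras backend tensors instead):

```python
import numpy as np

def content_loss(content_features, combination_features):
    # Sum of squared differences between the two feature maps.
    return np.sum(np.square(combination_features - content_features))

# Toy feature maps standing in for block2_conv2 activations
# (shape: height x width x channels; values are arbitrary).
rng = np.random.default_rng(0)
content = rng.standard_normal((4, 4, 8))
combination = rng.standard_normal((4, 4, 8))

print(content_loss(content, content))          # 0.0 -- identical features
print(content_loss(content, combination) > 0)  # True -- differing features
```

The loss is zero only when the combination image reproduces the content image’s activations exactly, so minimising it pulls the generated image towards the content image’s structure at that layer.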
This variation across layers is shown for a couple of examples in the
images below (just mentally replace
reluX_Y with our Keras notation