Style Transfer by Matching Paintings to Photographs


Martin Kolář

VUT Brno


transfer of style by finding ideally matching painting

three separable ideas:

        given a single photo, find best painting to transfer style from

        given a painting-photo pair, create semantic matches to guide texture transfer

        given many paintings, create a style map


  1. download LOTS of paintings (power of data!)

35'000 - https://images.nga.gov/en/page/show_home_page.html

400'000 - photographs of objects + paintings http://www.metmuseum.org/art/collection

1'300'000 - 870 paintings, 2'300 drawings, 160'000 photographs http://search.getty.edu/gateway/landing

87'000 - https://www.rijksmuseum.nl/en/rijksstudio

71'000 - https://www.google.com/culturalinstitute/beta/entity/m05y4t?categoryId=medium

Google Art has 21'586 Oil paintings, which are of high quality and high variety

Difficult to download - try this instead: https://commons.wikimedia.org/wiki/Category:Google_Art_Project_works_by_collection?uselang=en-gb

Downloaded 12'940 artworks in mid-resolution to /media/MartinK3TB/Datasets/paintings - use this script https://gist.github.com/mrmartin/91fe4da82578c753a28a0ff533f2e9b9

  1. create a good metric

identify when style transfer works best and when it fails

        shape match (shapes of objects, but not their location)

        semantic match (types of objects: houses, people, animals, ...)

        color? Not very important

These attributes correspond to specific layers of a CNN. Compute activations in a representative network and find nearest neighbours in the relevant layers

scale of objects - allowing anisotropic transfer (change of scale) results in change of style

        (use to guide the scale factor)
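
A minimal sketch of how distances from several layers could be combined into one metric, assuming activations have already been extracted and flattened into lists. The layer names ("mid", "top"), the weights, and the helper names are hypothetical placeholders, not values from these notes.

```python
def manhattan(a, b):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def combined_distance(photo_feats, painting_feats, layer_weights):
    """Weighted sum of per-layer L1 distances.

    photo_feats and painting_feats map a layer name to a flat list of
    activations. layer_weights maps a layer name to its weight; the
    names and weights used below are illustrative assumptions.
    """
    return sum(
        weight * manhattan(photo_feats[layer], painting_feats[layer])
        for layer, weight in layer_weights.items()
    )


# Toy activations: "mid" stands in for a shape-sensitive layer,
# "top" for a semantics-sensitive one.
photo = {"mid": [0.0, 1.0, 2.0], "top": [1.0, 0.0]}
painting = {"mid": [0.5, 1.0, 1.0], "top": [1.0, 1.0]}
weights = {"mid": 1.0, "top": 2.0}

score = combined_distance(photo, painting, weights)
```

In practice the layers have very different sizes and activation ranges, so the per-layer distances would probably need normalising before the weights mean anything.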


  1. CNN output of first fully connected layer as a metric

VGG, AlexNet, SqueezeNet, NIN?

        using CaffeNet fc7, Manhattan distance

        first 23 images are those I wish to match to

            find /media/MartinK3TB/Documents/neural-style/examples/inputs/photos/ -type f > extracted_images.txt

            find /media/MartinK3TB/Datasets/paintings/ -type f >> extracted_images.txt

        /usr/bin/python get_features.py

        /usr/bin/ipython display_nearest.py

        with display_nearest.py, I have identified the 11 most appropriate images
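
The ranking step can be sketched as below. This is not the content of get_features.py or display_nearest.py, which are not shown in these notes; it only illustrates nearest-neighbour search under Manhattan distance, assuming each image already has a feature vector. The file names and vectors are made-up toy data.

```python
def manhattan(a, b):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_paintings(query, paintings, k):
    """Return the k painting paths closest to `query` under L1 distance.

    paintings maps a file path to its feature vector (for example an
    fc7 activation); query is the feature vector of the photo.
    """
    ranked = sorted(paintings, key=lambda path: manhattan(query, paintings[path]))
    return ranked[:k]


# Toy 3-dimensional features standing in for real CNN activations.
features = {
    "a.jpg": [0.0, 0.0, 1.0],
    "b.jpg": [1.0, 1.0, 1.0],
    "c.jpg": [0.0, 0.0, 0.9],
}
query = [0.0, 0.0, 0.8]

top2 = nearest_paintings(query, features, k=2)
```

With about 13'000 paintings a brute-force scan like this is still cheap; an index structure would only matter for a much larger collection.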

  1. CNN output of first fully connected layer as a metric

do this for all /media/MartinK3TB/Documents/neural-style/examples/inputs/photos/

  1. CNN output of first fully connected layer as a metric

It works! - results: http://www.fit.vutbr.cz/~kolarmartin/mine/full_table.html

but not on faces:

faces in the style are placed on non-face elements in the photo

elements of faces in the photo are distorted and displaced

limbs are also moved around

sky contains non-sky elements

everything else seems alright

elements of style would benefit from rotation

how does semantic matching handle scale in the literature?

it seems that there are paintings whose style can be applied to numerous images with the existing method

Matching what needs to be transferred where would definitely help, like it does here: Example-Based Synthesis of Stylized Facial Animations


  1. create semantic match for texture transfer - optional

Dan Sykora - it's his idea!

Michal Hradis - doesn't know how to approach it

  1. Other ideas:

user filtering by era (range), colour(s), artist, gallery

other available info: (https://commons.wikimedia.org/wiki/File:Kawamura_Manshu_-_Evening_Glow_-_Google_Art_Project.jpg?uselang=en-gb)


        Artist: Kawamura Manshu (1880 - 1942), painter (Japanese); born in Kyoto, died in Kyoto

        Title: 夕映 (Japanese), Evening Glow (English)

        Object type: Japanese painting

        Medium: color on silk

        Dimensions: height 1,680 mm (66.14 in), width 865 mm (34.06 in)

        Also listed: current location, accession number

        Source: 9gGYwi48HEPfYA at Google Cultural Institute, maximum zoom level
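
The user-filtering idea above could be sketched as follows, assuming metadata like the fields listed has been parsed into one dictionary per painting. The field names ("year", "artist", "gallery") and all records are hypothetical placeholders; colour filtering is left out because it would need image analysis rather than metadata.

```python
def filter_paintings(records, year_range=None, artist=None, gallery=None):
    """Keep the records that satisfy every filter that was supplied.

    A filter left as None is ignored. year_range is an inclusive
    (first_year, last_year) tuple.
    """
    kept = []
    for rec in records:
        if year_range is not None:
            first, last = year_range
            if not (first <= rec["year"] <= last):
                continue
        if artist is not None and rec["artist"] != artist:
            continue
        if gallery is not None and rec["gallery"] != gallery:
            continue
        kept.append(rec)
    return kept


# Placeholder records, not real catalogue data.
records = [
    {"file": "p1.jpg", "artist": "Artist A", "year": 1650, "gallery": "Gallery X"},
    {"file": "p2.jpg", "artist": "Artist B", "year": 1880, "gallery": "Gallery X"},
    {"file": "p3.jpg", "artist": "Artist A", "year": 1910, "gallery": "Gallery Y"},
]

selected = filter_paintings(records, year_range=(1850, 1950), gallery="Gallery X")
```

Filtering before the nearest-neighbour search shrinks the candidate set, so the user's constraints and the feature metric can be applied independently.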

create a video to get accepted at SIGGRAPH. Keyframes are extracted from a varied video, transformed into paintings of different styles, and put into a frame

query expansion:

        given a style image, find others that can also be used as examples of the same style

        subproblem: put all style source images into a style hyperspace

        we can synthesize arbitrary in-between styles by combining weighted nearby examples (Style Interpolation already done by neural-style)
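
One possible way to pick and weight nearby examples in such a style space is inverse-distance weighting, sketched here. The notes do not specify the weighting scheme or the embedding, so the scheme, the 2-D coordinates, and the function names are all assumptions for illustration.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def blend_weights(target, examples, k=2, eps=1e-8):
    """Weights for the k style examples nearest to `target`.

    examples maps a style name to its position in the (hypothetical)
    style space. Weights are inverse distances normalised to sum to 1,
    so closer examples contribute more to the blend.
    """
    nearest = sorted(examples, key=lambda name: euclidean(target, examples[name]))[:k]
    raw = {name: 1.0 / (euclidean(target, examples[name]) + eps) for name in nearest}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


# Toy 2-D style space with three style source images.
style_space = {"s1": [0.0, 0.0], "s2": [4.0, 0.0], "s3": [0.0, 10.0]}

weights = blend_weights([1.0, 0.0], style_space, k=2)
```

The resulting weights could then be handed to whatever style-blending mechanism the transfer method provides. The same nearest-neighbour lookup also covers query expansion: the k nearest entries are the candidate examples of the same style.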

key concept: what is style? What is the desired result?

        The desired result is a painting that captures the scene in the photograph in the same way that the artist expressed the scene in the painting.

        Naturally, this is hard to accomplish on entirely different scenes. We propose to find paintings for which this can be done.

interactive editing by the user, online learning of user-desired correspondences

use saliency detection to maintain what's important relatively unchanged, or focus on it


  1. I executed neural-style on the starry night/stanford pair to find optimal parameters

output size affects the optimal parameters. Therefore, parameters cannot be tuned at a smaller scale

using all layers produces worse results than manually selecting which layers to use

cuDNN reduces the GPU memory footprint by a factor of two

a TV regularization weight of 0 causes color-space artefacts: unlikely colors in neighbouring pixels

the NIN network produces significantly worse results than selected layers of VGG-19

adam optimisation instead of lbfgs causes unlikely short paintbrush marks, and degeneracies in optimisation (convergence toward something wrong)

lbfgs maintains long paintbrush use dependencies, which lead to a more realistic image

all methods produce smoothing (with tv-regularization), or colour-space noise (without tv-regularization)

small areas with high variation (in the source photo) produce color variation that does not correspond to the style source

continuing beyond 1000 iterations slowly improves detail

changing the seed causes important global variations, but local textures depend more on method parameters

minor grid-like artefacts are caused by the underlying canvas and JPEG compression errors in the example supplied on GitHub. The full-resolution GitHub example does not contain these in a way that's noticeable at screen width

smudging is inevitable

normalizing gradients helps local and global style transfer, but requires re-tuning weight parameters

even expert users tuning manually still have trouble producing satisfactory results (https://github.com/jcjohnson/neural-style/issues/308)
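
To make the TV-regularization observations above concrete, here is a minimal sketch of a total-variation penalty on a single-channel image. It uses the generic squared-difference form as an assumption; the exact formulation inside neural-style may differ.

```python
def total_variation(img):
    """Sum of squared differences between adjacent pixels.

    img is a list of rows of floats (one channel). Each pixel is
    compared with its right-hand and lower neighbour, so every
    adjacent pair is counted exactly once.
    """
    height, width = len(img), len(img[0])
    tv = 0.0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                tv += (img[y][x + 1] - img[y][x]) ** 2
            if y + 1 < height:
                tv += (img[y + 1][x] - img[y][x]) ** 2
    return tv


smooth = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
noisy = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]

tv_smooth = total_variation(smooth)
tv_noisy = total_variation(noisy)
```

A flat image scores zero and a pixel-level checkerboard scores high. This matches the trade-off noted above: with weight 0 nothing discourages unrelated neighbouring colours, while a large weight pushes the optimisation toward smoothing.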