Style Transfer by Matching Paintings to Photographs
12.06.2017
─
Martin Kolář
VUT Brno
RAW NOTES
three separable ideas:
given a single photo, find best painting to transfer style from
given a painting-photo pair, create semantic matches to guide texture transfer
given many paintings, create a style map
35'000 - https://images.nga.gov/en/page/show_home_page.html
400'000 - photographs of objects + paintings http://www.metmuseum.org/art/collection
1'300'000 - 870 paintings, 2'300 drawings, 160'000 photographs http://search.getty.edu/gateway/landing
87'000 - https://www.rijksmuseum.nl/en/rijksstudio
71'000 - https://www.google.com/culturalinstitute/beta/entity/m05y4t?categoryId=medium
Google Art has 21'586 Oil paintings, which are of high quality and high variety
Difficult to download - try this instead: https://commons.wikimedia.org/wiki/Category:Google_Art_Project_works_by_collection?uselang=en-gb
Downloaded 12'940 artworks in mid-resolution to /media/MartinK3TB/Datasets/paintings - use this script https://gist.github.com/mrmartin/91fe4da82578c753a28a0ff533f2e9b9
identify when style transfer works best and when it fails
shape match (shapes of objects, but not their location)
semantic match (types of objects: houses, people, animals, ...)
color? Not very important
These attributes correspond to specific layers of a CNN. Calculate activations in a representative net, find nearest neighbours in relevant layers
scale of objects - allowing anisotropic transfer (change of scale) results in change of style
(use to guide the scale factor)
other?
VGG, AlexNet, SqueezeNet, NIN?
using caffenet fc7, manhattan distance
first 23 images are those I wish to match to
find /media/MartinK3TB/Documents/neural-style/examples/inputs/photos/ -type f > extracted_images.txt
find /media/MartinK3TB/Datasets/paintings/ -type f >> extracted_images.txt
/usr/bin/python get_features.py
/usr/bin/ipython display_nearest.py
with display_nearest.py, I have identified the 11 most appropriate images
do this for all /media/MartinK3TB/Documents/neural-style/examples/inputs/photos/
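The matching step above (fc7 activations, Manhattan distance, top-11 neighbours) can be sketched as a simple nearest-neighbour lookup once features are extracted. The function and toy data below are illustrative, not the actual output of get_features.py:

```python
import numpy as np

def nearest_paintings(photo_feat, painting_feats, k=11):
    """Rank paintings by Manhattan (L1) distance to a photo in fc7 space."""
    dists = np.abs(painting_feats - photo_feat).sum(axis=1)
    return np.argsort(dists)[:k]

# toy example: 4096-d fc7-like features for 5 paintings and 1 photo
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 4096))
photo = feats[2] + 0.01 * rng.standard_normal(4096)  # photo nearly identical to painting 2
print(nearest_paintings(photo, feats, k=3))  # painting 2 ranks first
```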
It works! - results: http://www.fit.vutbr.cz/~kolarmartin/mine/full_table.html
but not on faces:
faces in the style are placed on non-face elements in the photo
elements of faces in the photo are distorted and displaced
limbs are also moved around
sky contains non-sky elements
everything else seems alright
elements of style would benefit from rotation
how does semantic matching handle scale in the literature?
it seems that there are paintings whose style can be applied to numerous images with the existing method
Matching what needs to be transferred where would definitely help, like it does here: Example-Based Synthesis of Stylized Facial Animations
Dan Sykora - it's his idea!
Michal Hradis - doesn't know how to approach it
user filtering by era (range), colour(s), artist, gallery
other available info: (https://commons.wikimedia.org/wiki/File:Kawamura_Manshu_-_Evening_Glow_-_Google_Art_Project.jpg?uselang=en-gb)
Artist | Kawamura Manshu (1880 - 1942) – Painter (Japanese) Born in Kyoto. Dead in Kyoto. Details of artist on Google Art Project |
Title | *日本語:* 夕映 English: *Evening Glow* |
Object type | Japanese Painting |
Date | 1910 |
Medium | color on silk |
Dimensions | Height: 1,680 mm (66.14 in). Width: 865 mm (34.06 in). |
Current location | [TABLE] |
Accession number | 22 |
Source/Photographer | 9gGYwi48HEPfYA at Google Cultural Institute maximum zoom level |
create a video to get accepted at SIGGRAPH. Keyframes from a varying video are taken out, transformed into paintings of different styles, and put back into the frame
query expansion:
given a style image, find others that can also be used as examples of the same style
subproblem: put all style source images into a style hyperspace
we can synthesize arbitrary in-between styles by combining weighted nearby examples (Style Interpolation already done by neural-style)
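A minimal sketch of the weighted-combination idea: interpolate between styles by averaging the Gram-matrix targets of nearby examples. This assumes per-layer feature maps are already available; `gram` and `blended_style_target` are illustrative names, not neural-style's API:

```python
import numpy as np

def gram(feat):
    """Gram matrix of a C x H x W feature map, normalized by size."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def blended_style_target(feats, weights):
    """Weighted average of Gram matrices -> an interpolated style target."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * gram(f) for w, f in zip(weights, feats))

rng = np.random.default_rng(1)
a, b = rng.standard_normal((2, 8, 4, 4))
mid = blended_style_target([a, b], [0.5, 0.5])  # half-way between the two styles
```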
key concept: what is style? What is the desired result?
The desired result is a painting that captures the scene on the photograph in the same way that the artist has expressed the scene in the painting.
Naturally, this is hard to accomplish on entirely different scenes. We propose to find paintings for which this can be done.
interactive editing by the user, online learning of user-desired correspondences
use saliency detection to maintain what's important relatively unchanged, or focus on it
https://github.com/Robert0812/deepsaldet
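One hypothetical way to fold a saliency map (e.g. from deepsaldet) into the objective: weight the content term per pixel so salient regions are pulled harder toward the photo. Everything below is a sketch under that assumption; the function name and shapes are mine:

```python
import numpy as np

def saliency_weighted_content_loss(gen, content, saliency):
    """Content loss with a per-pixel saliency weight in [0, 1]:
    salient regions are penalized more, so they stay closer to the photo.
    gen, content: C x H x W; saliency: H x W."""
    w = saliency[None, :, :]  # broadcast the map over channels
    return ((w * (gen - content)) ** 2).sum()

rng = np.random.default_rng(2)
gen = rng.standard_normal((3, 8, 8))
content = np.zeros((3, 8, 8))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0  # hypothetical saliency map: only the centre matters
loss = saliency_weighted_content_loss(gen, content, mask)
```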
output size affects the optimal parameters. Therefore, parameters cannot be tuned at a smaller scale
using all layers produces worse results than manually selecting layers to tune by
cuDNN reduces the GPU memory footprint by a factor of two
a TV regularization weight of 0 causes colour-space artefacts: unlikely colours in neighbouring pixels
the NIN network produces significantly worse results than selected layers of VGG-19
Adam optimisation instead of L-BFGS causes unlikely short paintbrush marks, and degeneracies in optimisation (convergence toward something wrong)
L-BFGS maintains long brushstroke dependencies, which lead to a more realistic image
all methods produce either smoothing (with TV regularization) or colour-space noise (without it)
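For reference, the TV penalty behind those observations is just the summed squared differences of neighbouring pixels, scaled by the TV weight. A minimal NumPy sketch (not neural-style's actual implementation):

```python
import numpy as np

def tv_loss(img, weight=1e-3):
    """Total-variation penalty on a C x H x W image: sum of squared
    differences between neighbouring pixels, scaled by the TV weight.
    With weight 0 this term vanishes, which permits colour-space noise."""
    dh = img[:, 1:, :] - img[:, :-1, :]
    dw = img[:, :, 1:] - img[:, :, :-1]
    return weight * ((dh ** 2).sum() + (dw ** 2).sum())

flat = np.ones((3, 4, 4))
print(tv_loss(flat))  # a constant image has zero TV penalty
```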
small areas with high variation (in the source photo) produce color variation that does not correspond to the style source
continuing beyond 1000 iterations slowly improves detail
changing the seed causes important global variations, but local textures depend more on method parameters
minor grid-like artefacts are caused by the underlying canvas and by JPEG compression errors in the supplied GitHub example. The full-resolution GitHub example does not contain these in a way that is noticeable at screen width
smudging is inevitable
normalizing gradients helps local and global style transfer, but requires re-tuning weight parameters
expert users tuning manually still have trouble producing satisfactory results (https://github.com/jcjohnson/neural-style/issues/308)
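The gradient normalization mentioned above can be sketched as rescaling the gradient to unit L1 norm before each optimizer step, in the spirit of neural-style's -normalize_gradients flag; since the step magnitude changes, the style/content weights must be re-tuned:

```python
import numpy as np

def normalize_gradient(grad, eps=1e-8):
    """Rescale a gradient to unit L1 norm before the optimizer step.
    Direction is preserved; only the magnitude is normalized away."""
    return grad / (np.abs(grad).sum() + eps)

g = np.array([2.0, -1.0, 1.0])
print(normalize_gradient(g))  # L1 norm ~1, same direction as g
```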