Overview
To tackle the general photo enhancement problem of mapping low-quality
phone photos to the quality of a professional DSLR camera, we introduce DPED, a large-scale
dataset of photos taken synchronously in the wild by three smartphones and
one DSLR camera. The devices used to collect the data are the iPhone 3GS,
BlackBerry Passport and Sony Xperia Z smartphones
and a Canon 70D DSLR.
To ensure that all devices
were capturing photos simultaneously, they were mounted on a tripod and activated remotely
by a wireless control system.
In total, over 22K photos were collected during three weeks: 4549 from the Sony smartphone, 5727 from the iPhone and 6015 from the BlackBerry; for each smartphone photo there is a corresponding photo from the Canon DSLR. The photos were taken during the daytime in a wide variety of places and under various illumination and weather conditions. All images were captured in automatic mode, with default settings used for all cameras throughout the whole collection procedure.
The synchronously captured photos are not perfectly aligned, since the cameras
have different viewing angles, focal lengths and positions. To address this, we applied additional
non-linear transformations based on SIFT features to extract the overlapping region of each
phone/DSLR photo pair, and then used the resulting aligned image fragments to extract patches
of size 100×100 pixels for CNN training (139K, 160K and 162K pairs for BlackBerry, iPhone and
Sony, respectively). These patches constitute the input data to our CNN.
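For illustration, a simplified version of this alignment-and-cropping step could look as follows. This is a sketch assuming OpenCV and NumPy; it approximates the non-linear SIFT-based transformation with a plain RANSAC-estimated homography, and the function names are ours rather than those of the released code.

```python
# Simplified sketch of SIFT-based alignment and patch extraction.
# A RANSAC homography stands in for the non-linear transformation
# used in the actual pipeline.
import cv2
import numpy as np

def align_phone_to_dslr(phone, dslr):
    sift = cv2.SIFT_create()  # cv2.xfeatures2d.SIFT_create() in older OpenCV
    kp_p, des_p = sift.detectAndCompute(phone, None)
    kp_d, des_d = sift.detectAndCompute(dslr, None)

    # Match SIFT descriptors and keep the strongest correspondences
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = sorted(matcher.match(des_p, des_d), key=lambda m: m.distance)[:200]
    src = np.float32([kp_p[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_d[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Warp the phone photo into the DSLR frame
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = dslr.shape[:2]
    return cv2.warpPerspective(phone, H, (w, h))

def extract_patches(image, size=100, stride=100):
    # Cut the aligned fragment into non-overlapping 100x100 training patches
    return [image[y:y + size, x:x + size]
            for y in range(0, image.shape[0] - size + 1, stride)
            for x in range(0, image.shape[1] - size + 1, stride)]
```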
Algorithm
Image enhancement is performed by a 12-layer residual convolutional
neural network that takes a phone photo as input and is trained to reproduce the corresponding
image from the DSLR camera. In other words, its goal is to learn the underlying translation function
that maps photos taken by a given phone camera to DSLR-quality photos. The network is trained
to minimize a composite loss function that consists of the following three terms:
- Color loss: the enhanced image should be close to the target (DSLR) photo in terms of colors. To measure the difference between them, we apply Gaussian blur to both images and compute the Euclidean distance between the obtained representations.
- Texture loss: to measure texture quality of the enhanced image, we train a separate adversarial CNN-discriminator that observes both improved and target grayscale images, and its objective is to predict which image is which. The goal of our image enhancement network is to fool the discriminator, so that it cannot distinguish between them.
- Content loss: an additional VGG-19 CNN pre-trained on ImageNet is used to preserve image semantics: the content representation produced by this CNN should be the same for the enhanced and target images.
All three losses are then summed with scalar weights, and the whole system is trained end-to-end with backpropagation to minimize the resulting composite loss.
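For illustration, the composite objective might be assembled along the following lines. This is a minimal sketch assuming TensorFlow; `discriminator_logits` and `vgg_features` are hypothetical stand-ins for the adversarial discriminator and the pre-trained VGG-19 network described above, and the loss weights are illustrative rather than the values used in our experiments.

```python
# Minimal sketch of the three-term loss described above (TensorFlow).
# `discriminator_logits` and `vgg_features` are hypothetical placeholders
# for the adversarial discriminator and the pre-trained VGG-19 network;
# the loss weights are illustrative only.
import tensorflow as tf

def gaussian_blur(images, kernel):
    # kernel: [k, k, channels, 1] Gaussian filter applied per channel
    return tf.nn.depthwise_conv2d(images, kernel, [1, 1, 1, 1], "SAME")

def composite_loss(enhanced, target, blur_kernel,
                   w_color=0.1, w_texture=0.4, w_content=1.0):
    # Color loss: Euclidean distance between Gaussian-blurred images
    color_loss = tf.reduce_mean(tf.square(
        gaussian_blur(enhanced, blur_kernel) -
        gaussian_blur(target, blur_kernel)))

    # Texture loss: the enhancement network tries to fool the
    # discriminator on grayscale versions of its outputs
    logits = discriminator_logits(tf.image.rgb_to_grayscale(enhanced))
    texture_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        labels=tf.ones_like(logits), logits=logits))

    # Content loss: VGG-19 feature activations should match
    content_loss = tf.reduce_mean(tf.square(
        vgg_features(enhanced) - vgg_features(target)))

    # Weighted sum minimized by backpropagation
    return (w_color * color_loss + w_texture * texture_loss
            + w_content * content_loss)
```

As usual for adversarial objectives, the discriminator itself is trained in alternation with the enhancement network, on the opposite classification target.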
Code
- A TensorFlow implementation of the proposed models and the whole training pipeline is available in our GitHub repo
- Pre-trained models and standalone code to run them can be downloaded separately here
- Prerequisites: GPU + CUDA/cuDNN + TensorFlow (>= 1.0.1)
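A quick, illustrative way to check that the environment matches these prerequisites (the GPU check uses the TF 1.x API):

```python
# Illustrative environment check for the prerequisites above
import tensorflow as tf

print(tf.__version__)              # should be >= 1.0.1
print(tf.test.is_gpu_available())  # True if the GPU + CUDA/cuDNN setup works (TF 1.x API)
```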
Citation
@inproceedings{ignatov2017dslr,
  author={Ignatov, Andrey and Kobyshev, Nikolay and Timofte, Radu and Vanhoey, Kenneth and Van Gool, Luc},
  title={DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks},
  booktitle={IEEE International Conference on Computer Vision (ICCV)},
  year={2017}
}