High-quality Image
Image Denoising
License: Unknown


To tackle the general photo enhancement problem by mapping low-quality phone photos into photos captured by a professional DSLR camera, we introduce the large-scale DPED dataset, which consists of photos taken synchronously in the wild by three smartphones and one DSLR camera. The devices used to collect the data were an iPhone 3GS, a BlackBerry Passport, a Sony Xperia Z and a Canon 70D DSLR. To ensure that all devices were capturing photos simultaneously, they were mounted on a tripod and activated remotely by a wireless control system.

In total, over 22K photos were collected during 3 weeks, including 4549 photos from the Sony smartphone, 5727 from the iPhone and 6015 from the BlackBerry; for each smartphone photo there is a corresponding photo from the Canon DSLR. The photos were taken during the daytime in a wide variety of places and in various illumination and weather conditions. The images were captured in automatic mode; we used the default settings of all cameras throughout the whole collection procedure.

The synchronously captured photos are not perfectly aligned since the cameras have different viewing angles, focal lengths and positions. To address this, we performed additional non-linear transformations based on SIFT features to extract the intersection part between phone and DSLR photos, and then used the obtained aligned image fragments to extract patches of size 100x100 pixels for CNN training (139K, 160K and 162K pairs for BlackBerry, iPhone and Sony, respectively). These patches constituted the input data to our CNN.
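The alignment step itself relies on SIFT keypoint matching and a non-linear warp (e.g. with an OpenCV pipeline). Once a phone/DSLR pair has been aligned and cropped to their common field of view, cutting it into corresponding training patches can be sketched as follows (the function name and the non-overlapping stride are illustrative, not taken from the released code):

```python
import numpy as np

def extract_patch_pairs(phone_img, dslr_img, patch_size=100, stride=100):
    """Cut two pre-aligned images into corresponding patch pairs.

    Assumes phone_img and dslr_img are H x W x C arrays covering the same
    scene region after SIFT-based alignment (the stride is an assumption).
    """
    assert phone_img.shape == dslr_img.shape
    h, w = phone_img.shape[:2]
    pairs = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            pairs.append((phone_img[y:y + patch_size, x:x + patch_size],
                          dslr_img[y:y + patch_size, x:x + patch_size]))
    return pairs
```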


Image enhancement is performed using a 12-layer Residual Convolutional Neural Network that takes a phone photo as input and is trained to reproduce the corresponding image from the DSLR camera. In other words, its goal is to learn the underlying translation function that modifies photos taken by a given camera into DSLR-quality photos. The network is trained to minimize a composite loss function that consists of the following three terms:
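The residual building block of such a network can be sketched as a NumPy forward pass (3×3 "same" convolutions with an identity skip connection; the shapes and layer counts are illustrative, not the exact published architecture):

```python
import numpy as np

def conv3x3(x, w):
    """Naive 3x3 'same' convolution: x is (H, W, Cin), w is (3, 3, Cin, Cout)."""
    h, wd = x.shape[:2]
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(3):
        for j in range(3):
            # accumulate the contribution of each kernel tap
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out

def residual_block(x, w1, w2):
    """Two 3x3 convolutions with a ReLU, plus an identity skip connection."""
    y = np.maximum(conv3x3(x, w1), 0.0)
    return x + conv3x3(y, w2)
```

Stacking a few such blocks between input and output convolutions yields a network of roughly the 12-layer depth described above.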

  • Color loss: the enhanced image should be close to the target (DSLR) photo in terms of colors. To measure the difference between them, we apply Gaussian blur to both images and compute the Euclidean distance between the obtained representations.
  • Texture loss: to measure texture quality of the enhanced image, we train a separate adversarial CNN-discriminator that observes both improved and target grayscale images, and its objective is to predict which image is which. The goal of our image enhancement network is to fool the discriminator, so that it cannot distinguish between them.
  • Content loss: a distinct VGG-19 CNN pre-trained on ImageNet is used to preserve image semantics: the content representation produced by this CNN should be the same for both the improved and target images.
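The color loss, for example, can be sketched in NumPy as a Gaussian blur followed by a Euclidean distance (the kernel size and sigma below are assumptions, not the paper's exact values):

```python
import numpy as np

def gaussian_kernel(size=21, sigma=3.0):
    """2-D Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, kernel):
    """Naive per-channel 'same' convolution with edge padding."""
    p = kernel.shape[0] // 2
    padded = np.pad(img, ((p, p), (p, p), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape[:2]
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * padded[i:i + h, j:j + w, :]
    return out

def color_loss(enhanced, target, kernel):
    """Euclidean distance between the blurred enhanced and target images."""
    return np.sqrt(np.sum((blur(enhanced, kernel) - blur(target, kernel)) ** 2))
```

Blurring first removes high-frequency texture, so this term compares only the overall color and brightness distributions of the two images.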

All these losses are then summed with corresponding weights, and the system is trained as a whole with the backpropagation algorithm to minimize the final weighted loss.
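In code, the final objective is simply a weighted sum of the three terms (the weights below are placeholders, not the values used in the paper):

```python
def total_loss(color_l, texture_l, content_l,
               w_color=0.5, w_texture=0.4, w_content=1.0):
    # illustrative weights; in practice they are tuned for the training setup
    return w_color * color_l + w_texture * texture_l + w_content * content_l
```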


  • The TensorFlow implementation of the proposed models and the whole training pipeline is available in our GitHub repo

  • Pre-trained models + standalone code to run them can be downloaded separately here

  • Prerequisites: GPU + CUDA CuDNN + TensorFlow (>= 1.0.1)


author={Andrey Ignatov and Nikolay Kobyshev and Radu Timofte and Kenneth Vanhoey and Luc Van Gool},
title={DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks},
booktitle={IEEE International Conference on Computer Vision (ICCV)},
ETH Zurich