Image Colorization Using Pix2Pix with cWGAN and Residual U-Net Architecture
Keywords: Pix2Pix, cWGAN, Residual U-Net Architecture

Abstract
This paper presents an image colorization approach built on a hybrid architecture that combines Pix2Pix with a conditional Wasserstein Generative Adversarial Network (cWGAN). Image colorization is a challenging, ill-posed problem, since plausible color values must be inferred from grayscale inputs alone. To address this, we propose a supervised deep learning pipeline that conditions the generator on grayscale images while employing the Wasserstein loss with gradient penalty to stabilize GAN training and enhance color realism. The generator adopts a residual U-Net structure to preserve spatial fidelity, and the discriminator evaluates image realism in a conditional setup. The system is trained on grayscale-color image pairs transformed into the Lab color space, where the model learns to predict the 'ab' channels from the 'L' channel. The use of PyTorch Lightning ensures modular and scalable experimentation. Preliminary results demonstrate superior performance in producing sharp, semantically consistent colorizations compared to baseline GAN models, with quantitative assessment using the Inception Score and perceptual losses. This research contributes to both aesthetic and practical applications, such as restoring historical photographs and aiding visual understanding in scientific imaging.
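To make the training objective concrete, the sketch below shows a conditional WGAN-GP critic loss in PyTorch: the critic is conditioned on the grayscale 'L' channel, and the interpolation for the gradient penalty is taken between real and generated 'ab' channels. The stand-in critic module, the tensor shapes, and the penalty weight lambda_gp = 10 are illustrative assumptions for this sketch, not the exact implementation described in the paper.

import torch
import torch.nn as nn

def gradient_penalty(critic, l_channel, real_ab, fake_ab):
    # WGAN-GP term: penalize deviation of the critic's gradient norm from 1
    # on random interpolations between real and generated ab channels.
    batch = real_ab.size(0)
    eps = torch.rand(batch, 1, 1, 1, device=real_ab.device)
    interp = (eps * real_ab + (1 - eps) * fake_ab).requires_grad_(True)
    # The critic is conditional: it sees the L channel concatenated with ab.
    score = critic(torch.cat([l_channel, interp], dim=1))
    grads = torch.autograd.grad(outputs=score, inputs=interp,
                                grad_outputs=torch.ones_like(score),
                                create_graph=True, retain_graph=True)[0]
    grads = grads.view(batch, -1)
    return ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

# Illustrative stand-in critic: 3 input channels (1 for L, 2 for ab).
critic = nn.Sequential(nn.Conv2d(3, 16, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2),
                       nn.Conv2d(16, 1, 4))

L = torch.randn(2, 1, 32, 32)        # grayscale condition (L channel)
real_ab = torch.randn(2, 2, 32, 32)  # ground-truth ab channels
fake_ab = torch.randn(2, 2, 32, 32)  # stand-in for generator output

lambda_gp = 10.0  # penalty weight as in the WGAN-GP formulation
critic_loss = (critic(torch.cat([L, fake_ab], dim=1)).mean()
               - critic(torch.cat([L, real_ab], dim=1)).mean()
               + lambda_gp * gradient_penalty(critic, L, real_ab, fake_ab))

In the full pipeline, the stand-in critic would be replaced by the paper's conditional discriminator, and fake_ab by the output of the residual U-Net generator given the L channel.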