Inverse Cooking: Recipe Generation From Food Images
Abstract
People enjoy food photography because they
appreciate food. Behind each meal there is a story
described in a complex recipe and, unfortunately,
by simply looking at a food image we do not have
access to its preparation process. Therefore, in this
paper we introduce an inverse cooking system that
recreates cooking recipes given food images. Our
system predicts ingredients as sets by means of a
novel architecture, modeling their dependencies
without imposing any order, and then generates
cooking instructions by attending to both the image
and its inferred ingredients simultaneously. We
extensively evaluate the whole system on the large-scale
Recipe1M dataset and show that (1) we
improve performance w.r.t. previous baselines for
ingredient prediction; (2) we are able to obtain high
quality recipes by leveraging both image and
ingredients; (3) our system is able to produce more
compelling recipes than retrieval-based approaches
according to human judgment. We make code and
models publicly available.
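To make the idea of predicting ingredients "as sets ... without imposing any order" concrete, here is a minimal sketch. It is not the architecture from the paper: the thresholding step, the probability values, and the function names (`predict_ingredient_set`, `set_iou`, `set_f1`) are assumptions chosen for illustration. It only shows that a set-valued prediction and set-overlap scores do not depend on the order in which ingredients are listed.

```python
from typing import Dict, Set


def predict_ingredient_set(probs: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Turn per-ingredient probabilities into an unordered set (hypothetical helper)."""
    return {name for name, p in probs.items() if p >= threshold}


def set_iou(pred: Set[str], truth: Set[str]) -> float:
    """Intersection over union of two ingredient sets."""
    union = pred | truth
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(pred & truth) / len(union)


def set_f1(pred: Set[str], truth: Set[str]) -> float:
    """F1 score computed from set overlap."""
    true_positives = len(pred & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Made-up probabilities for one image (illustrative values, not model output).
probs = {"flour": 0.9, "egg": 0.8, "sugar": 0.4, "butter": 0.7}
truth = {"egg", "flour", "sugar"}

pred = predict_ingredient_set(probs)

# The same probabilities listed in reverse order give the same set.
reversed_probs = dict(reversed(list(probs.items())))
same_pred = predict_ingredient_set(reversed_probs)

iou = set_iou(pred, truth)
f1 = set_f1(pred, truth)
```

Because both the prediction and the scores are defined on sets, shuffling the ingredient list changes neither, which is the property the abstract emphasizes.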