Revolutionizing Culinary Experiences with AI: Introducing FIRE (Food Image to REcipe generation) πŸ”₯ | by Prateek Chhikara | Jan, 2024



1. Recipe Customization

Recipe customization is crucial due to the connection between food, customs, and individual preferences. Additionally, it becomes essential when addressing allergies or dietary restrictions. Surprisingly, despite the evident demand, existing literature lacks dedicated efforts in recipe customization. Our work aims to bridge the research gap by enabling personalized recipe customization, considering individual taste profiles and dietary restrictions.

To guide future research in this area, we showcase FIRE's ability to support recipe customization across a wide range of operations (e.g., ingredient replacement, taste adjustment, calorie adjustment, cooking time adaptation), testing few-shot performance thoroughly. As shown in the purple part of Figure 5, we perform ingredient removal to trim potatoes from the recipe: two sentences related to potatoes are deleted in the modified version, and one sentence is changed to ensure consistency. We also perform ingredient addition to replace β€˜cheese’ with β€˜cheddar cheese’, recognizing that it should be added before baking, which results in the modified sentence β€˜Sprinkle half each of cheddar cheese and onions.’
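To make the workflow concrete, here is a minimal sketch of how such a few-shot customization prompt could be assembled for a large LM. The helper function, example recipe, and edit requests below are illustrative assumptions, not taken from the paper:

```python
# Hypothetical sketch: build a few-shot prompt that asks an LM to apply an
# edit request (e.g., ingredient removal or replacement) to a recipe.
# Each example is a (original recipe, edit request, customized recipe) triple.

def build_customization_prompt(recipe: str, edit_request: str,
                               examples: list[tuple[str, str, str]]) -> str:
    """Compose a few-shot prompt ending where the LM should continue."""
    parts = []
    for orig, request, edited in examples:
        parts.append(
            f"Recipe:\n{orig}\nEdit: {request}\nCustomized recipe:\n{edited}\n"
        )
    # The query leaves "Customized recipe:" open for the model to complete.
    parts.append(f"Recipe:\n{recipe}\nEdit: {edit_request}\nCustomized recipe:\n")
    return "\n".join(parts)

# Illustrative demonstration pair and query (not from the paper's data).
examples = [(
    "Boil pasta. Stir in cheese and onions. Bake for 20 minutes.",
    "Replace 'cheese' with 'cheddar cheese'.",
    "Boil pasta. Stir in cheddar cheese and onions. Bake for 20 minutes.",
)]
prompt = build_customization_prompt(
    "Layer potatoes and cheese in a dish. Sprinkle half each of cheese "
    "and onions. Bake until golden.",
    "Remove potatoes.",
    examples,
)
```

The resulting string would then be sent to the LM of choice; the demonstration pair shows the model the expected edit format before it sees the query.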

2. Generating Machine Code for Image-based Recipes

Converting recipes to machine code enables automation, scalability, and integration with existing systems, reducing manual intervention, labor costs, and human error during food preparation. To facilitate this task, we combine FIRE’s recipe generation strength with the ability of large LMs to manipulate code-style prompts for structural tasks [14]. We show an example approach for generating Python-style code representations of recipes produced by FIRE, by prompting GPT-3 (see the orange part of Figure 5).
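To illustrate what such a Python-style target representation might look like, here is a hedged sketch. The `Recipe` class, its fields, and the example recipe content are illustrative assumptions, not the exact representation used in the paper:

```python
# Hypothetical Python-style code representation of a recipe, of the kind a
# code-prompted LM could be asked to emit. Class and field names are
# illustrative, not taken from the paper.
from dataclasses import dataclass, field

@dataclass
class Recipe:
    title: str
    ingredients: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)

    def add_step(self, instruction: str) -> "Recipe":
        self.steps.append(instruction)
        return self  # return self so steps chain in reading order

# Example instance mirroring the customized recipe discussed above.
scalloped_potatoes = Recipe(
    title="Scalloped Potatoes",
    ingredients=["potatoes", "cheddar cheese", "onions"],
)
(scalloped_potatoes
 .add_step("Layer potatoes in a baking dish.")
 .add_step("Sprinkle half each of cheddar cheese and onions.")
 .add_step("Bake until golden."))
```

A structured object like this is straightforward for downstream systems (e.g., kitchen automation or inventory software) to consume, which is the point of targeting code rather than free text.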

We introduced FIRE, a methodology tailored for food computing, focusing on generating food titles, extracting ingredients, and generating cooking instructions solely from image inputs. We leveraged recent advancements in computer vision and language modeling to achieve superior performance against strong baselines. Furthermore, we demonstrated practical applications of FIRE for recipe customization and recipe-to-code generation, showcasing the adaptability and automation potential of our approach.

We list three challenges that should be addressed in future research:

  1. Existing and proposed recipe generation models lack a reliable mechanism to verify the accuracy of the generated recipes. Conventional evaluation metrics fall short in this aspect. Hence, we would like to create a new metric that assesses the coherence and plausibility of recipes, providing a more thorough evaluation.
  2. The diversity and availability of recipes are influenced by geographical, climatic, and religious factors, which may limit their applicability. Incorporating knowledge graphs that account for these contextual factors and ingredient relationships can offer alternative ingredient suggestions, addressing this issue.
  3. Hallucination in recipe generation using language and vision models poses a significant challenge. Future work will explore state-tracking methods to improve the generation process, ensuring the production of more realistic and accurate recipes.

I hope this overview has given you insight into the inspiration and development of FIRE, our tool for converting food images into detailed recipes. For a more in-depth exploration of our approach, I invite you to check out our full paper, published at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024. If our research contributes to your work, we would be happy if you cited it. 😊

Paper Link: https://openaccess.thecvf.com/content/WACV2024/html/Chhikara_FIRE_Food_Image_to_REcipe_Generation_WACV_2024_paper.html

@InProceedings{Chhikara_2024_WACV,
    author    = {Chhikara, Prateek and Chaurasia, Dhiraj and Jiang, Yifan and Masur, Omkar and Ilievski, Filip},
    title     = {FIRE: Food Image to REcipe Generation},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {8184-8194}
}

[1] Weiqing Min, Shuqiang Jiang, Linhu Liu, Yong Rui, and Ramesh Jain. A survey on food computing. ACM Comput. Surv., 52(5), September 2019.

[2] Sutter Health. Eating Well for Mental Health. https://www.sutterhealth.org/health/nutrition/eating-well-for-mental-health. Accessed on March 24, 2023.

[3] Kiely Kuligowski. 12 Reasons to Use Instagram for Your Business. https://www.business.com/articles/10-reasons-to-use-instagram-for-business/. Accessed on May 12, 2023.

[4] Dim P. Papadopoulos, Enrique Mora, Nadiia Chepurko, Kuan Wei Huang, Ferda Ofli, and Antonio Torralba. Learning program representations for food images and cooking recipes, 2022.

[5] Sundaram Gunasekaran. Computer vision technology for food quality assurance. Trends in Food Science & Technology, 7(8):245–256, 1996.

[6] Yoshiyuki Kawano and Keiji Yanai. Food image recognition with deep convolutional features. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pages 589–593, 2014.

[7] Amaia Salvador, Michal Drozdzal, Xavier GirΓ³-i-Nieto, and Adriana Romero. Inverse cooking: Recipe generation from food images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10453–10462, 2019.

[8] Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning, pages 12888– 12900. PMLR, 2022.

[9] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.

[10] Prateek Chhikara, Ujjwal Pasupulety, John Marshall, Dhiraj Chaurasia, and Shweta Kumari. Privacy aware question-answering system for online mental health risk assessment. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 215–222, Toronto, Canada, July 2023. Association for Computational Linguistics.

[11] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020.

[12] Chunting Zhou, Graham Neubig, Jiatao Gu, Mona Diab, Francisco GuzmΓ‘n, Luke Zettlemoyer, and Marjan Ghazvininejad. Detecting hallucinated content in conditional neural sequence generation. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1393–1404, 2021.

[13] Mehrdad Farahani, Kartik Godawat, Haswanth Aekula, Deepak Pandian, and Nicholas Broad. Chef Transformer. https://huggingface.co/flax-community/t5-recipe-generation. Accessed on April 12, 2023.

[14] Aman Madaan, Shuyan Zhou, Uri Alon, Yiming Yang, and Graham Neubig. Language models of code are few-shot commonsense learners. In Findings of the Association for Computational Linguistics: EMNLP 2022, 2022.
