Leveraging Wikipedia and a Large Language Model to Develop a Knowledge Resource Containing World Foods and Allergens


Abstract

Introduction

Researchers and clinicians often turn to electronic health record (EHR) data for details on dietary intake and concerns about allergic reactions. In this work, we created a knowledge resource that matches foods from around the world to their common ingredients and potential allergens.

Methods

We used Wikipedia (2024-12-01 English download) to develop a list of foods from around the world. Wikipedia food pages contain a metadata field, main_ingredient, which we used to identify foods. For each food on the resulting list, we created a catalog of ingredients using prompts sent to the large language model (LLM) OpenAI ChatGPT 3.5 API (CGPT). We then prompted CGPT with each food name and its associated ingredients to obtain a list of the allergens in that food, drawn from the eight major food allergens described by the Food Allergen Labeling and Consumer Protection Act: milk, eggs, fish, shellfish, tree nuts, peanuts, wheat, and soy. Prompts and knowledge resources are available at: https://github.com/dhanauer/food-allergens.
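As an illustration of the two steps described above, the following Python sketch pulls the main_ingredient value out of a page's infobox wikitext and composes an allergen prompt for one food. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the regular expressions, the splitting rules, and the prompt wording are all hypothetical, and the actual prompts are those published in the repository linked above.

```python
import re

# The eight major allergens named in the Methods.
FALCPA_ALLERGENS = ["milk", "eggs", "fish", "shellfish",
                    "tree nuts", "peanuts", "wheat", "soy"]

# Assumed layout: one "| main_ingredient = value" line inside the infobox.
_FIELD_RE = re.compile(
    r"^[ \t]*\|[ \t]*main_ingredient[ \t]*=[ \t]*(.*)$", re.MULTILINE
)
# [[target]] or [[target|label]] wiki links; the visible text is captured.
_LINK_RE = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def extract_main_ingredients(wikitext):
    """Return lower-cased main_ingredient items, or [] if the field is absent or empty."""
    match = _FIELD_RE.search(wikitext)
    if match is None:
        return []
    value = _LINK_RE.sub(r"\1", match.group(1))  # keep only the link's visible text
    # Hypothetical splitting rule: commas, semicolons, <br> tags, and the word "and".
    parts = re.split(r",|;|<br\s*/?>|\band\b", value)
    return [p.strip().lower() for p in parts if p.strip()]


def build_allergen_prompt(food, ingredients):
    """Compose a hypothetical allergen prompt (not the wording used in the study)."""
    return (
        f"Food: {food}\n"
        f"Ingredients: {', '.join(ingredients)}\n"
        "Which of these allergens does the food likely contain: "
        f"{', '.join(FALCPA_ALLERGENS)}? "
        "Answer with a comma-separated list, or 'none'."
    )


# Made-up wikitext used only to exercise the functions.
sample = (
    "{{Infobox food\n"
    "| name = Example dish\n"
    "| main_ingredient = [[Rice noodles]], [[Egg as food|eggs]], [[tofu]] and peanuts\n"
    "}}"
)
ingredients = extract_main_ingredients(sample)
prompt = build_allergen_prompt("Example dish", ingredients)
```

Pages without the field return an empty list, which mirrors the use of main_ingredient as the criterion for treating a page as a food. Real infoboxes vary widely in formatting, so a rule-based splitter like this one would likely leave many free-form values unresolved.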

Results

Using Wikipedia, we identified 6,693 foods with a main_ingredient field. From this list, we found 7,030 distinct ingredients. Using CGPT for the same foods, we obtained a list of 4,083 ingredients; the smaller count suggests that CGPT named ingredients more consistently and with less variability. An UpSet plot of allergens in the foods is shown in the Figure (see poster). We noted issues with both the Wikipedia output and the CGPT output: for example, neither source always listed ingredients in a consistent manner, and some entries were quite free-form.
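The two kinds of tallies reported above (distinct ingredient counts and the allergen combinations behind an UpSet plot) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the whitespace and case normalization, and the toy data are assumptions, and none of the values below come from the study.

```python
from collections import Counter


def distinct_ingredients(food_ingredients):
    """Return the set of distinct ingredient strings after trivial normalization."""
    return {
        " ".join(item.lower().split())  # lower-case and collapse whitespace
        for items in food_ingredients.values()
        for item in items
    }


def allergen_combination_counts(food_allergens):
    """Count foods per exact allergen combination, as an UpSet plot displays them."""
    return Counter(frozenset(allergens) for allergens in food_allergens.values())


# Made-up placeholder data, not results from the poster.
toy_ingredients = {
    "food_a": ["Wheat flour", "egg"],
    "food_b": ["wheat  flour", "Milk"],
}
toy_allergens = {
    "food_a": ["wheat", "eggs"],
    "food_b": ["wheat", "milk"],
    "food_c": ["eggs", "wheat"],
    "food_d": [],
}

unique = distinct_ingredients(toy_ingredients)
combos = allergen_combination_counts(toy_allergens)
```

Using `frozenset` makes the order in which allergens are listed irrelevant, so `food_a` and `food_c` fall into the same combination. The normalization step shows why free-form ingredient strings inflate a distinct count: "Wheat flour" and "wheat  flour" collapse to one entry only after cleaning.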

Discussion

We found Wikipedia to be most useful for creating an overall list of foods, but the LLM was better at providing a consistent list of ingredients. A combined approach proved to be most accurate.

Poster
non-peer-reviewed

Author Information

Simon Shavit

Literature, Science, and the Arts, University of Michigan, Ann Arbor, USA

David A. Hanauer (Corresponding Author)

Learning Health Sciences, University of Michigan, Ann Arbor, USA

