Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings
Epicure is a research project that explores how computers can represent the complex relationships between food ingredients. By analyzing millions of recipes together with chemical data, the researchers built a system that maps each ingredient to a point in a 300-dimensional space. In that space, culinary concepts such as flavor profiles, nutritional content, and cultural traditions show up as mathematical directions, which enables tools that help chefs discover new ingredient pairings or explore culinary categories.
A Multilingual Foundation
To build a comprehensive model, the researchers aggregated over 4.1 million recipes from 11 sources in multiple languages, including English, Chinese, Russian, Vietnamese, Spanish, Turkish, Indonesian, and German. Because raw recipe data is often messy, they used an AI-augmented pipeline to normalize roughly 200,000 unique ingredient terms into a clean, canonical list of 1,790 ingredients. This ensures the model can recognize that different terms across languages often refer to the same culinary building block.
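To make the normalization idea concrete, here is a minimal sketch of only the final lookup step: mapping a raw surface form to a canonical ingredient. It does not reproduce the AI-augmented pipeline, whose details are not described here. The alias table, the `clean` and `canonicalize` helpers, and every entry are hypothetical and chosen purely for illustration.

```python
import unicodedata
from typing import Optional

# Hypothetical alias table: raw surface form (any language) -> canonical name.
# These entries are illustrative and are not taken from the Epicure data.
ALIASES = {
    "scallion": "green onion",
    "spring onion": "green onion",
    "cebolleta": "green onion",
    "зеленый лук": "green onion",
    "knoblauch": "garlic",
    "garlic clove": "garlic",
}

def clean(term: str) -> str:
    """Apply Unicode NFKC normalization, lowercase, and collapse whitespace."""
    term = unicodedata.normalize("NFKC", term).lower()
    return " ".join(term.split())

def canonicalize(term: str) -> Optional[str]:
    """Return the canonical ingredient for a raw term, or None if unknown."""
    key = clean(term)
    if key in ALIASES.values():  # already a canonical name
        return key
    return ALIASES.get(key)

if __name__ == "__main__":
    for raw in ["  Spring   ONION ", "Knoblauch", "garlic", "unobtainium"]:
        print(repr(raw), "->", canonicalize(raw))
```

In a real pipeline the table would be far larger and built with model assistance plus review; the point here is only that many surface forms collapse onto one canonical entry.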
Three Perspectives on Flavor
The core innovation of Epicure is the creation of three "sibling" models that learn from different types of information. All three share the same architecture but differ in how they "walk" through the data:
Cooc: Focuses exclusively on recipe co-occurrence, learning which ingredients are typically used together in the same dish.
Chem: Focuses on chemical compounds, learning how ingredients are related based on the flavor molecules they share.
Core: A hybrid model that blends both recipe context and chemical data.
By adjusting these "walk schemas," the researchers created a controllable spectrum that lets them set how much weight the embeddings give to human culinary habits versus the underlying chemistry of food.
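The three schemas can be pictured as random walks over a graph that links ingredients to recipes and to compounds. The sketch below is one possible reading, not the authors' implementation: the toy graph, the placeholder compound IDs (`c1`, `c2`, `c3`), the `p_chem` mixing parameter, and the function names are all assumptions made for illustration.

```python
import random

# Toy data (illustrative only). Every ingredient appears in both views,
# so each walk step always has somewhere to go.
RECIPES = {
    "r1": ["tomato", "basil", "garlic"],
    "r2": ["strawberry", "cream"],
}
COMPOUNDS = {  # placeholder compound IDs, not real chemistry
    "c1": ["basil", "strawberry"],
    "c2": ["tomato", "garlic"],
    "c3": ["cream"],
}

def invert(groups):
    """Map each ingredient to the groups (recipes or compounds) that contain it."""
    index = {}
    for group, members in groups.items():
        for member in members:
            index.setdefault(member, []).append(group)
    return index

ING_TO_RECIPES = invert(RECIPES)
ING_TO_COMPOUNDS = invert(COMPOUNDS)

def step(ingredient, p_chem, rng):
    """One hop: ingredient -> shared recipe or shared compound -> ingredient."""
    if rng.random() < p_chem:
        index, groups = ING_TO_COMPOUNDS, COMPOUNDS
    else:
        index, groups = ING_TO_RECIPES, RECIPES
    group = rng.choice(index[ingredient])
    return rng.choice(groups[group])

def walk(start, length, p_chem, seed=0):
    """Return an ingredient sequence of `length` hops starting at `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(length):
        path.append(step(path[-1], p_chem, rng))
    return path

if __name__ == "__main__":
    print("cooc-like:", walk("tomato", 8, p_chem=0.0))
    print("chem-like:", walk("tomato", 8, p_chem=1.0))
    print("core-like:", walk("tomato", 8, p_chem=0.5))
```

Under this assumed parametrization, `p_chem=0.0` behaves like Cooc (recipe hops only), `p_chem=1.0` like Chem (compound hops only), and values in between like Core. In the toy graph, only a mixed walk can get from tomato to strawberry, because that path needs a recipe hop to basil followed by a compound hop.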
Navigating Culinary Geometry
The researchers found that these models naturally organize ingredients into meaningful clusters without being explicitly told to do so. For example, the embeddings clearly separate ingredients by cuisine (such as East Asian or Mediterranean) and nutritional category.
The models support two primary ways of interacting with this "flavor space":
Nearest-Neighbor Lookups: Finding ingredients that are mathematically similar to a starting point.
Directional Arithmetic: Using spherical linear interpolation (SLERP), which moves a vector along the arc between two directions rather than along a straight line, users can "rotate" an ingredient toward a specific goal. For instance, a user could take a "rice" seed and rotate it toward a "South-Asian" pole to discover related ingredients like curry leaves, urad dal, or fenugreek seeds.
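Both interaction modes reduce to a few lines of vector math. The sketch below uses plain Python and made-up 3-dimensional vectors in place of the 300-dimensional Epicure embeddings. The vectors, the `south_asian_pole` direction, and the function names are assumptions for illustration, and the models' actual interface is not described here.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def nearest(query, vectors, k=3, exclude=()):
    """Names of the k vectors most similar to `query` by cosine similarity."""
    scored = [(cosine(query, v), name)
              for name, v in vectors.items() if name not in exclude]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

def slerp(a, b, t):
    """Rotate unit vector a toward unit vector b by fraction t of the angle."""
    a, b = normalize(a), normalize(b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)  # angle between a and b
    sin_omega = math.sin(omega)
    if sin_omega < 1e-8:
        # Nearly parallel or antipodal: the rotation plane is ill-defined,
        # so this sketch simply returns a unchanged.
        return a
    wa = math.sin((1.0 - t) * omega) / sin_omega
    wb = math.sin(t * omega) / sin_omega
    return [wa * x + wb * y for x, y in zip(a, b)]

# Made-up toy embeddings (not real Epicure vectors).
EMB = {
    "rice": [1.0, 0.0, 0.0],
    "barley": [0.9, 0.1, 0.0],
    "curry leaves": [0.1, 1.0, 0.0],
    "butter": [0.0, 0.0, 1.0],
}
south_asian_pole = [0.0, 1.0, 0.0]  # hypothetical concept direction

if __name__ == "__main__":
    print(nearest(EMB["rice"], EMB, k=1, exclude=("rice",)))
    rotated = slerp(EMB["rice"], south_asian_pole, 0.8)
    print(nearest(rotated, EMB, k=1, exclude=("rice",)))
```

Because SLERP stays on the unit sphere, the rotated query keeps unit length at every `t`, so cosine rankings stay comparable along the whole path. A straight-line blend would shrink the vector at the midpoint.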
Important Considerations
While the Epicure models demonstrate that culinary knowledge can be effectively captured in a dense mathematical space, the researchers have not released the code or the trained models at this time. The study highlights that while the models are highly effective at recovering supervised labels like macronutrients or sensory categories, the "geometry" of the food space changes depending on whether the model prioritizes chemical data or recipe-based context.