How our recipe extraction pipeline works

Every time you import a recipe, we run it through up to four extraction stages. The stages are tried in order, and each one runs only if the stages before it fail:

1. **Page-by-page**: we use font-based title detection to find recipes without calling any model.
2. **Cluster-based**: if that fails, we look for dense ingredient groupings anywhere in the document.
3. **Font-based**: a simpler heuristic fallback.
4. **Vision fallback**: as a last resort, we render each page as an image and let a vision model read it directly.

The whole pipeline runs in parallel, caches aggressively, and falls back gracefully. The next time you import a popular cookbook, chances are we've seen it before and can return results in a few hundred milliseconds.