feat: Use recipe_scrapers metadata for more accurate parsing #5165

Merged

eric-hoffmann
Contributor

What this PR does / why we need it:

Some recipe websites expose additional nutrition information that is not contained in the schema data. The recipe-scrapers library has a nutrients() function that implements custom parsing for some sites, but Mealie currently pulls nutrition information directly from the schema. This PR updates Mealie’s recipe-scrapers nutrition parsing to use the nutrients() function. If nutrients() fails for any reason, the existing parsing logic falls back to parsing the schema directly.
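The fallback described above could look roughly like the sketch below. This is not the code from this PR: `get_nutrition` and the two stub classes are hypothetical names used only for illustration, and the stubs stand in for a real scraper object so the example runs without network access. The only library surface assumed is what the PR description shows, namely `scraper.nutrients()` and `scraper.schema.data`.

```python
from typing import Any


def get_nutrition(scraper: Any) -> dict:
    """Prefer scraper.nutrients(); fall back to the raw schema entry (hypothetical helper)."""
    try:
        nutrients = scraper.nutrients()
        if nutrients:
            return dict(nutrients)
    except Exception:
        # Any failure in the site-specific parser triggers the schema fallback.
        pass

    raw = getattr(getattr(scraper, "schema", None), "data", None) or {}
    nutrition = raw.get("nutrition") or {}
    if not isinstance(nutrition, dict):
        return {}
    # Drop JSON-LD bookkeeping keys such as "@type".
    return {k: v for k, v in nutrition.items() if not str(k).startswith("@")}


class _StubSchema:
    """Stand-in for the scraper's schema attribute (illustration only)."""

    def __init__(self, data: dict) -> None:
        self.data = data


class WorkingStub:
    """Stub whose nutrients() succeeds."""

    schema = _StubSchema({"nutrition": {"@type": "NutritionInformation", "calories": "2558 calories"}})

    def nutrients(self) -> dict:
        return {"calories": "2558", "fatContent": "49 g"}


class FailingStub(WorkingStub):
    """Stub whose nutrients() raises, forcing the fallback path."""

    def nutrients(self) -> dict:
        raise NotImplementedError("no custom parser")


print(get_nutrition(WorkingStub()))
print(get_nutrition(FailingStub()))
```

With the working stub the helper returns the richer nutrients() result. With the failing stub it returns only what the schema provides.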

Which issue(s) this PR fixes:

Fixes #5164

Testing

Tested parsing for recipes exhibiting this issue with a local Mealie instance:

Example recipe: https://cookwell.com/recipe/crispy-oven-fries

Info captured by recipe-scrapers library:

from urllib.request import urlopen

from recipe_scrapers import scrape_html

url = "https://cookwell.com/recipe/crispy-oven-fries"
html = urlopen(url).read().decode("utf-8")
scraper = scrape_html(html, url)

# Data directly from schema:
scraper.schema.data['nutrition']
# {'@type': 'NutritionInformation', 'calories': '2558 calories'}

# Data captured by nutrients function:
scraper.nutrients()
# {'calories': '2558', 'carbohydrateContent': '402 g', 'fatContent': '49 g', 'proteinContent': '132 g'}


Nutrition information captured from the schema (everything except calories is missing):

image

With this update, nutrition info is captured from scraper.nutrients():

image

@github-actions github-actions bot added the bugfix label Mar 3, 2025
@eric-hoffmann eric-hoffmann marked this pull request as ready for review March 3, 2025 03:55
@michael-genson
Collaborator

Thanks for this! Would you be able to do the same for other fields that recipe scrapers includes? I know image is one of them (scraped_data.image()) but there are probably more.

@eric-hoffmann
Contributor Author

Sure, updated the other fields that had a direct equivalent parser function. scraped_data.url is always populated by the library verbatim from what Mealie passes it, so I left that one alone.
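One way to generalize the same pattern to other fields is a small helper that tries a named scraper method and otherwise uses a schema-based fallback. The sketch below is an assumption about how that might be structured, not the implementation merged here: `scraped_or_fallback` and `StubScraper` are hypothetical, and `image` is used only because it is the method mentioned in the review comment.

```python
from typing import Any, Callable


def scraped_or_fallback(scraper: Any, method_name: str, fallback: Callable[[], Any]) -> Any:
    """Return scraper.<method_name>() when it yields a value, else fallback() (hypothetical helper)."""
    method = getattr(scraper, method_name, None)
    if callable(method):
        try:
            value = method()
            if value:
                return value
        except Exception:
            # Unimplemented or failing parser functions fall through to the fallback.
            pass
    return fallback()


class StubScraper:
    """Stand-in for a real scraper object (illustration only)."""

    def image(self) -> str:
        return "https://example.com/fries.jpg"

    def title(self) -> str:
        raise NotImplementedError("no title parser for this site")


stub = StubScraper()
print(scraped_or_fallback(stub, "image", lambda: None))
print(scraped_or_fallback(stub, "title", lambda: "Title from schema"))
```

Passing the fallback as a callable means the schema lookup only runs when the parser function does not produce a value.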

Collaborator

@michael-genson michael-genson left a comment

Thank you!

@michael-genson michael-genson enabled auto-merge (squash) March 3, 2025 13:51
@michael-genson michael-genson changed the title from "fix: Use recipe-parsers nutrients function for nutrition parsing" to "fix: Use recipe_scrapers metadata for more accurate parsing" Mar 3, 2025
@michael-genson michael-genson changed the title from "fix: Use recipe_scrapers metadata for more accurate parsing" to "feat: Use recipe_scrapers metadata for more accurate parsing" Mar 3, 2025
@michael-genson michael-genson merged commit a758406 into mealie-recipes:mealie-next Mar 3, 2025
20 checks passed
Development

Successfully merging this pull request may close these issues.

[SCRAPER] - recipe-scrapers library's nutrients function is not used when parsing nutrition information
2 participants