Image within div or span with no text is annotated as Image #3962

Merged
5 commits merged into main from feat/ML-942-div-tag-images
Mar 20, 2025

ajjimeno
Contributor

Ticket: https://unstructured-ai.atlassian.net/browse/ML-942

The attached HTML document, once unzipped, can be used to test the transformation with the partition_html function from the VLM partitioner.

recalibrating-risk-report.pdf.json.html.zip

Verified

This commit was signed with the committer’s verified signature.
avkos Oleksii Kosynskyi

ryannikolaidis commented Mar 19, 2025

worth a quick test somewhere just to validate + prevent breaking in the future?

@@ -437,6 +437,12 @@ def extract_tag_and_ontology_class_from_tag(
html_tag = "span"
element_class = ontology.UncategorizedText

# Scenario 5: Image with no text a ontology.UncategorizedText element_class

wording here is a bit confusing.

Suggested change
- # Scenario 5: Image with no text a ontology.UncategorizedText element_class
+ # Scenario 5: UncategorizedText has image and no text

maybe?
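
For readers without the full diff, the rule discussed in this thread can be sketched in isolation. The snippet below is a minimal, standalone illustration using only the standard library, not the code from the PR: `classify_uncategorized` and `_ContainerScan` are hypothetical names, the string labels stand in for the `ontology.UncategorizedText` and `ontology.Image` classes, and the exact condition used in `extract_tag_and_ontology_class_from_tag` is an assumption, since the hunk above does not show it.

```python
from html.parser import HTMLParser


class _ContainerScan(HTMLParser):
    """Hypothetical helper: records whether a fragment has an <img> and any text."""

    def __init__(self):
        super().__init__()
        self.has_img = False
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />,
        # because the default handle_startendtag delegates here.
        if tag == "img":
            self.has_img = True

    def handle_data(self, data):
        # Attribute values (e.g. alt="...") are not passed here, only text nodes.
        self.text_parts.append(data)


def classify_uncategorized(fragment):
    """Assumed rule: a text-free div/span that wraps an image becomes an Image."""
    scan = _ContainerScan()
    scan.feed(fragment)
    scan.close()  # flush any buffered text
    has_text = bool("".join(scan.text_parts).strip())
    if scan.has_img and not has_text:
        return "Image"
    return "UncategorizedText"


if __name__ == "__main__":
    print(classify_uncategorized('<div><img src="chart.png" alt="chart"/></div>'))
    print(classify_uncategorized('<div><img src="chart.png"/>Figure 1</div>'))
```

A check along these lines is also the kind of quick regression test asked for earlier in the conversation, though a real test would call the library's own function rather than this stand-in.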

Verified

This commit was signed with the committer’s verified signature.
luu-alex Alex

ajjimeno requested a review from ryannikolaidis March 19, 2025 22:06

ryannikolaidis left a comment

lgtm!

ajjimeno added this pull request to the merge queue Mar 20, 2025
Merged via the queue into main with commit 0fa5174 Mar 20, 2025
43 checks passed
ajjimeno deleted the feat/ML-942-div-tag-images branch March 20, 2025 04:39