Error on .find_tables() #3191

Closed
JorjMcKie opened this issue Feb 20, 2024 Discussed in #3190 · 1 comment
Labels
bug fix developed release schedule to be determined Fixed in next release

Comments

@JorjMcKie
Collaborator

Discussed in #3190

Originally posted by bjmvercelli February 20, 2024
Hello, hope you guys are doing great.

I'm getting an error in version 1.23.24 (latest) when using the find_tables() method, more specifically in the extract_text() call.

The following code is taken from table.py (lines 606 and 607). The error occurs when extract_words(chars) returns an empty list.

words = extractor.extract_words(chars)
rotation = words[0]["rotation"]  # rotation cannot change within a cell

I do not believe there is a problem in extract_words(); I think this is an edge case in my PDF. If that is the case, we could fix it by checking the length of words before indexing into it:

words = extractor.extract_words(chars)
if len(words) == 0:
    return ""
rotation = words[0]["rotation"]  # rotation cannot change within a cell

You can reproduce here
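
For readers without the PDF at hand, the failure mode and the proposed guard can be sketched in isolation. This is only an illustration, not the library's code: the function names `cell_text_unguarded` and `cell_text_guarded` are hypothetical, and the word dicts with "text" and "rotation" keys are an assumption about what extract_words(chars) returns. Indexing an empty list raises IndexError in Python, which is presumably the error seen here.

```python
def cell_text_unguarded(words):
    """Hypothetical stand-in for the original two lines (no empty check)."""
    rotation = words[0]["rotation"]  # raises IndexError when words is empty
    return " ".join(w["text"] for w in words)


def cell_text_guarded(words):
    """Hypothetical stand-in with the proposed early return for empty cells."""
    if len(words) == 0:  # empty cell: nothing to index, return empty text
        return ""
    rotation = words[0]["rotation"]  # safe: at least one word exists
    # Illustrative only: the real code uses the rotation when building text.
    return " ".join(w["text"] for w in words)


# Assumed word shape, for illustration only.
sample = [{"text": "a", "rotation": 0}, {"text": "b", "rotation": 0}]

print(repr(cell_text_guarded(sample)))
print(repr(cell_text_guarded([])))

try:
    cell_text_unguarded([])
except IndexError as exc:
    print("unguarded version failed:", exc)
```

With the guard in place, a cell that contains no words yields an empty string instead of aborting the whole table extraction.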

@julian-smith-artifex-com
Collaborator

Fixed in 1.23.25.
