Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ROB: Let _build_destination skip in case of missing /D key #2018

Merged
merged 3 commits into from Dec 9, 2023

Conversation

nickryand
Copy link
Contributor

Calling _build_destination with a dictionary can cause it to error if the code which slices the array is executed. This skips the scenario where DictionaryObject does not have a "/D" key and is currently passed to _build_destination.

* Adds an else block to prevent calling _build_destination is not called
  with a dictionary object when it is expecting a list.
@MartinThoma
Copy link
Member

Do you have a PDF / some code that shows the issue you want to fix?

@nickryand
Copy link
Contributor Author

This is running against:

poetry show | grep pypdf
pypdf             3.14.0 A pure-python PDF library capable of splitting, me...
import sys

from pypdf import PdfReader, PdfWriter


def main():
    file = PdfReader(sys.argv[1])

    new_pdf = PdfWriter()
    new_pdf.append(fileobj=file, pages=(0, 20))

    with open("test_copy.pdf", "wb") as fp:
        new_pdf.write(fp)


if __name__ == "__main__":
    main()
$ python test.py test.pdf
Traceback (most recent call last):
  File "/tmp/pypdf/test.py", line 17, in <module>
    main()
  File "/tmp/pypdf/test.py", line 10, in main
    new_pdf.append(fileobj=file, pages=(0, 20))
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_writer.py", line 2812, in append
    self.merge(
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_utils.py", line 466, in wrapper
    return func(*args, **kwargs)
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_writer.py", line 2904, in merge
    reader.named_destinations
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_reader.py", line 513, in named_destinations
    return self._get_named_destinations()
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_reader.py", line 777, in _get_named_destinations
    self._get_named_destinations(kid.get_object(), retval)
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_reader.py", line 794, in _get_named_destinations
    dest = self._build_destination(key, value)  # type: ignore
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_reader.py", line 1001, in _build_destination
    page, typ = array[0:2]  # type: ignore
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 320, in __getitem__
    return dict.__getitem__(self, key).get_object()
TypeError: unhashable type: 'slice'

@pubpub-zz
Copy link
Collaborator

@nickryand can you provide the test.pdf file you are using ?

@nickryand
Copy link
Contributor Author

I cannot provide it as it's part of a collection I'm working on for someone else. I can provide metadata information about it.

@pubpub-zz
Copy link
Collaborator

do you accept to send it by mail to @MartinThoma info@martin-thoma.de ? it will be easier for analysis

@MartinThoma
Copy link
Member

@nickryand Your PR breaks CI and cannot be merged like this. See
pytest tests/test_reader.py::test_named_destination\[stored_directly\].

You could either remove the else: continue or change the block to:

                if isinstance(val, DictionaryObject):
                    if "/D" in val:
                        val = val["/D"].get_object()
                    else:
                        continue

@MartinThoma MartinThoma changed the title Build destination fix ROB: Build destination fix Aug 13, 2023
@MartinThoma MartinThoma added the is-robustness-issue From a users perspective, this is about robustness label Aug 13, 2023
@pubpub-zz
Copy link
Collaborator

do you accept to send it by mail to @MartinThoma info@martin-thoma.de ? it will be easier for analysis

Is this request still active?

@MartinThoma
Copy link
Member

I haven't seen any e-mail, but even without that I can see that this kind of structure can be problematic. In general, we likely always have to assume that some key does not exist (even if the specs require it).

At some point I hope we can find more general solutions, e.g. defining the specs formally in code and defining the fallback (or exception) we raise if pypdf wants to access a key that does not exist.

@codecov
Copy link

codecov bot commented Aug 13, 2023

Codecov Report

Patch coverage: 42.85% and project coverage change: +0.04% 🎉

Comparison is base (11ee648) 94.07% compared to head (acab21a) 94.12%.
Report is 38 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2018      +/-   ##
==========================================
+ Coverage   94.07%   94.12%   +0.04%     
==========================================
  Files          33       41       +8     
  Lines        7077     7391     +314     
  Branches     1413     1460      +47     
==========================================
+ Hits         6658     6957     +299     
- Misses        262      269       +7     
- Partials      157      165       +8     
Files Changed Coverage Δ
pypdf/_reader.py 91.33% <42.85%> (-0.17%) ⬇️

... and 20 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MartinThoma MartinThoma changed the title ROB: Build destination fix ROB: Let _build_destination skip in case of missing /D key Dec 9, 2023
@MartinThoma MartinThoma merged commit 5e59160 into py-pdf:main Dec 9, 2023
13 of 14 checks passed
@MartinThoma
Copy link
Member

@nickryand Thank you for your contribution 🙏 It will be part of the pypdf release on 10.12.2023 (Sunday).

@MartinThoma
Copy link
Member

@nickryand If you want I can add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-)

@MartinThoma
Copy link
Member

Oh, wow, I just noticed it took me way too long to merge this. I'm sorry for that 🙈

MartinThoma added a commit that referenced this pull request Dec 10, 2023
## What's new

### Bug Fixes (BUG)
-  Cope with deflated images with CMYK Black Only (#2322) by @pubpub-zz
-  Handle indirect objects as parameters for CCITTFaxDecode (#2307) by @stefan6419846
-  check words length in _cmap type1_alternative function (#2310) by @Takher

### Robustness (ROB)
-  Relax flate decoding for too many lookup values (#2331) by @stefan6419846
-  Let _build_destination skip in case of missing /D key (#2018) by @nickryand

### Documentation (DOC)
-  Note in reading form data (#2338) by @MartinThoma
-  Pull Request prefixes and size by @MartinThoma
-  Add https://github.com/zuypt for #2325 as a contributor by @MartinThoma
-  Fix docstring for RunLengthDecode.decode (#2302) by @stefan6419846

### Maintenance (MAINT)
-  Enable `disallow_any_generics` and add missing generics (#2278) by @nilehmann

### Testing (TST)
-  Centralize file downloads (#2324) by @MartinThoma

### Code Style (STY)
-  Fix typo "steam" \xe2\x86\x92 "stream" (#2327) by @stefan6419846
-  Run black by @MartinThoma
-  Make Traceback in bug report template uppercase (#2304) by @stefan6419846

[Full Changelog](3.17.1...3.17.2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is-robustness-issue From a users perspective, this is about robustness
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants