Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reader resolution algorithm seems inconsistent between stdlib and importlib_resources #295

Closed
mmerickel opened this issue Feb 6, 2024 · 11 comments · Fixed by #297
Closed
Assignees

Comments

@mmerickel
Copy link

mmerickel commented Feb 6, 2024

Hello friends,

I'm looking at how to reimplement resource overrides in Pyramid. It's a unique feature that in the past involved patching a custom package.__loader__ which could intercept and do its own file lookups for various requests. This worked with all pkg_resources apis, like pkg_resources.resource_filename, etc to return a different path than the one physically on disk when a path is overridden.

To implement this, I've been going through the PEPs and trying to understand the protocol that importlib.resources uses to support overriding how resources are loaded from a package. I've narrowed this down to spec.loader.get_resource_reader() being able to be defined. My plan will be to define a ProxyLoader which defines its own get_resource_reader but proxies the rest of the loader functionality back to the original loader. So, I'm looking at how to define a custom abc.TraversableResources class that does what I need, and then hook package.__spec__.loader to return it from get_resource_reader().

From what I can tell, the logic here is inconsistent between importlib_resources and the stdlib impl in CPython. I'm looking here:

https://github.com/python/cpython/blob/5ddb2740404ad62e74ecb0ea97ed98651821ed3f/Lib/importlib/resources/_adapters.py#L28-L29

compared to

return (
# local ZipReader if a zip module
_zip_reader(self.spec)
or
# local NamespaceReader if a namespace module
_namespace_reader(self.spec)
or
# local FileReader
_file_reader(self.spec)
or
# native reader if it supplies 'files'
_native_reader(self.spec)
or
# fallback - adapt the spec ResourceReader to TraversableReader
_adapters.CompatibilityFiles(self.spec)
)

In both cases, TraversableResourceLoader is attempting to determine which reader to return. In the stdlib, preference is given to the native spec.loader.get_resource_reader() if the reader returned has a files() method. However, importlib_resources inverts the lookup such that zip/namespace/file take priority over the "native" reader. So for example, if self.path exists, then whatever I set as spec.loader.get_resource_reader() will never be invoked and hence I can't override the lookups.

Currently, to get this to work with importlib_resources, it appears that I would need the spec.path to be a non-existent path, which would trigger the logic to fallback to the native reader that I patch, but this is by far not ideal.

The TLDR is that I would propose importlib_resources moves the _native_reader(self.spec) invocation to the TOP of the list, which would be more in line with the stdlib.

I would really appreciate any guidance here. Maybe I'm completely off base and missing something critical in the importlib machinery, and happy to talk more about this use-case in Pyramid that has been supported for many years via a PEP320 loader used by pkg_resources. Thank you!

@mmerickel
Copy link
Author

No activity? :-( @jaraco?

@jaraco
Copy link
Member

jaraco commented Feb 24, 2024

Thanks for the ping. I've been overloaded (mainly with life events). I'll plan to take a look soon.

@jaraco
Copy link
Member

jaraco commented Feb 25, 2024

First, thanks for sharing your detailed analysis and considerate question.

I agree, it should be possible to accomplish what you seek with a simple and straightforward hook.

In 3e24498, I explored simply giving the native reader precedence, but it triggers a few test failures on Python 3.10 and later, all in ResourceFromNamespaceZipTests:

FAILED importlib_resources/tests/test_resource.py::ResourceFromNamespaceZipTests::test_is_submodule_resource - NotADirectoryError: MultiplexedPath only supports directories
FAILED importlib_resources/tests/test_resource.py::ResourceFromNamespaceZipTests::test_read_submodule_resource_by_name - NotADirectoryError: MultiplexedPath only supports directories
FAILED importlib_resources/tests/test_resource.py::ResourceFromNamespaceZipTests::test_submodule_contents - NotADirectoryError: MultiplexedPath only supports directories
FAILED importlib_resources/tests/test_resource.py::ResourceFromNamespaceZipTests::test_submodule_contents_by_name - NotADirectoryError: MultiplexedPath only supports directories

I'll need to dig deeper to understand the meaning behind those failures and what else it might imply about the current approach.

@jaraco
Copy link
Member

jaraco commented Feb 25, 2024

Aha! The reason the tests are failing is because the "native" NamespaceReader and MultiplexedPath on Python 3.10-3.12 doesn't yet have the fixes introduced in #290 (and presumably Python 3.13).

The mission of this project is to provide forward-compatibility to features/fixes like that, so that, for example, these zip files with namespace packages can be made available to Python 3.8. The presence of a .files() method doesn't provide enough fidelity to know if an underlying implementation has the behavior needed, and that's why the zip and namespace and file readers need to take precedence. In other words, the identified inconsistency is intentional and necessary.

Still, it does seem irresponsible to be treating a ProxyLoader as a SourceFileLoader just because it has an extant .path. My thinking now is that the replacement readers for files, zipfiles, and namespaces should only happen when they actually resolve to readers from the stdlib.

I've explored that concept in 2331da9. I tried actually resolving the reader and checking its class, but that again falls afowl of the aforementioned errors because merely allowing the stdlib readers to construct themselves fails for the ResourceFromNamespaceZipTests.

I really don't like keying on startswith('_frozen_importlib') because that feels brittle... and surely could have a different expectation on a different interpreter like PyPy or Jython.

@jaraco
Copy link
Member

jaraco commented Feb 25, 2024

In #296, I've refactored the project to separate concerns and re-use repeated logic, which should make it easier to address this concern.

jaraco added a commit that referenced this issue Feb 25, 2024
Ensure that standard library readers are replaced while third-party readers are passed along. Closes #295.
@jaraco
Copy link
Member

jaraco commented Feb 25, 2024

In #297, I've drafted a change that I think will address the reported concern. @mmerickel Can you test this concept against the use-case you described and see if it works the same as importlib.resources without any special handling?

pip install git+https://github.com/python/importlib_resources@e857e8a99a54530a1291bc95fad378390729ed53

@mmerickel
Copy link
Author

thanks so much for looking at this - I'll try to find time to get into it and try out the proposed changes

jaraco added a commit that referenced this issue Mar 6, 2024
Ensure that standard library readers are replaced while third-party readers are passed along. Closes #295.
jaraco added a commit that referenced this issue Mar 6, 2024
Ensure that standard library readers are replaced while third-party readers are passed along. Closes #295.
@jaraco
Copy link
Member

jaraco commented Mar 6, 2024

On further consideration - I see now the changes in #297 are failing on Python 3.9 and earlier, so the technique as devised isn't adequate and I'm investigating further.

@jaraco
Copy link
Member

jaraco commented Mar 6, 2024

Okay. The latest version now passes tests on all Pythons. It's a little messy, but at least encapsulated. Feel free to test with:

pip install git+https://github.com/python/importlib_resources@6d48bc93d7

@mmerickel
Copy link
Author

mmerickel commented Mar 15, 2024

@jaraco sorry I took so long on this, real life has been overwhelming. I just tested 6.3.0 and my custom get_resource_reader method is now properly invoked so I'm able to override the reader now. Thank you so much for your help resolving this! I tried it on py38 and py312 successfully.

@jaraco
Copy link
Member

jaraco commented Mar 15, 2024

Fantastic! Thanks for following up and no problem on the delay. Glad I could help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants