reader resolution algorithm seems inconsistent between stdlib and importlib_resources #295

mmerickel · 2024-02-06T07:13:50Z

Hello friends,

I'm looking at how to reimplement resource overrides in Pyramid. It's a unique feature that in the past involved patching a custom package.__loader__ which could intercept and do its own file lookups for various requests. This worked with all pkg_resources apis, like pkg_resources.resource_filename, etc to return a different path than the one physically on disk when a path is overridden.

To implement this, I've been going through the PEPs and trying to understand the protocol that importlib.resources uses to support overriding how resources are loaded from a package. I've narrowed this down to spec.loader.get_resource_reader() being able to be defined. My plan will be to define a ProxyLoader which defines its own get_resource_reader but proxies the rest of the loader functionality back to the original loader. So, I'm looking at how to define a custom abc.TraversableResources class that does what I need, and then hook package.__spec__.loader to return it from get_resource_reader().

From what I can tell, the logic here is inconsistent between importlib_resources and the stdlib impl in CPython. I'm looking here:

https://github.com/python/cpython/blob/5ddb2740404ad62e74ecb0ea97ed98651821ed3f/Lib/importlib/resources/_adapters.py#L28-L29

compared to

importlib_resources/importlib_resources/_compat.py

Lines 75 to 90 in 7d8b020

    
           return ( 
        
               # local ZipReader if a zip module 
        
               _zip_reader(self.spec) 
        
               or 
        
               # local NamespaceReader if a namespace module 
        
               _namespace_reader(self.spec) 
        
               or 
        
               # local FileReader 
        
               _file_reader(self.spec) 
        
               or 
        
               # native reader if it supplies 'files' 
        
               _native_reader(self.spec) 
        
               or 
        
               # fallback - adapt the spec ResourceReader to TraversableReader 
        
               _adapters.CompatibilityFiles(self.spec) 
        
           )

In both cases, TraversableResourceLoader is attempting to determine which reader to return. In the stdlib, preference is given to the native spec.loader.get_resource_reader() if the reader returned has a files() method. However, importlib_resources inverts the lookup such that zip/namespace/file take priority over the "native" reader. So for example, if self.path exists, then whatever I set as spec.loader.get_resource_reader() will never be invoked and hence I can't override the lookups.

Currently, to get this to work with importlib_resources, it appears that I would need the spec.path to be a non-existent path, which would trigger the logic to fallback to the native reader that I patch, but this is by far not ideal.

The TLDR is that I would propose importlib_resources moves the _native_reader(self.spec) invocation to the TOP of the list, which would be more in line with the stdlib.

I would really appreciate any guidance here. Maybe I'm completely off base and missing something critical in the importlib machinery, and happy to talk more about this use-case in Pyramid that has been supported for many years via a PEP320 loader used by pkg_resources. Thank you!

The text was updated successfully, but these errors were encountered:

mmerickel · 2024-02-24T19:30:19Z

No activity? :-( @jaraco?

jaraco · 2024-02-24T23:27:37Z

Thanks for the ping. I've been overloaded (mainly with life events). I'll plan to take a look soon.

jaraco · 2024-02-25T15:59:58Z

First, thanks for sharing your detailed analysis and considerate question.

I agree, it should be possible to accomplish what you seek with a simple and straightforward hook.

In 3e24498, I explored simply giving the native reader precedence, but it triggers a few test failures on Python 3.10 and later, all in ResourceFromNamespaceZipTests:

FAILED importlib_resources/tests/test_resource.py::ResourceFromNamespaceZipTests::test_is_submodule_resource - NotADirectoryError: MultiplexedPath only supports directories
FAILED importlib_resources/tests/test_resource.py::ResourceFromNamespaceZipTests::test_read_submodule_resource_by_name - NotADirectoryError: MultiplexedPath only supports directories
FAILED importlib_resources/tests/test_resource.py::ResourceFromNamespaceZipTests::test_submodule_contents - NotADirectoryError: MultiplexedPath only supports directories
FAILED importlib_resources/tests/test_resource.py::ResourceFromNamespaceZipTests::test_submodule_contents_by_name - NotADirectoryError: MultiplexedPath only supports directories

I'll need to dig deeper to understand the meaning behind those failures and what else it might imply about the current approach.

jaraco · 2024-02-25T17:00:28Z

Aha! The reason the tests are failing is because the "native" NamespaceReader and MultiplexedPath on Python 3.10-3.12 doesn't yet have the fixes introduced in #290 (and presumably Python 3.13).

The mission of this project is to provide forward-compatibility to features/fixes like that, so that, for example, these zip files with namespace packages can be made available to Python 3.8. The presence of a .files() method doesn't provide enough fidelity to know if an underlying implementation has the behavior needed, and that's why the zip and namespace and file readers need to take precedence. In other words, the identified inconsistency is intentional and necessary.

Still, it does seem irresponsible to be treating a ProxyLoader as a SourceFileLoader just because it has an extant .path. My thinking now is that the replacement readers for files, zipfiles, and namespaces should only happen when they actually resolve to readers from the stdlib.

I've explored that concept in 2331da9. I tried actually resolving the reader and checking its class, but that again falls afowl of the aforementioned errors because merely allowing the stdlib readers to construct themselves fails for the ResourceFromNamespaceZipTests.

I really don't like keying on startswith('_frozen_importlib') because that feels brittle... and surely could have a different expectation on a different interpreter like PyPy or Jython.

jaraco · 2024-02-25T19:04:05Z

In #296, I've refactored the project to separate concerns and re-use repeated logic, which should make it easier to address this concern.

Ensure that standard library readers are replaced while third-party readers are passed along. Closes #295.

jaraco · 2024-02-25T19:22:48Z

In #297, I've drafted a change that I think will address the reported concern. @mmerickel Can you test this concept against the use-case you described and see if it works the same as importlib.resources without any special handling?

pip install git+https://github.com/python/importlib_resources@e857e8a99a54530a1291bc95fad378390729ed53

mmerickel · 2024-02-26T06:52:35Z

thanks so much for looking at this - I'll try to find time to get into it and try out the proposed changes

Ensure that standard library readers are replaced while third-party readers are passed along. Closes #295.

jaraco · 2024-03-06T21:24:26Z

On further consideration - I see now the changes in #297 are failing on Python 3.9 and earlier, so the technique as devised isn't adequate and I'm investigating further.

jaraco · 2024-03-06T23:12:11Z

Okay. The latest version now passes tests on all Pythons. It's a little messy, but at least encapsulated. Feel free to test with:

pip install git+https://github.com/python/importlib_resources@6d48bc93d7

mmerickel · 2024-03-15T02:54:19Z

@jaraco sorry I took so long on this, real life has been overwhelming. I just tested 6.3.0 and my custom get_resource_reader method is now properly invoked so I'm able to override the reader now. Thank you so much for your help resolving this! I tried it on py38 and py312 successfully.

jaraco · 2024-03-15T13:50:47Z

Fantastic! Thanks for following up and no problem on the delay. Glad I could help.

mmerickel mentioned this issue Feb 6, 2024

pkg_resources is deprecated Pylons/pyramid#3731

Open

jaraco added a commit that referenced this issue Feb 25, 2024

Give precedence to native traversable readers.

e857e8a

Ensure that standard library readers are replaced while third-party readers are passed along. Closes #295.

jaraco mentioned this issue Feb 25, 2024

Give precedence to native traversable readers. #297

Merged

jaraco added a commit that referenced this issue Mar 6, 2024

Give precedence to native traversable readers.

bf02178

Ensure that standard library readers are replaced while third-party readers are passed along. Closes #295.

jaraco added a commit that referenced this issue Mar 6, 2024

Give precedence to native traversable readers.

527173b

Ensure that standard library readers are replaced while third-party readers are passed along. Closes #295.

jaraco assigned mmerickel Mar 6, 2024

jaraco closed this as completed in #297 Mar 12, 2024

jaraco mentioned this issue Mar 15, 2024

Loading file in library fails in downstream packages #298

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reader resolution algorithm seems inconsistent between stdlib and importlib_resources #295

reader resolution algorithm seems inconsistent between stdlib and importlib_resources #295

mmerickel commented Feb 6, 2024 •

edited

mmerickel commented Feb 24, 2024

jaraco commented Feb 24, 2024

jaraco commented Feb 25, 2024

jaraco commented Feb 25, 2024 •

edited

jaraco commented Feb 25, 2024

jaraco commented Feb 25, 2024

mmerickel commented Feb 26, 2024

jaraco commented Mar 6, 2024

jaraco commented Mar 6, 2024

mmerickel commented Mar 15, 2024 •

edited

jaraco commented Mar 15, 2024

reader resolution algorithm seems inconsistent between stdlib and importlib_resources #295

reader resolution algorithm seems inconsistent between stdlib and importlib_resources #295

Comments

mmerickel commented Feb 6, 2024 • edited

mmerickel commented Feb 24, 2024

jaraco commented Feb 24, 2024

jaraco commented Feb 25, 2024

jaraco commented Feb 25, 2024 • edited

jaraco commented Feb 25, 2024

jaraco commented Feb 25, 2024

mmerickel commented Feb 26, 2024

jaraco commented Mar 6, 2024

jaraco commented Mar 6, 2024

mmerickel commented Mar 15, 2024 • edited

jaraco commented Mar 15, 2024

mmerickel commented Feb 6, 2024 •

edited

jaraco commented Feb 25, 2024 •

edited

mmerickel commented Mar 15, 2024 •

edited