Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify url handling with filesources #15497

Merged
merged 23 commits into from
Feb 22, 2023

Conversation

nuwang
Copy link
Member

@nuwang nuwang commented Feb 3, 2023

This PR unifies URL handling with filesources. Closes: #14658

Features

  • Enables Galaxy to handle urls through a filesource (at present, http(s), base64, ftp and drs are preconfigured, with s3 being admin configurable)
  • Enables all url sources to have an injectable user_context. For example, admins can inject credentials for protected DRS servers as in this example
    - type: drs
      id: mydrsserver
      doc: Test drs repository filesource
      http_headers:
        Authorization: |-
          #import base64
          Basic ${base64.b64encode(str.encode(user.preferences['mydrsserver|username'] + ":" + user.preferences['mydrsserver|password'])).decode()}
  • Enables admins to configure the specificity of the match. For example, admins can inject different credentials for specific sites, so that passwordless access to those sites are possible: example
    - type: http
      id: test1
      doc: A specific http url handler
      url_regex: "^https?://www.usegalaxy.org/"
      http_headers:
        Authorization: "Bearer ${user.preferences['oidc|bearer_token']}"
  • Enables various DRS access methods to be handled by existing filesources (e.g. http and s3 access methods are now routed to filesources: example
  • In future, injecting OIDC single-sign-on tokens should be possible by utilizing the work done in: OIDC tokens #15300

Backward compatibility

  • No changes to the client are necessary.
  • Filesources that inherit from BaseFileSource generally do not require any changes. (*except)
  • Except that an additional kwargs parameter has been introduced to some method signatures as a means of future expansion and for supporting chained DRS resolution
  • If a plugin directly implements the FileSource interface, it must now implement the score_url_match method to indicate its ability to handle a particular url, get_browsable method to indicate that it supports the SupportsBrowsing interface and the to_relative_path method to convert a gxfile url to a relative path. BaseFileSource handles these cases so that most FIleSources do not need any changes.
  • The get_scheme method has been removed from the interface definition as it is no longer essential, but remains in the BaseFileSource.
  • The s3.py filesource has been removed, as the s3fs.py filesource does the same thing. The reasons for removing s3.py include its dependencies being less maintained (last updated in 2019) and having less config documentation, in contrast to s3fs.py

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@github-actions github-actions bot added this to the 23.1 milestone Feb 3, 2023
@nuwang nuwang force-pushed the unified-filesources branch 2 times, most recently from 9ffb107 to 70ee2cf Compare February 3, 2023 12:18
@nuwang nuwang changed the title Unified filesources Unify url handling with filesources Feb 3, 2023
@jmchilton
Copy link
Member

I was somewhat skeptical but this is very nice - wonderful job!

The **kwargs getting passed everywhere is very typical Galaxy and feels at home in this codebase and is a thing I've done with my own plugin frameworks in the past. I think we should perhaps try to get away from that though by adding a bit more structure. It looks like there is a limited number of known options with fixed types for the plugins we've defined - could we capture all of those in a TypedDict and then use typing_extensions.Unpack to annotate all those kwargs.

It looks like TypedDict can specify they don't capture the totality of options (https://mypy.readthedocs.io/en/stable/more_types.html#totality), which would allow downstream plugins to do their own thing as long as they don't conflict with existing plugins.

If you're uninterested in working through all the details - would you be willing to at least defined the dictionaries of possible options for me and annotate what interfaces accept which ones with a comment or something? I could probably work through the possibilities but you're much closer to the code now. Even if we don't figure out how to use the experimental mypy features - I think that would serve as good documentation for the module.

assert_realizes_as(file_sources, test_url, "hello drs world", user_context=user_context)


@responses.activate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So cool!

@nuwang
Copy link
Member Author

nuwang commented Feb 3, 2023

I think we should perhaps try to get away from that though by adding a bit more structure. It looks like there is a limited number of known options with fixed types for the plugins we've defined - could we capture all of those in a TypedDict and then use typing_extensions.Unpack to annotate all those kwargs.

This is a very good point and I fully agree with the runaway nature of a kwargs dict - it's hard to stop it from devolving into an incomprehensible bag of data or know who's depending or modifying which bits. I added it because I didn't feel I had many options - it was either a matter of polluting the interface with weird properties, introducing another object similar to a UserContext, or tacking it onto the UserContext itself. Ultimately, I just chose the easy way out with kwargs. I didn't know about TypedDicts so I'm happy to look into that and give it a go, and am also ears to other approaches to handling this.

@nuwang nuwang force-pushed the unified-filesources branch 3 times, most recently from a3b9a18 to 1f63e03 Compare February 9, 2023 14:16
@nuwang
Copy link
Member Author

nuwang commented Feb 9, 2023

@jmchilton I've made the changes that you've proposed. The TypedDict with Unpack works well, and flagged a number of relevant issues, even though it's still experimental and required a special switch to enable. I eventually switched the kwargs over to a FileSourceOptions object. I thought it made for a cleaner design - too many Unpack/TypedDicts are potentially confusing I think. However, I did use TypedDicts with Unpack for all the constructor arguments, and the whole thing seems to work quite nicely together. Thanks for the suggestion and let me know whether this looks better.

@nuwang nuwang force-pushed the unified-filesources branch 3 times, most recently from 64e1991 to 4bfa44f Compare February 9, 2023 18:25
@jmchilton jmchilton merged commit c692ad8 into galaxyproject:dev Feb 22, 2023
@jmchilton
Copy link
Member

Hats off to you @nuwang - this is wonderful!

@nuwang nuwang deleted the unified-filesources branch February 23, 2023 13:57
nsoranzo added a commit to nsoranzo/planemo that referenced this pull request Jun 20, 2023
which was failing with:

```
E           galaxy.tool_util.provided_metadata INFO 2023-06-19 12:49:16,823 [pN:main.1,p:15735,tN:LocalRunner.work_thread-0] One or more tool outputs is marked as failed ({'type': 'dataset', 'ext': 'data', 'dataset_id': 1, 'stderr': 'Unable to fetch file:///home/runner/work/planemo/planemo/tests/data/hello.txt\nCould not find handler for URI [file:///home/runner/work/planemo/planemo/tests/data/hello.txt]', 'failed': True}).
```

After galaxyproject/galaxy#15497 and
follow-up galaxyproject/galaxy#15794 ,
data upload using "file://" URLs is allowed only for Galaxy admins.
This adds a basic ``file_sources_conf.yml`` configuration file that
allows planemo to upload from the test data directory.
@galaxyproject galaxyproject deleted a comment from github-actions bot Nov 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal for unified filesources and URL fetchers
3 participants