Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a class to manage local and online resource files #2570

Merged
merged 76 commits into from
Mar 29, 2024

Conversation

pheuer
Copy link
Member

@pheuer pheuer commented Mar 11, 2024

Description

Replaces the get_file() function with a class that fetches resource files, but also keeps a register of SHA hashes for each file so that files can be re-downloaded automatically if they are updated on the PlasmaPy-data repository.

Motivation and context

The way the downloader.get_file function works, once a file exists in the directory with the requested name that file path will always be returned. There is no way of detecting when a file of the same name is updated on the github repository.

This new file uses the GitHub API to get the SHA hash for the file online, and keeps a JSON file registry of the SHA hashes for already downloaded files. If the SHA of the local file doesn't match the online one, the file will be re-downloaded. If the online connection cannot be made and the local file exists, it will be returned with a warning that it may be out of date.

Related issues

Revision of #2567 after @JaydenR2305 helpfully pointed out that the GitHub API doesn't actually require an API token for this!

Inspired by #2555 (NIST data) and #2535 (Nuclear XS data), both of which will explicitly depend on resource files that are too large to include in the base installation of PlasmaPy.

Copy link

Thank you for submitting a pull request (PR) to PlasmaPy! ✨ The future of the project depends on contributors like you, so we deeply appreciate it! 🌱

Our contributor guide has information on:

The bottom of this page shows several checks that are run for every PR. Don't worry if something broke! We break stuff all the time. 😺 Click on "Details" to learn why a check didn't pass. Please also feel free to ask for help. We do that all the time as well. 🌸 You can find us in our chat room or weekly community meeting & office hours. Here are some tips:

  • Try fixing CI / Python 3.12 test failures first.
  • Most pre-commit.ci - pr failures can be automagically fixed by commenting pre-commit.ci autofix below, followed by a git pull to bring the changes back to your computer. Please also see our pre-commit troubleshooting guide.
  • If pre-commit.ci - pr says that a function is too long or complex, try breaking up that function into multiple short functions that each do one thing. See also these tips on writing clean scientific software.
  • If the CI / Documentation check ends with a cryptic error message, check out our documentation troubleshooting guide.
  • For a documentation preview, click on Details next to docs/readthedocs.org:plasmapy.

If this PR is marked as ready for review, someone should stop by to provide a code review and offer suggestions soon. ✅ If you don't get a review within a few days, please feel free to send us a reminder.

Please also use SI units within PlasmaPy, except when there is strong justification otherwise or in some examples.

We thank you once again!

@github-actions github-actions bot added the plasmapy.utils Related to the plasmapy.utils subpackage label Mar 11, 2024
@pheuer
Copy link
Member Author

pheuer commented Mar 11, 2024

@JaydenR2305 Let me know what you think of this structure, and if you have any suggestions! I still need to write tests etc.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions github-actions bot added docs PlasmaPy Docs at http://docs.plasmapy.org plasmapy.diagnostics Related to the plasmapy.diagnostics subpackage notebooks Related to example Jupyter notebooks in docs/examples/ labels Mar 11, 2024
@pheuer
Copy link
Member Author

pheuer commented Mar 11, 2024

Notebook changes are trivial (just updating to use the new Resources object)

Copy link

codecov bot commented Mar 11, 2024

Codecov Report

Attention: Patch coverage is 99.18699% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 96.76%. Comparing base (20e60aa) to head (c1f14c6).
Report is 78 commits behind head on main.

Files Patch % Lines
plasmapy/utils/data/downloader.py 99.17% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2570      +/-   ##
==========================================
- Coverage   96.93%   96.76%   -0.17%     
==========================================
  Files         104      104              
  Lines        9163     9472     +309     
==========================================
+ Hits         8882     9166     +284     
- Misses        281      306      +25     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JaydenR2305
Copy link
Member

The class structure looks great so far, it's all very intuitive. My only suggestion is to make sure you go through the rigmarole to ensure that the .plasmapy directory is created if necessary

@pheuer
Copy link
Member Author

pheuer commented Mar 12, 2024

The class structure looks great so far, it's all very intuitive. My only suggestion is to make sure you go through the rigmarole to ensure that the .plasmapy directory is created if necessary

Good catch, thanks! I'll add a mkdir

@pheuer
Copy link
Member Author

pheuer commented Mar 19, 2024

@PlasmaPy/peer-reviewers , does anyone have any ideas why the variables os.environ['API_USER'] os.environ['API_TOKEN'] are just returning empty strings? I've:

  1. Defined both variables as secrets of the PlasmaPy repository
  2. Attempted to add the variables to the test environment several places in workflow/tests.yml
  3. Added the variables to passenv in tox.ini so they should be passed through to the tox environment.

I checked that the cached tox environments aren't part of the problem by deleting one of the cached files and then re-running the tests: same issue.

This test fixture in test_downloader.py is where these are accessed. If they are set correctly, then the following test will pass

@pytest.fixture()
def downloader_validated(tmp_path):
    auth_user = os.environ["API_USER"]
    auth_token = os.environ["API_TOKEN"]
    auth = (auth_user, auth_token)
    return Downloader(directory=tmp_path, api_auth=auth)


def test_api_token(downloader_validated):
    """
    Test whether the API connection is valid
    """
    limit, used = downloader_validated._api_usage
    # API limit is 5000/hr for auth user accounts, 60/hr without auth
    assert limit >= 5000

I added the following block to the tets.yml steps to see what variables are available

- name: Check available variables 
      run: env

Which prints out this

 env:
    API_TOKEN: 
    API_USER: 
    ...

So, each is a blank string

I think the problem is at the step where I'm setting the variables

env:
  API_TOKEN: ${{ secrets.API_TOKEN }}
  API_USER: ${{ secrets.API_USER }}

Because if I set one to a test string, that gets passed through to the CI ok

env:
    API_TOKEN: 
    API_USER: Test String

@pheuer
Copy link
Member Author

pheuer commented Mar 19, 2024

Arg, this StackOverflow seems to identify the problem correctly. CI running on a fork of a repository won't have access to secrets. I think this means running this CI on other PRs (once this is in main) will also not have access to the secrets...

https://stackoverflow.com/questions/58737785/github-actions-empty-env-secrets

EDIT: It turns out this is exactly why GitHub creates the GITHUB_TOKEN environmental variable, which can also be used as an api token! I just had to rewrite the Downloader HTTP request a bit so it only needs the token.

@pheuer pheuer marked this pull request as ready for review March 20, 2024 00:09
@pheuer pheuer requested a review from a team as a code owner March 20, 2024 00:09
@pheuer pheuer requested review from namurphy and removed request for a team March 20, 2024 00:09
@pheuer
Copy link
Member Author

pheuer commented Mar 20, 2024

@namurphy Victory! Using the GITHUB_TOKEN secret that GitHub creates, I was able to pass that through into tox and then access it in the script. With some modifications to the Downloader HTTP function, that now works! I wrote a test which confirms this authentication is working on CI (test_apo_token), but that test will always fail locally so I put a break in to make it always pass when NOT running on Actions.

I'll also add this environmental variable to weekly-tests.yml because presumably it will be needed there as well?

I think this PR is ready for final review!

@pheuer
Copy link
Member Author

pheuer commented Mar 28, 2024

@namurphy, just a reminder that this is ready for final review!

Copy link
Member

@namurphy namurphy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the reminder! I think once we address the suggestions here, I'm happy for us to merge it. Thank you again for doing this too!

.github/workflows/tests.yml Outdated Show resolved Hide resolved
mypy.ini Show resolved Hide resolved
plasmapy/utils/data/downloader.py Show resolved Hide resolved
plasmapy/utils/data/downloader.py Outdated Show resolved Hide resolved
plasmapy/utils/data/downloader.py Outdated Show resolved Hide resolved
plasmapy/utils/data/tests/test_downloader.py Outdated Show resolved Hide resolved
plasmapy/utils/data/tests/test_downloader.py Outdated Show resolved Hide resolved
@pheuer
Copy link
Member Author

pheuer commented Mar 29, 2024

@namurphy Great, thanks for your help, I think this is done! The remaining pre-commit issue seems to be on main (I cannot reproduce it locally, but this PR doesn't touch the file causing the issues?) Once you approve, please feel free to merge.

Copy link
Member

@namurphy namurphy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

@namurphy
Copy link
Member

The problem with the pre-commit check is related to a bug in ruff v0.3.4 (astral-sh/ruff#10528) that will be included in the next release. We can ignore it, so I'll use my superpowers to ignore a required check and merge this. Thank you again!

@namurphy namurphy merged commit c6759c4 into PlasmaPy:main Mar 29, 2024
16 of 17 checks passed
@pheuer pheuer deleted the improved_downloader_sha branch March 29, 2024 21:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Related to continuous integration docs PlasmaPy Docs at http://docs.plasmapy.org GitHub Actions A continuous integration platform for automating tests and other tasks (see .github/ directory) notebooks Related to example Jupyter notebooks in docs/examples/ plasmapy.diagnostics Related to the plasmapy.diagnostics subpackage plasmapy.utils Related to the plasmapy.utils subpackage testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants