Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support get file(notebook) md5 #1363

Merged
merged 6 commits into from Nov 19, 2023

Conversation

Wh1isper
Copy link
Contributor

@Wh1isper Wh1isper commented Nov 18, 2023

Motivation

Currently, JupyterLab determines whether a file has been manually modified by another client or user by comparing timestamps.

In this way, JupyterLab can prompt the user whether to overwrite the file on disk with the file on the current page, or discard the file on the current page.

This usually performs well on local storage. However, when users use remote storage such as nfs, samba, the modified time of the file is not reliable due to things such as network latency or storage back-end implementations. To solve this problem, jupyterlab/jupyterlab#15037 tries to compare content by comparing them one by one, but this definitely increases the network overhead and client burden.

What is this PR

I propose that this PR to add md5 parameter to the contents API. Any client can get the md5 value of the current file as it needs it and compare it to the previous value to make sure the file has not been modified by another program.

Performance Impact

The server-side calculation of md5 values has essentially no performance impact and it's optional.

Create a 5M file with random bytes.

dd bs=1024 count=5120 </dev/urandom > 5mfile

Calculate its md5.

import hashlib
def caculate(file):
    md5 = hashlib.md5()
    with open(file, "rb") as f:
        c = f.read()
    md5.update(c)
    return md5.hexdigest()
caculate("5mfile")

Here I'm using hyperfine to help me test it.

$hyperfine "python main.py"
Benchmark 1: python main.py
  Time (mean ± σ):      29.4 ms ±   1.4 ms    [User: 24.6 ms, System: 4.7 ms]
  Range (min … max):    27.6 ms …  34.8 ms    91 runs

Copy link
Collaborator

@blink1073 blink1073 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just one minor comment

jupyter_server/services/contents/handlers.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@blink1073 blink1073 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@blink1073 blink1073 merged commit ea6ceee into jupyter-server:main Nov 19, 2023
35 of 38 checks passed
@blink1073
Copy link
Collaborator

I'll cut a minor release early next week.

@Wh1isper Wh1isper deleted the feature-file-md5 branch November 19, 2023 12:30
@bollwyvl
Copy link
Contributor

In 2023, adding the broken md5, for any reason, is probably not a good play. Even when not used for security purposes, the mere presence of it (or any other known-broken algorithm) causes bells and whistles to go off in downstream's inboxes.

If, for some reason, this algorithm must be used, adding the explicit usedforsecurity=False is a way to quite some of the bells. But better would be to use a not-yet-broken hash, such as sha256 or the most modern blake* available in all supported pythons.

@krassowski
Copy link
Collaborator

An extra consideration would be whether frontends can compute the hash too to save on a round trip. Hence my suggestion on using murmur as this is what debugger uses. Does murmur trigger downstreams you are speaking of too?

@bollwyvl
Copy link
Contributor

bollwyvl commented Nov 21, 2023 via email

@Wh1isper
Copy link
Contributor Author

Sorry for the breaking changes to the API. I have created #1367 to fix this.

sigmarkarl added a commit to spotinst/jupyter_server that referenced this pull request Dec 13, 2023
* ContentsHandler return 404 rather than raise exc (jupyter-server#1357)

* Add more typings (jupyter-server#1356)

* Publish 2.10.1

SHA256 hashes:

jupyter_server-2.10.1-py3-none-any.whl: 20519e355d951fc5e1b6ac5952854fe7620d0cfb56588fa4efe362a758977ed3

jupyter_server-2.10.1.tar.gz: e6da2657a954a7879eed28cc08e0817b01ffd81d7eab8634660397b55f926472

* Bump to 2.11.0.dev0

* typo: ServerApp (jupyter-server#1361)

* Support get file(notebook) md5 (jupyter-server#1363)

* Update ruff and typings (jupyter-server#1365)

* Update api docs with md5 param (jupyter-server#1364)

* Publish 2.11.0

SHA256 hashes:

jupyter_server-2.11.0-py3-none-any.whl: c9bd6e6d71dc5a2a25df167dc323422997f14682b008bfecb5d7920a55020ea7

jupyter_server-2.11.0.tar.gz: 78c97ec8049f9062f0151725bc8a1364dfed716646a66819095e0e8a24793eba

* Bump to 2.12.0.dev0

* Change md5 to hash and hash_algorithm, fix incompatibility (jupyter-server#1367)

Co-authored-by: Frédéric Collonval <fcollonval@gmail.com>

* avoid unhandled error on some invalid paths (jupyter-server#1369)

* Publish 2.11.1

SHA256 hashes:

jupyter_server-2.11.1-py3-none-any.whl: 4b3a16e3ed16fd202588890f10b8ca589bd3e29405d128beb95935f059441373

jupyter_server-2.11.1.tar.gz: fe80bab96493acf5f7d6cd9a1575af8fbd253dc2591aa4d015131a1e03b5799a

* Bump to 2.12.0.dev0

* Merge pull request from GHSA-h56g-gq9v-vc8r

Co-authored-by: Steven Silvester <steven.silvester@ieee.org>

* Publish 2.11.2

SHA256 hashes:

jupyter_server-2.11.2-py3-none-any.whl: 0c548151b54bcb516ca466ec628f7f021545be137d01b5467877e87f6fff4374

jupyter_server-2.11.2.tar.gz: 0c99f9367b0f24141e527544522430176613f9249849be80504c6d2b955004bb

* Bump to 2.12.0.dev0

* chore: update pre-commit hooks (jupyter-server#1370)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Steven Silvester <steven.silvester@ieee.org>

* Update for tornado 6.4 (jupyter-server#1372)

* Support async Authorizers (jupyter-server#1373)

* Publish 2.12.0

SHA256 hashes:

jupyter_server-2.12.0-py3-none-any.whl: 3482912efa4387bb1edc23ba60531796aff3b6d6a6e93a5810f5719e2bdb48b7

jupyter_server-2.12.0.tar.gz: 9fa74ed3bb931cf33f42b3d9046e2788328ec9e6dcc59d48aa3e0910a491e3e4

* Bump to 2.13.0.dev0

* log extension import time at debug level unless it's actually slow (jupyter-server#1375)

* Add support for async Authorizers (part 2) (jupyter-server#1374)

* Publish 2.12.1

SHA256 hashes:

jupyter_server-2.12.1-py3-none-any.whl: fd030dd7be1ca572e4598203f718df6630c12bd28a599d7f1791c4d7938e1010

jupyter_server-2.12.1.tar.gz: dc77b7dcc5fc0547acba2b2844f01798008667201eea27c6319ff9257d700a6d

---------

Co-authored-by: Sam Bloomquist <bloomquist.sam@gmail.com>
Co-authored-by: Steven Silvester <steven.silvester@ieee.org>
Co-authored-by: blink1073 <blink1073@users.noreply.github.com>
Co-authored-by: IITII <ccmejx@gmail.com>
Co-authored-by: Zhongsheng Ji <9573586@qq.com>
Co-authored-by: Frédéric Collonval <fcollonval@gmail.com>
Co-authored-by: Min RK <benjaminrk@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zachary Sailer <zsailer@apple.com>
Co-authored-by: Zsailer <Zsailer@users.noreply.github.com>
sigmarkarl added a commit to spotinst/jupyter_server that referenced this pull request Jan 15, 2024
* Create CODEOWNERS

* ContentsHandler return 404 rather than raise exc (jupyter-server#1357)

* Add more typings (jupyter-server#1356)

* Publish 2.10.1

SHA256 hashes:

jupyter_server-2.10.1-py3-none-any.whl: 20519e355d951fc5e1b6ac5952854fe7620d0cfb56588fa4efe362a758977ed3

jupyter_server-2.10.1.tar.gz: e6da2657a954a7879eed28cc08e0817b01ffd81d7eab8634660397b55f926472

* Bump to 2.11.0.dev0

* typo: ServerApp (jupyter-server#1361)

* Support get file(notebook) md5 (jupyter-server#1363)

* Update ruff and typings (jupyter-server#1365)

* Update api docs with md5 param (jupyter-server#1364)

* Publish 2.11.0

SHA256 hashes:

jupyter_server-2.11.0-py3-none-any.whl: c9bd6e6d71dc5a2a25df167dc323422997f14682b008bfecb5d7920a55020ea7

jupyter_server-2.11.0.tar.gz: 78c97ec8049f9062f0151725bc8a1364dfed716646a66819095e0e8a24793eba

* Bump to 2.12.0.dev0

* Change md5 to hash and hash_algorithm, fix incompatibility (jupyter-server#1367)

Co-authored-by: Frédéric Collonval <fcollonval@gmail.com>

* avoid unhandled error on some invalid paths (jupyter-server#1369)

* Publish 2.11.1

SHA256 hashes:

jupyter_server-2.11.1-py3-none-any.whl: 4b3a16e3ed16fd202588890f10b8ca589bd3e29405d128beb95935f059441373

jupyter_server-2.11.1.tar.gz: fe80bab96493acf5f7d6cd9a1575af8fbd253dc2591aa4d015131a1e03b5799a

* Bump to 2.12.0.dev0

* Merge pull request from GHSA-h56g-gq9v-vc8r

Co-authored-by: Steven Silvester <steven.silvester@ieee.org>

* Publish 2.11.2

SHA256 hashes:

jupyter_server-2.11.2-py3-none-any.whl: 0c548151b54bcb516ca466ec628f7f021545be137d01b5467877e87f6fff4374

jupyter_server-2.11.2.tar.gz: 0c99f9367b0f24141e527544522430176613f9249849be80504c6d2b955004bb

* Bump to 2.12.0.dev0

* chore: update pre-commit hooks (jupyter-server#1370)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Steven Silvester <steven.silvester@ieee.org>

* Update for tornado 6.4 (jupyter-server#1372)

* Support async Authorizers (jupyter-server#1373)

* Publish 2.12.0

SHA256 hashes:

jupyter_server-2.12.0-py3-none-any.whl: 3482912efa4387bb1edc23ba60531796aff3b6d6a6e93a5810f5719e2bdb48b7

jupyter_server-2.12.0.tar.gz: 9fa74ed3bb931cf33f42b3d9046e2788328ec9e6dcc59d48aa3e0910a491e3e4

* Bump to 2.13.0.dev0

* log extension import time at debug level unless it's actually slow (jupyter-server#1375)

* Add support for async Authorizers (part 2) (jupyter-server#1374)

* Publish 2.12.1

SHA256 hashes:

jupyter_server-2.12.1-py3-none-any.whl: fd030dd7be1ca572e4598203f718df6630c12bd28a599d7f1791c4d7938e1010

jupyter_server-2.12.1.tar.gz: dc77b7dcc5fc0547acba2b2844f01798008667201eea27c6319ff9257d700a6d

* Bump to 2.13.0.dev0

* Use ruff docstring-code-format (jupyter-server#1377)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Enable htmlzip and epub on readthedocs (jupyter-server#1379)

* Update pre-commit deps (jupyter-server#1380)

* Fix a typo in error message (jupyter-server#1381)

* Force legacy ws subprotocol when using gateway (jupyter-server#1311)

Co-authored-by: Emmanuel Pignot <emmanuel.pignot@netapp.com>
Co-authored-by: Zachary Sailer <zachsailer@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zachary Sailer <zsailer@apple.com>

* Publish 2.12.2

SHA256 hashes:

jupyter_server-2.12.2-py3-none-any.whl: abcfa33f98a959f908c8733aa2d9fa0101d26941cbd49b148f4cef4d3046fc61

jupyter_server-2.12.2.tar.gz: 5eae86be15224b5375cdec0c3542ce72ff20f7a25297a2a8166a250bb455a519

* Bump to 2.13.0.dev0

* Fix test param for pytest-xdist (jupyter-server#1382)

* Simplify the jupytext downstream test (jupyter-server#1383)

* Import User unconditionally (jupyter-server#1384)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Publish 2.12.3

SHA256 hashes:

jupyter_server-2.12.3-py3-none-any.whl: 6f85310ea5e6068568a521f079fba99d8d17e4884dd1d602ab0f43b3115204a8

jupyter_server-2.12.3.tar.gz: a1d2d51e497b1a6256c48b6940b0dd49b2553981baf1690077c37792f1fa23a1

* Bump to 2.13.0.dev0

* Fix log arguments for gateway client error (jupyter-server#1385)

* Publish 2.12.4

SHA256 hashes:

jupyter_server-2.12.4-py3-none-any.whl: a125ae18a60de568f78f55c84dd58759901a18ef279abf0418ac220653ca1320

jupyter_server-2.12.4.tar.gz: 41f4a1e6b912cc24a7c6c694851b37d3d8412b180f43d72315fe422cb2b85cc2

* merge 2.12.4 from upstream

* merge 2.12.4 from upstream

---------

Co-authored-by: Lironavr1 <107797717+Lironavr1@users.noreply.github.com>
Co-authored-by: Sam Bloomquist <bloomquist.sam@gmail.com>
Co-authored-by: Steven Silvester <steven.silvester@ieee.org>
Co-authored-by: blink1073 <blink1073@users.noreply.github.com>
Co-authored-by: IITII <ccmejx@gmail.com>
Co-authored-by: Zhongsheng Ji <9573586@qq.com>
Co-authored-by: Frédéric Collonval <fcollonval@gmail.com>
Co-authored-by: Min RK <benjaminrk@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zachary Sailer <zsailer@apple.com>
Co-authored-by: Zsailer <Zsailer@users.noreply.github.com>
Co-authored-by: Nicholas Bollweg <nick.bollweg@gmail.com>
Co-authored-by: Michał Krassowski <5832902+krassowski@users.noreply.github.com>
Co-authored-by: Manu <21658174+epignot@users.noreply.github.com>
Co-authored-by: Emmanuel Pignot <emmanuel.pignot@netapp.com>
Co-authored-by: Zachary Sailer <zachsailer@gmail.com>
Co-authored-by: Gonzalo Tornaría <tornaria@gmail.com>
Co-authored-by: Marc Wouts <marc.wouts@gmail.com>
Co-authored-by: Yuvi Panda <yuvipanda@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants