Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change md5 to hash and hash_algorithm, fix incompatibility #1367

Merged
merged 16 commits into from Nov 24, 2023
18 changes: 18 additions & 0 deletions .github/workflows/downstream.yml
Expand Up @@ -107,6 +107,23 @@ jobs:
test_command: pip install pytest-jupyter[server] && pytest -vv -raXxs -W default --durations 10 --color=yes
package_name: jupyter_server_terminals

jupytext:
runs-on: ubuntu-latest
timeout-minutes: 10

steps:
- name: Checkout
uses: actions/checkout@v4

- name: Base Setup
uses: jupyterlab/maintainer-tools/.github/actions/base-setup@v1

- name: Test jupytext
uses: jupyterlab/maintainer-tools/.github/actions/downstream-test@v1
with:
package_name: jupytext
test_command: pip install pytest-jupyter[server] gitpython pre-commit && python -m ipykernel install --name jupytext-dev --user && pytest -vv -raXxs -W default --durations 10 --color=yes --ignore=tests/test_doc_files_are_notebooks.py --ignore=tests/test_changelog.py
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_doc_files_are_notebooks.py and test_changelog.py need docs floder, but not in the distribution.


downstream_check: # This job does nothing and is only used for the branch protection
if: always()
needs:
Expand All @@ -115,6 +132,7 @@ jobs:
- jupyterlab_server
- notebook
- nbclassic
- jupytext
runs-on: ubuntu-latest
steps:
- name: Decide whether the needed jobs succeeded or failed
Expand Down
10 changes: 5 additions & 5 deletions docs/source/developers/contents.rst
Expand Up @@ -63,7 +63,7 @@ Models may contain the following entries:
| |``None`` |if any. (:ref:`See |
| | |Below<modelcontent>`) |
+--------------------+-----------+------------------------------+
|**md5** |unicode or |The md5 of the contents. |
|**sha256** |unicode or |The sha256 of the contents. |
| |``None`` | |
| | | |
+--------------------+-----------+------------------------------+
Expand All @@ -80,7 +80,7 @@ model. There are three model types: **notebook**, **file**, and **directory**.
:class:`nbformat.notebooknode.NotebookNode` representing the .ipynb file
represented by the model. See the `NBFormat`_ documentation for a full
description.
- The ``md5`` field a hexdigest string of the md5 value of the notebook
- The ``sha256`` field a hexdigest string of the sha256 value of the notebook
file.

- ``file`` models
Expand All @@ -91,14 +91,14 @@ model. There are three model types: **notebook**, **file**, and **directory**.
file models, ``content`` simply contains the file's bytes after decoding
as UTF-8. Non-text (``base64``) files are read as bytes, base64 encoded,
and then decoded as UTF-8.
- The ``md5`` field a hexdigest string of the md5 value of the file.
- The ``sha256`` field a hexdigest string of the sha256 value of the file.

- ``directory`` models
- The ``format`` field is always ``"json"``.
- The ``mimetype`` field is always ``None``.
- The ``content`` field contains a list of :ref:`content-free<contentfree>`
models representing the entities in the directory.
- The ``md5`` field is always ``None``.
- The ``sha256`` field is always ``None``.

.. note::

Expand Down Expand Up @@ -137,7 +137,7 @@ model. There are three model types: **notebook**, **file**, and **directory**.
"path": "foo/a.ipynb",
"type": "notebook",
"writable": True,
"md5": "7e47382b370c05a1b14706a2a8aff91a",
"sha256": "f5e43a0b1c2e7836ab3b4d6b1c35c19e2558688de15a6a14e137a59e4715d34b",
}

# Notebook Model without Content
Expand Down
12 changes: 6 additions & 6 deletions jupyter_server/services/api/api.yaml
Expand Up @@ -106,9 +106,9 @@ paths:
in: query
description: "Return content (0 for no content, 1 for return content)"
type: integer
- name: md5
- name: sha256
in: query
description: "Return md5 hexdigest string of content (0 for no md5, 1 for return md5)"
description: "Return sha256 hexdigest string of content (0 for no sha256, 1 for return sha256)"
type: integer
responses:
404:
Expand Down Expand Up @@ -889,7 +889,7 @@ definitions:
kernel:
$ref: "#/definitions/Kernel"
Contents:
description: "A contents object. The content and format keys may be null if content is not contained. The md5 maybe null if md5 is not contained. If type is 'file', then the mimetype will be null."
description: "A contents object. The content and format keys may be null if content is not contained. The sha256 maybe null if sha256 is not contained. If type is 'file', then the mimetype will be null."
type: object
required:
- type
Expand All @@ -901,7 +901,7 @@ definitions:
- mimetype
- format
- content
- md5
- sha256
properties:
name:
type: string
Expand Down Expand Up @@ -939,9 +939,9 @@ definitions:
format:
type: string
description: Format of content (one of null, 'text', 'base64', 'json')
md5:
sha256:
type: string
description: "The md5 hexdigest string of content, if requested (otherwise null)."
description: "The sha256 hexdigest string of content, if requested (otherwise null)."
Checkpoints:
description: A checkpoint object.
type: object
Expand Down
16 changes: 8 additions & 8 deletions jupyter_server/services/contents/fileio.py
Expand Up @@ -357,11 +357,11 @@ def _save_file(self, os_path, content, format):
with self.atomic_writing(os_path, text=False) as f:
f.write(bcontent)

def _get_md5(self, os_path):
def _get_sha256(self, os_path):
c, _ = self._read_file(os_path, "byte")
md5 = hashlib.md5()
md5.update(c)
return md5.hexdigest()
sha256 = hashlib.sha256()
sha256.update(c)
return sha256.hexdigest()


class AsyncFileManagerMixin(FileManagerMixin):
Expand Down Expand Up @@ -475,8 +475,8 @@ async def _save_file(self, os_path, content, format):
with self.atomic_writing(os_path, text=False) as f:
await run_sync(f.write, bcontent)

async def _get_md5(self, os_path):
async def _get_sha256(self, os_path):
c, _ = await self._read_file(os_path, "byte")
md5 = hashlib.md5()
await run_sync(md5.update, c)
return md5.hexdigest()
sha256 = hashlib.sha256()
await run_sync(sha256.update, c)
return sha256.hexdigest()
53 changes: 28 additions & 25 deletions jupyter_server/services/contents/filemanager.py
Expand Up @@ -48,6 +48,7 @@ class FileContentsManager(FileManagerMixin, ContentsManager):
root_dir = Unicode(config=True)

max_copy_folder_size_mb = Int(500, config=True, help="The max folder size that can be copied")
support_sha256 = Bool(True, config=False, help="Support sha256 argument in `get`")

@default("root_dir")
def _default_root_dir(self):
Expand Down Expand Up @@ -268,7 +269,7 @@ def _base_model(self, path):
model["mimetype"] = None
model["size"] = size
model["writable"] = self.is_writable(path)
model["md5"] = None
model["sha256"] = None

return model

Expand Down Expand Up @@ -336,7 +337,7 @@ def _dir_model(self, path, content=True):

return model

def _file_model(self, path, content=True, format=None, md5=False):
def _file_model(self, path, content=True, format=None, sha256=False):
"""Build a model for a file

if content is requested, include the file contents.
Expand Down Expand Up @@ -365,13 +366,13 @@ def _file_model(self, path, content=True, format=None, md5=False):
content=content,
format=format,
)
if md5:
md5 = self._get_md5(os_path)
model.update(md5=md5)
if sha256:
sha256 = self._get_sha256(os_path)
model.update(sha256=sha256)

return model

def _notebook_model(self, path, content=True, md5=False):
def _notebook_model(self, path, content=True, sha256=False):
"""Build a notebook model

if content is requested, the notebook content will be populated
Expand All @@ -390,12 +391,12 @@ def _notebook_model(self, path, content=True, md5=False):
model["content"] = nb
model["format"] = "json"
self.validate_notebook_model(model, validation_error)
if md5:
model["md5"] = self._get_md5(os_path)
if sha256:
model["sha256"] = self._get_sha256(os_path)

return model

def get(self, path, content=True, type=None, format=None, md5=None):
def get(self, path, content=True, type=None, format=None, sha256=None):
"""Takes a path for an entity and returns its model

Parameters
Expand All @@ -410,8 +411,8 @@ def get(self, path, content=True, type=None, format=None, md5=None):
format : str, optional
The requested format for file contents. 'text' or 'base64'.
Ignored if this returns a notebook or directory model.
md5: bool, optional
Whether to include the md5 of the file contents.
sha256: bool, optional
Whether to include the sha256 of the file contents.

Returns
-------
Expand Down Expand Up @@ -439,11 +440,11 @@ def get(self, path, content=True, type=None, format=None, md5=None):
)
model = self._dir_model(path, content=content)
elif type == "notebook" or (type is None and path.endswith(".ipynb")):
model = self._notebook_model(path, content=content, md5=md5)
model = self._notebook_model(path, content=content, sha256=sha256)
else:
if type == "directory":
raise web.HTTPError(400, "%s is not a directory" % path, reason="bad type")
model = self._file_model(path, content=content, format=format, md5=md5)
model = self._file_model(path, content=content, format=format, sha256=sha256)
self.emit(data={"action": "get", "path": path})
return model

Expand Down Expand Up @@ -725,6 +726,8 @@ def _human_readable_size(self, size):
class AsyncFileContentsManager(FileContentsManager, AsyncFileManagerMixin, AsyncContentsManager):
"""An async file contents manager."""

support_sha256 = Bool(True, config=False, help="Support sha256 argument in `get`")

@default("checkpoints_class")
def _checkpoints_class_default(self):
return AsyncFileCheckpoints
Expand Down Expand Up @@ -794,7 +797,7 @@ async def _dir_model(self, path, content=True):

return model

async def _file_model(self, path, content=True, format=None, md5=False):
async def _file_model(self, path, content=True, format=None, sha256=False):
"""Build a model for a file

if content is requested, include the file contents.
Expand Down Expand Up @@ -823,13 +826,13 @@ async def _file_model(self, path, content=True, format=None, md5=False):
content=content,
format=format,
)
if md5:
md5 = await self._get_md5(os_path)
model.update(md5=md5)
if sha256:
sha256 = await self._get_sha256(os_path)
model.update(sha256=sha256)

return model

async def _notebook_model(self, path, content=True, md5=False):
async def _notebook_model(self, path, content=True, sha256=False):
"""Build a notebook model

if content is requested, the notebook content will be populated
Expand All @@ -848,12 +851,12 @@ async def _notebook_model(self, path, content=True, md5=False):
model["content"] = nb
model["format"] = "json"
self.validate_notebook_model(model, validation_error)
if md5:
model["md5"] = await self._get_md5(os_path)
if sha256:
model["sha256"] = await self._get_sha256(os_path)

return model

async def get(self, path, content=True, type=None, format=None, md5=False):
async def get(self, path, content=True, type=None, format=None, sha256=False):
"""Takes a path for an entity and returns its model

Parameters
Expand All @@ -868,8 +871,8 @@ async def get(self, path, content=True, type=None, format=None, md5=False):
format : str, optional
The requested format for file contents. 'text' or 'base64'.
Ignored if this returns a notebook or directory model.
md5: bool, optional
Whether to include the md5 of the file contents.
sha256: bool, optional
Whether to include the sha256 of the file contents.

Returns
-------
Expand All @@ -892,11 +895,11 @@ async def get(self, path, content=True, type=None, format=None, md5=False):
)
model = await self._dir_model(path, content=content)
elif type == "notebook" or (type is None and path.endswith(".ipynb")):
model = await self._notebook_model(path, content=content, md5=md5)
model = await self._notebook_model(path, content=content, sha256=sha256)
else:
if type == "directory":
raise web.HTTPError(400, "%s is not a directory" % path, reason="bad type")
model = await self._file_model(path, content=content, format=format, md5=md5)
model = await self._file_model(path, content=content, format=format, sha256=sha256)
self.emit(data={"action": "get", "path": path})
return model

Expand Down