
Support formatting jupyter notebooks #1218

Closed
4 of 6 tasks
Tracked by #5188
konstin opened this issue Dec 12, 2022 · 11 comments · Fixed by #4665
Labels: core (Related to core functionality), help wanted (Contributions especially welcome)

@konstin
Member

konstin commented Dec 12, 2022

black supports formatting Jupyter notebooks via the jupyter extra, black[jupyter]. When installed this way, it formats Jupyter notebooks just like Python files. It would be nice if ruff could similarly lint and fix Jupyter notebooks.

Steps to implement this:

  • Read Jupyter notebooks, build a mapping between cells and the concatenated code, and print lint errors with cell ids (see the sketch after this list). The code is in jupyter/notebooks.rs.
  • Apply fixes to the concatenated code and map them back to cells
  • Write back Jupyter notebooks, ensuring that if there are no changes the JSON is identical (roundtrip support). You might need to extend or replace the current schema, or drop parts of it in favor of plain serde_json::Value. See also the black implementation.
  • Test roundtrip support with a variety of notebooks found in the wild, ideally from different generators (e.g. Jupyter Notebook and PyCharm) and different schema versions.
  • Remove the jupyter_notebook feature and include .ipynb files in ruff by default
  • Optional: Check whether any lints should be off by default for Jupyter notebooks; e.g. "no print" or isort rules might not make sense
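
A minimal sketch of the first step, assuming the notebook is parsed into a plain `serde_json::Value` rather than the schema in jupyter/notebooks.rs; the `concatenate_code_cells` helper and the `notebook.ipynb` path are illustrative, and the field names follow the nbformat JSON layout:

```rust
use serde_json::Value;

/// Concatenate the code cells of a notebook and record the byte offset at
/// which each cell ends, so lint locations can be mapped back to a cell.
fn concatenate_code_cells(notebook: &Value) -> (String, Vec<usize>) {
    let mut source = String::new();
    let mut offsets = vec![0];
    if let Some(cells) = notebook["cells"].as_array() {
        for cell in cells {
            if cell["cell_type"] != "code" {
                continue;
            }
            // In the nbformat schema, `source` is either a string or a list of lines.
            match &cell["source"] {
                Value::String(text) => source.push_str(text),
                Value::Array(lines) => {
                    for line in lines.iter().filter_map(Value::as_str) {
                        source.push_str(line);
                    }
                }
                _ => {}
            }
            if !source.ends_with('\n') {
                source.push('\n');
            }
            offsets.push(source.len());
        }
    }
    (source, offsets)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string("notebook.ipynb")?;
    let notebook: Value = serde_json::from_str(&raw)?;
    let (code, offsets) = concatenate_code_cells(&notebook);
    println!("{} code cells, {} bytes of code", offsets.len() - 1, code.len());
    Ok(())
}
```
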
@charliermarsh
Member

I assume Ruff would work with nbQA, but I too would like it to support Jupyter out-of-the-box. Looking into it...

@charliermarsh
Member

Ruff was actually integrated into nbQA in the latest release, so you can now run (e.g.):

❯ nbqa ruff Untitled.ipynb
Untitled.ipynb:cell_1:2:5: F841 Local variable `x` is assigned to but never used
Untitled.ipynb:cell_2:1:1: E402 Module level import not at top of file
Untitled.ipynb:cell_2:1:8: F401 `os` imported but unused
Found 3 error(s).
1 potentially fixable with the --fix option.

(Would still like to have a first-party integration at some point.)

charliermarsh added the core (Related to core functionality) label and removed the enhancement label Dec 31, 2022
@blakeNaccarato

blakeNaccarato commented Mar 10, 2023

See a parallel effort over in the Ruff VSCode extension, where LSP support for analyzing entire Jupyter notebooks seems blocked by pygls still being on LSP v3.16. Handling whole-notebook formatting through the Ruff LSP implementation could be a shorter road if effort is concentrated over at pygls to get LSP 3.17 out. See my comment in the other thread for more detail and links to other issues.

It would be nice to have native support in Ruff as well! But in the short run, LSP support looks like the path of less resistance.

@konstin
Member Author

konstin commented May 11, 2023

The current implementation can lint Jupyter files, but it can't yet apply fixes or write them back to the file. I've written instructions in the top comment and can mentor if someone wants to tackle this.

konstin added the help wanted (Contributions especially welcome) label May 12, 2023
@sladyn98
Contributor

@konstin I could take this on

@charliermarsh
Member

@sladyn98 - Sorry for the churn, I already chatted with @dhruvmanila about taking this one on!

@dhruvmanila
Member

@charliermarsh you can assign this to me to avoid any future confusion :)

dhruvmanila added a commit that referenced this issue Jun 12, 2023
## Summary

Add support for applying auto-fixes in Jupyter Notebook.

### Solution

Cell offsets are the boundaries of each cell in the concatenated source
code, represented using `TextSize`. They include the start and end
offsets as well, thus creating a range for each cell. These offsets
are updated using the `SourceMap` markers.

### SourceMap

`SourceMap` contains markers constructed from each edit, which map
positions in the original source code to positions in the transformed
source. The following drawing might make it clearer:

![SourceMap visualization](https://github.com/astral-sh/ruff/assets/67177269/3c94e591-70a7-4b57-bd32-0baa91cc7858)

The center column, where the dotted lines are, represents the markers
included in the `SourceMap`. The `Notebook` looks at these markers and
updates the cell offsets after each linter loop. If you look closely,
each destination takes into account all of the markers before it.

The index is constructed only when required, as it's only used to render
the diagnostics, so an `OnceCell` is used for this purpose. The cell
offsets, cell content, and the index are updated after each iteration
of linting, in that order. The order is important because the content is
updated from the new offsets, and the index from the new content.
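
Below is a minimal sketch of that offset update. It is not Ruff's actual `SourceMap` API; the `Marker` struct and `update_cell_offsets` helper are hypothetical, but they illustrate how each destination offset accounts for all the markers before it:

```rust
/// Records that the text at `source` in the original code now starts at
/// `dest` in the fixed code (a hypothetical stand-in for the real marker type).
struct Marker {
    source: u32,
    dest: u32,
}

/// Shift each cell offset by the last marker at or before it.
fn update_cell_offsets(offsets: &mut [u32], markers: &[Marker]) {
    for offset in offsets.iter_mut() {
        if let Some(marker) = markers.iter().take_while(|m| m.source <= *offset).last() {
            *offset = *offset - marker.source + marker.dest;
        }
    }
}

fn main() {
    // One fix removed 5 bytes inside the first cell, so later cells shift left.
    let markers = [Marker { source: 40, dest: 35 }];
    let mut cell_offsets = [0, 60, 120];
    update_cell_offsets(&mut cell_offsets, &markers);
    assert_eq!(cell_offsets, [0, 55, 115]);
}
```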

## Limitations

### 1

Styling rules such as the ones in `pycodestyle` will not be applicable
everywhere in a Jupyter notebook, especially at cell boundaries. Take
an example where a rule suggests having two blank lines before a
function and the cells contain the following code:

```python
import something
# ---
def first():
    pass

def second():
    pass
```

(Again, the comment is only to visualize cell boundaries.)

In the concatenated source code, the two blank lines will be added, but
they shouldn't be when viewed in terms of the Jupyter notebook: it's as
if the function `first` were at the start of a file.

`nbqa` solves this by recording the newlines around each cell before
running `autopep8`, then running the tool and restoring the newlines at
the end (see nbQA-dev/nbQA#807); a sketch of this workaround follows.

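A rough sketch of that record-and-restore idea (not nbQA's actual code; the helper names and example strings are illustrative):

```rust
/// Count the blank lines at the start and end of a cell's source.
fn blank_line_counts(cell: &str) -> (usize, usize) {
    let leading = cell.lines().take_while(|l| l.trim().is_empty()).count();
    let trailing = cell.lines().rev().take_while(|l| l.trim().is_empty()).count();
    (leading, trailing)
}

/// Drop whatever blank lines the tool added or removed at the cell
/// boundaries and restore the recorded counts.
fn restore_blank_lines(formatted: &str, leading: usize, trailing: usize) -> String {
    let core = formatted.trim_matches('\n');
    format!("{}{}{}", "\n".repeat(leading), core, "\n".repeat(trailing))
}

fn main() {
    let original = "def first():\n    pass";
    let (leading, trailing) = blank_line_counts(original);
    // A blank-lines fix on the concatenated source adds two newlines before
    // `first`; restoring the recorded counts undoes it for this cell.
    let formatted = "\n\ndef first():\n    pass";
    assert_eq!(restore_blank_lines(formatted, leading, trailing), original);
}
```
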
## Test Plan

Three commands were run in order with common flags (`--select=ALL
--no-cache --isolated`) to isolate the stage at which any problem occurs:
1. Only diagnostics
2. Fix with diff (`--fix --diff`)
3. Fix (`--fix`)

### https://github.com/facebookresearch/segment-anything

```
-------------------------------------------------------------------------------
 Jupyter Notebooks       3            0            0            0            0
 |- Markdown             3           98            0           94            4
 |- Python               3          513          468            4           41
 (Total)                            611          468           98           45
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/segment-anything/**/*.ipynb --fix
...
Found 180 errors (89 fixed, 91 remaining).
```

### https://github.com/openai/openai-cookbook

```
-------------------------------------------------------------------------------
 Jupyter Notebooks      65            0            0            0            0
 |- Markdown            64         3475           12         2507          956
 |- Python              65         9700         7362         1101         1237
 (Total)                          13175         7374         3608         2193
===============================================================================
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/openai-cookbook/**/*.ipynb --fix
error: Failed to parse /path/to/openai-cookbook/examples/vector_databases/Using_vector_databases_for_embeddings_search.ipynb:cell 4:29:18: unexpected token '-'
...
Found 4227 errors (2165 fixed, 2062 remaining).
```

### https://github.com/tensorflow/docs

```
-------------------------------------------------------------------------------
 Jupyter Notebooks     150            0            0            0            0
 |- Markdown             1           55            0           46            9
 |- Python               1          402          289           60           53
 (Total)                            457          289          106           62
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-docs/**/*.ipynb --fix
error: Failed to parse /path/to/tensorflow-docs/site/en/guide/extension_type.ipynb:cell 80:1:1: unexpected token Indent
error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/eager/custom_layers.ipynb:cell 20:1:1: unexpected token Indent
error: Failed to parse /path/to/tensorflow-docs/site/en/guide/data.ipynb:cell 175:5:14: unindent does not match any outer indentation level
error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/representation/unicode.ipynb:cell 30:1:1: unexpected token Indent
...
Found 12726 errors (5140 fixed, 7586 remaining).
```

### https://github.com/tensorflow/models

```
-------------------------------------------------------------------------------
 Jupyter Notebooks      46            0            0            0            0
 |- Markdown             1           11            0            6            5
 |- Python               1          328          249           19           60
 (Total)                            339          249           25           65
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-models/**/*.ipynb --fix
...
Found 4856 errors (2690 fixed, 2166 remaining).
```

resolves: #1218
fixes: #4556
@dhruvmanila
Member

Sorry, not yet completed.

dhruvmanila reopened this Jun 12, 2023
@charliermarsh
Member

What's the outstanding work here? Testing?

@dhruvmanila
Member

  • Roundtrip support
  • End-to-end testing support

dhruvmanila added a commit that referenced this issue Jun 12, 2023
## Summary

Add roundtrip support for Jupyter notebook.

1. Read the notebook
2. Extract the source code content
3. Use it to update the notebook itself (which should be exactly the same [^1])
4. Serialize it into JSON and print it to stdout (a minimal sketch follows below)

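A minimal sketch of that roundtrip, assuming the notebook is held as a plain `serde_json::Value` (the real implementation uses the notebook schema); byte-identical output also depends on details such as key order (serde_json's `preserve_order` feature) and the original indentation:

```rust
use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Read the notebook and parse it.
    let original = std::fs::read_to_string("notebook.ipynb")?;
    let notebook: Value = serde_json::from_str(&original)?;

    // 2./3. Here the source code would be extracted and written back into
    //       `cells`; for a pure roundtrip the value is left untouched.

    // 4. Serialize back to JSON and print it to stdout.
    let roundtripped = serde_json::to_string_pretty(&notebook)?;
    println!("{roundtripped}");

    if roundtripped.trim_end() == original.trim_end() {
        eprintln!("roundtrip output matches the original");
    } else {
        eprintln!("roundtrip output differs from the original");
    }
    Ok(())
}
```
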
## Test Plan

`cargo run --all-features --bin ruff_dev --package ruff_dev -- round-trip <path/to/notebook.ipynb>`

<details><summary>Example output:</summary>
<p>

```
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f3c286e9-fa52-4440-816f-4449232f199a",
   "metadata": {},
   "source": [
    "# Ruff Test"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2b7bc6c-778a-4b07-86ae-dde5a2d9511e",
   "metadata": {},
   "source": [
    "Markdown block before the first import"
   ]
  },
  {
   "cell_type": "code",
   "id": "5e3ef98e-224c-450a-80e6-be442ad50907",
   "metadata": {
    "tags": []
   },
   "source": "",
   "execution_count": 1,
   "outputs": []
  },
  {
   "cell_type": "code",
   "id": "6bced3f8-e0a4-450c-ae7c-f60ad5671ee9",
   "metadata": {},
   "source": "import contextlib\n\nwith contextlib.suppress(ValueError):\n    print()\n",
   "outputs": []
  },
  {
   "cell_type": "code",
   "id": "d7102cfd-5bb5-4f5b-a3b8-07a7b8cca34c",
   "metadata": {},
   "source": "import random\n\nrandom.randint(10, 20)",
   "outputs": []
  },
  {
   "cell_type": "code",
   "id": "88471d1c-7429-4967-898f-b0088fcb4c53",
   "metadata": {},
   "source": "foo = 1\nif foo < 2:\n    msg = f\"Invalid foo: {foo}\"\n    raise ValueError(msg)",
   "outputs": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (ruff-playground)",
   "name": "ruff-playground",
   "language": "python"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "pygments_lexer": "ipython3",
   "nbconvert_exporter": "python",
   "version": "3.11.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
```

</p>
</details> 

[^1]: The type in JSON might be different (#4665 (comment))

Part of #1218
konstin pushed a commit that referenced this issue Jun 13, 2023
konstin pushed a commit that referenced this issue Jun 13, 2023
dhruvmanila added a commit that referenced this issue Jun 15, 2023
## Summary

Ability to perform integration tests on Jupyter notebooks

Part of #1218

## Test Plan

`cargo test`
@dhruvmanila
Member

Meta issue: #5188
