Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve trailing semicolon for Notebooks #8590

Merged
merged 1 commit into from Nov 10, 2023
Merged

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Nov 9, 2023

Summary

This PR updates the formatter to preserve trailing semicolon for Jupyter Notebooks.

The motivation behind the change is that semicolons in notebooks are typically used to hide the output, for example when plotting. This is highlighted in the linked issue.

The conditions required as to when the trailing semicolon should be preserved are:

  1. It should be a top-level statement which is last in the module.
  2. For statement, it can be either assignment, annotated assignment, or augmented assignment. Here, the target should only be a single identifier i.e., multiple assignments or tuple unpacking isn't considered.
  3. For expression, it can be any.

Test Plan

Add a new integration test in ruff_cli. The test notebook basically acts as a document as to which trailing semicolons are to be preserved.

fixes: #8254

@dhruvmanila
Copy link
Member Author

Current dependencies on/for this PR:

This stack of pull requests is managed by Graphite.

Copy link
Contributor

github-actions bot commented Nov 9, 2023

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@dhruvmanila dhruvmanila added the formatter Related to the formatter label Nov 10, 2023
@dhruvmanila dhruvmanila marked this pull request as ready for review November 10, 2023 09:23
Comment on lines +817 to +821
#[test]
fn test_notebook_trailing_semicolon() {
let fixtures = Path::new("resources").join("test").join("fixtures");
let unformatted = fs::read(fixtures.join("trailing_semicolon.ipynb")).unwrap();
assert_cmd_snapshot!(Command::new(get_cargo_bin(BIN_NAME))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the correct place for this test case. We would need to refactor the formatter test suite to include notebook files.

Comment on lines -53 to +64
SuiteKind::TopLevel => NodeLevel::TopLevel,
SuiteKind::TopLevel => NodeLevel::TopLevel(
iter.clone()
.next()
.map_or(TopLevelStatementPosition::Last, |_| {
TopLevelStatementPosition::Other
}),
),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could avoid cloning the iterator by making it peekable but that involves updating the function signature wherever this iterator is being passed. This is because it expects std::slice::Iter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clone is effectively free (we just clone the slice reference), so this is fine. The peakable things with the function signatures is annoying though, i had also already hit that

Copy link
Member

@konstin konstin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it simplify the implementation to insert the semicolon at

following.format().fmt(f)?;

instead? Then we shouldn't need TopLevelStatementPosition

Comment on lines -53 to +64
SuiteKind::TopLevel => NodeLevel::TopLevel,
SuiteKind::TopLevel => NodeLevel::TopLevel(
iter.clone()
.next()
.map_or(TopLevelStatementPosition::Last, |_| {
TopLevelStatementPosition::Other
}),
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clone is effectively free (we just clone the slice reference), so this is fine. The peakable things with the function signatures is annoying though, i had also already hit that

@dhruvmanila
Copy link
Member Author

Would it simplify the implementation to insert the semicolon at

following.format().fmt(f)?;

instead? Then we shouldn't need TopLevelStatementPosition

I'm assuming what you're recommending is to check at the linked position if this is the last statement and then preserve the semicolon, if any. Is this correct?

I think that'll involve matching against the relevant nodes where the semicolon needs to be preserved as it's only required for a subset of nodes - StmtAssign, StmtAugAssign, StmtAnnAssign, and StmtExpr. Additionally, certain conditions needs to be checked like is this a multiple assignment or not? I'd prefer to keep the logic in the respective node.

@dhruvmanila dhruvmanila merged commit 3e00ddc into main Nov 10, 2023
17 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/notebook-semicolon branch November 10, 2023 16:23
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to avoid using the context whenever possible because it is harder to reason about where the context value is set/changed.

An alternative to the context would have been to parameterized FormatStmt, FormatStmtAssign etc (by implementing FormatWithOptions) and explicitly pass a value whether it is the last statement.

Comment on lines +58 to +64
SuiteKind::TopLevel => NodeLevel::TopLevel(
iter.clone()
.next()
.map_or(TopLevelStatementPosition::Last, |_| {
TopLevelStatementPosition::Other
}),
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simplified to

SuiteKind::TopLevel => NodeLevel::TopLevel(if statements.len() < 2 {
    TopLevelStatementPosition::Last
} else {
    TopLevelStatementPosition::Other
}),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that can be done.

@dhruvmanila
Copy link
Member Author

An alternative to the context would have been to parameterized FormatStmt, FormatStmtAssign etc (by implementing FormatWithOptions) and explicitly pass a value whether it is the last statement.

I thought about this but I felt like this kind of information is suited well for the context. And, it already provided the node level information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

formatter(Jupyter): ending semicolons valid in Jupyter
3 participants