Improve multiline string handling #1879

aneeshusa · 2020-12-22T19:40:06Z

Description

Improve formatting of multiline strings, especially in function calls,
by updating black to look at the context around multiline strings
to decide if they should be inlined or split to a separate line.
Currently behind the --preview flag.

Performance tested the new functionality in #1879 (comment).
Fixes #256.

Checklist - did you ...

Add an entry in CHANGES.md if necessary?
Add / update tests if necessary?
Add new / update outdated documentation?

aneeshusa · 2020-12-22T19:41:14Z

This is definitely not ready to be merged yet (see TODOs above), but I have a prototype that seems to work and wanted to share it so I could start getting feedback. The main things I would like feedback on are a) whether all the examples in the test suite mesh with the desired code styling for black and b) any high-level advice on how to better integrate the new code with existing code.

aneeshusa · 2020-12-22T19:44:17Z

Also, let me know if there are any edge cases we consider too unlikely to occur in the wild to need covering, e.g. the test case with a multiline string as a function's default value. I did use sourcegraph.com/search to find uses of multiline strings in the wild and added a few test cases based on that.

JelleZijlstra

Thanks for this! Here's some initial feedback:

I also looked at some of the black-primer output and most of it looks like the new formatting is better. You don't need to submit PRs to these projects; they're expected to update themselves as Black releases a new version. You can update the primer config to mark these projects as having expected formatting changes.
I don't think there are any edge cases that are too unlikely for Black to handle, since users run Black on probably pretty much any conceivable Python construct. It's OK to spend less effort on more obscure syntax as long as it doesn't cause crashes.

JelleZijlstra · 2020-12-23T00:35:30Z

tests/data/multiline_strings.py

+    **kwargs,
+):
+    pass
+# output


I'd prefer more newlines around the # output comment so it's easier to find

Suggested change

# output

# output

JelleZijlstra · 2020-12-23T00:38:53Z

tests/data/multiline_strings.py

+)
+call(
+    3,
+    textwrap.dedent("""cow


I feel like it'd be better to put the argument to dedent() on a separate line here, so it's easier to count the arguments to call(). But there's definitely some cases below where it makes more sense not to put a single multiline string argument on separate lines, so I'm open to persuasion otherwise.

ichard26 (Dec 23, 2020, edited)

This change is probably meant to fix Black's handling of the following code:

textwrap.dedent("""\
    Hello, I am a multiline string used with a common idiom
""")

Right now Black transforms it to:

textwrap.dedent(
    """\
    Hello, I am a multiline string used with a common idiom
"""
)

And this PR causes Black to leave the code untouched.

Although I do agree that it would look better to have the multiline string argument on a separate line when there's more than one argument in the call. But then again, I don't know if that's even possible with our current Visitor design.

edit: if the argument-count dependent formatting is dumb or impossible, consider me +0.5 for keeping PR behaviour.
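To make the trade-off concrete, here is a minimal sketch of the idiom under discussion. The string contents are made up for illustration. It shows that the two layouts differ only in whitespace outside the string literal, so they evaluate to the same value and the choice between them is purely stylistic:

```python
import textwrap

# Layout kept by this PR: the string opens on the same line as the call.
inlined = textwrap.dedent("""\
    Hello, I am
    a multiline string
""")

# Layout produced by Black before this PR: the string moves to its own line.
# The contents of the literal (including the unindented closing quotes) are
# untouched, because a formatter must not change string values.
split = textwrap.dedent(
    """\
    Hello, I am
    a multiline string
"""
)

# The backslash after the opening quotes suppresses the first newline, and
# dedent() strips the common four-space indentation.
assert inlined == split == "Hello, I am\na multiline string\n"
```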

JelleZijlstra · 2020-12-23T00:39:12Z

tests/data/composition_no_trailing_comma.py

-        """ % (
-            _C.__init__.__code__.co_firstlineno + 1,
-        )
+        """ % (_C.__init__.__code__.co_firstlineno + 1,)


I like this change

ichard26

Thanks for working on this! Overall I do like the changes this introduces. Hopefully it's not too hard to explain these changes in word form that's understandable by Black's users.

Also, I would like to say sorry ahead of time for the terrible documentation workflow we have (it's mostly my fault). I really need to improve it and make it less painful to add/modify documentation but I'm slow and lazy so yeah.

I'm not qualified to review the actual formatting code, but I did notice a slight deficiency with your test code.

ichard26 · 2020-12-25T02:27:46Z

tests/test_black.py

+    @patch("black.dump_to_file", dump_to_stderr)
+    def test_multiline_strings(self) -> None:
+        source, expected = read_data("multiline_strings")
+        actual = fs(source)
+        self.assertFormatEqual(expected, actual)
+        black.assert_equivalent(source, actual)
+        black.assert_stable(source, actual, DEFAULT_MODE)
+


Since PR #1785, writing simple tests like this one is much easier. Just create the test data and add its normalized name (i.e. strip the .py suffix) to the SIMPLE cases list in tests/test_format.py.

I've done this for you in a PR against your branch, since I can't suggest changes on lines/files you didn't modify, yet I need to suggest a single addition in such a file :/
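For readers unfamiliar with this style of test, the sketch below illustrates the general idea of a data-driven formatting case: one file holds the input and the expected result, separated by a `# output` marker, and the check compares parsed ASTs. The helpers `split_case` and `same_ast` and the `CASE` data are hypothetical stand-ins written for this illustration; they are not Black's `read_data` or `assert_equivalent`, whose real implementations may differ.

```python
import ast
from typing import Tuple


def split_case(text: str) -> Tuple[str, str]:
    """Split test data into (source, expected) at the '# output' marker line."""
    source, _, expected = text.partition("\n# output\n")
    return source.strip("\n") + "\n", expected.strip("\n") + "\n"


def same_ast(a: str, b: str) -> bool:
    """Rough equivalence check: both snippets parse to the same AST.

    ast.dump() omits line and column attributes by default, so pure layout
    changes do not affect the comparison.
    """
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


# Hypothetical test data: input above the marker, expected output below it.
CASE = (
    'call(\n'
    '    """cow\n'
    'moo"""\n'
    ')\n'
    '\n'
    '# output\n'
    '\n'
    'call("""cow\n'
    'moo""")\n'
)

source, expected = split_case(CASE)
assert same_ast(source, expected)
```

Keeping extra blank lines around the marker, as suggested earlier in the review, does not affect a split like this one because surrounding newlines are stripped.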

black currently has poor multi-line string treatment for dedent()-ed code. I've run aneeshusa's 'black' branch psf/black#1879 on ropetest instead, which leaves dedent()-ed lines alone while doing all its other cleanups. However, most people will likely be running mainline black, which would have mucked up the formatting in these files, so I've also added an exclusion rule in pyproject.toml to prevent people from accidentally auto-formatting ropetest again. Until aneeshusa's branch is merged into mainline black, or black has a proper solution for dedent()-ed code, be careful about running black on ropetest.

JelleZijlstra · 2021-11-14T03:57:27Z

This has some conflicts now. With our new stability policy in place, this would be a good candidate to go into the "unstable" flag for a year to see it mature.

lieryan · 2022-07-18T06:12:49Z

Just to provide some feedback, I've been using this PR's branch for the past year, and it does very well. Much better than default black's behavior when it comes to dedent(). Haven't encountered any issues.

olivia-hong · 2022-10-26T21:24:27Z

Hello, finally reviving this PR!
I cleaned it up to work with the latest main branch,
performance tested it, and moved the logic under the --preview flag.
Still working on adding docs/updating comments but wanted to get some review on the PR in the meantime.

I did some benchmarking to see if the current logic would be safe to merge as-is,
and the data indicates that there's no impact on performance
(times were roughly the same across all repos tested on).
Given that, I would appreciate any advice from maintainers on the existing approach.

Performance Results

I used `diff-shades` and `pre-commit`, and ran them on a variety of repos (large, small, many changes, little/no changes) as well as a bunch of Lyft-internal repos.

Some info:

Run 1 actually applies formatting changes from the multiline PR (if applicable)
If Run 1 is marked N/A, this means no files were changed when applying the multiline PR. 
Run 2 is the “steady-state” since it occurs after formatting changes have already been made.
I executed Run 2 with and without the cache by manually deleting it using
```
rm -rf ~/Library/Caches/black
```
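The Run 1 / Run 2 procedure described above can be approximated with a small timing helper. This is only a sketch: `timed_run` and its `cache_dir` parameter are hypothetical names invented here, not part of Black, diff-shades, or pre-commit, and the command in the usage section is a placeholder rather than the invocation used for the numbers in this comment.

```python
import shutil
import subprocess
import sys
import time
from pathlib import Path
from typing import List, Optional


def timed_run(cmd: List[str], cache_dir: Optional[Path] = None) -> float:
    """Run cmd once and return the elapsed wall-clock time in seconds.

    If cache_dir is given, it is deleted first so the run starts cold,
    mirroring the manual `rm -rf` step.
    """
    if cache_dir is not None:
        shutil.rmtree(cache_dir, ignore_errors=True)
    start = time.perf_counter()
    subprocess.run(
        cmd,
        check=False,  # a formatter may exit non-zero when it rewrites files
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command; substitute the real formatter invocation.
    print(f"{timed_run([sys.executable, '-c', 'pass']):.2f}s")
```

Calling it once with `cache_dir` set and once without gives the "w/o cache" and "w/ cache" variants of a steady-state run.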

Running black via pre-commit

Manually edited the `rev` in `.pre-commit-config.yaml` and ran `pre-commit run black -a -v`

| Repo | v22.10.0 | v22.10.0 w/ cache | Multiline (Run 1) | Multiline (Run 2) w/o cache | Multiline (Run 2) w/ cache | Files Changed | # of Changes |
|---|---|---|---|---|---|---|---|
| django | 10.62 | 1.02 | 11.1 | 10.14 | 1 | 28 | 471 |
| pandas | 11.73 | 0.63 | 14.63 | 11.47 | 0.61 | 53 | 1923 |
| sqlalchemy | 8.78 | 0.35 | 9.98 | 8.9 | 0.33 | 14 | 234 |
| pyramid | 1.89 | 0.4 | N/A | 1.94 | 0.37 | 3 | 46 |
| pytest | 1.85 | 0.21 | 3.82 | 1.91 | 0.2 | 64 | 9505 |
| tox | 1.13 | 0.17 | 2.06 | 1.12 | 0.18 | 7 | 39 |
| typeshed | 6.59 | 1.19 | 6.58 | 6.36 | 1.19 | 2 | 24 |
| virtualenv | 0.8 | 0.19 | N/A | 0.85 | 0.21 | 0 | 0 |
| flake8-bugbear | 0.81 | 0.15 | N/A | 0.94 | 0.15 | 0 | 0 |
| opencv | 1.86 | 0.24 | 4.34 | 1.78 | 0.26 | 13 | 0 |
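One way to read the pre-commit table above is to compare the uncached baseline column (v22.10.0) with the uncached steady-state column (Multiline, Run 2 w/o cache). The sketch below does that arithmetic on the values copied from the table; `relative_change` is a helper name introduced here for illustration, and the timings are in whatever unit the table uses.

```python
# (repo, v22.10.0, Multiline Run 2 w/o cache), copied from the table above.
TIMINGS = [
    ("django", 10.62, 10.14),
    ("pandas", 11.73, 11.47),
    ("sqlalchemy", 8.78, 8.90),
    ("pyramid", 1.89, 1.94),
    ("pytest", 1.85, 1.91),
    ("tox", 1.13, 1.12),
    ("typeshed", 6.59, 6.36),
    ("virtualenv", 0.80, 0.85),
    ("flake8-bugbear", 0.81, 0.94),
    ("opencv", 1.86, 1.78),
]


def relative_change(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


for repo, base, multiline in TIMINGS:
    print(f"{repo:15s} {relative_change(base, multiline):+6.1f}%")
```

Percentages on sub-second timings are noisy, so single runs like these are best read as a rough sanity check rather than a precise benchmark.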

diff-shades

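| Repo | v22.1.0 | Multiline | Files Changed | # of Changes |
|---|---|---|---|---|
| django | 7 | 8 | 28 | 471 |
| pandas | 11 | 13 | 53 | 1923 |
| sqlalchemy | 8 | 9 | 14 | 234 |
| pyramid | 1 | 1 | 3 | 46 |
| pytest | 1 | 3 | 64 | 9505 |
| tox | 0 | 1 | 7 | 39 |
| typeshed | 4 | 4 | 2 | 24 |
| virtualenv | 0 | 0 | 0 | 0 |
| flake8-bugbear | 0 | 0 | 0 | 0 |
| opencv | N/A | N/A | 13 | 0 |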

olivia-hong · 2022-11-03T19:13:14Z

Quick update, I added docs and updated the PR description so this is fully ready for review

cooperlees

Looks good to me and well tested + documented.

Can we maybe add a f""" multiline string just to ensure it works / no regressions there in future please. I know I use them and sure many others do.

olivia-hong · 2022-12-22T18:48:19Z

> Can we maybe add a f""" multiline string just to ensure it works / no regressions there in future please. I know I use them and sure many others do.

Thank you for the review @cooperlees! I added multiline f-string test cases and fixed up the merge conflicts

cooperlees

Thanks! Functionality and test wise this all seems good to me. There is some deep code in lines.py that I don't get, so I would want one of the AST-smart maintainers to be happy with it before final merge.

github-actions · 2022-12-23T16:51:46Z

diff-shades results comparing this PR (b2f7637) to main (9c8464c). The full diff is available in the logs under the "Generate HTML diff report" step.

╭─────────────────────────── Summary ────────────────────────────╮
│ 15 projects & 243 files changed / 13 792 changes [+4410/-9382] │
│                                                                │
│ ... out of 2 400 287 lines, 11 495 files & 23 projects         │
╰────────────────────────────────────────────────────────────────╯

Differences found.

felix-hilden

Hi! Thanks for the work and sorry for the late review on my part as well. A couple of small comments and a gentle wish below if you still have energy for this PR 😅

I'd adore it if you could explain the algorithm a bit perhaps in a comment, or even try to break it into more bite-sized pieces so that the big picture is clearer. Like finding the leaves that are inside the line (L783-L788). Took me a solid hour to start to understand, particularly the manipulation of max_level_to_update 😄 (although in fairness it's not the most comfortable part of the code base for me to begin with).

Anyways, thank you for taking this on 🙏

tests/data/preview/multiline_strings.py

src/black/lines.py

olivia-hong · 2023-02-08T01:46:23Z

@felix-hilden Thank you very much for the review! Sorry for the delay; I've added the test cases mentioned, dedented, and added more comments including a top-level comment for the algorithm.

felix-hilden

Thank you very much for improving this still, LGTM!

aneeshusa · 2023-03-07T19:49:00Z

Thank you to everyone who has reviewed, we appreciate the feedback.
@JelleZijlstra, @cooperlees, @ichard26 with all comments addressed and 3 approvals, are we ready to merge the PR?

danielruc91 · 2023-03-14T02:48:19Z

I think this commit has not resolved the case where the first argument to a function is a multiline string and the function has multiple arguments.

For example:

def _print(line, f):
    print(line, file=f)

_print(f"""
some
multiline string
""", some_file
)

is still formatted to:

def _print(line, f):
    print(line, file=f)


_print(
    f"""
some
multiline string
""",
    some_file,
)

FichteFoll · 2023-03-14T09:42:29Z

Yes, multiple arguments are a different situation where this PR only discussed and solved the case for one argument (or similarly nested structures with only one child each). Two arguments being split over multiple lines is the intended behavior, as far as I'm concerned.
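The distinction drawn here (a lone multiline string argument versus a multiline string among several arguments) can be illustrated with a toy check built on the standard `ast` module. This is explicitly not the logic in src/black/lines.py, which works on Black's own line and leaf structures; `is_multiline_str` and `has_sole_multiline_string_arg` are hypothetical names, and the rule is a simplification that ignores nested calls and f-strings.

```python
import ast


def is_multiline_str(node: ast.expr) -> bool:
    """True for a plain string constant that spans more than one source line."""
    return (
        isinstance(node, ast.Constant)
        and isinstance(node.value, str)
        and node.end_lineno is not None
        and node.end_lineno > node.lineno
    )


def has_sole_multiline_string_arg(call_source: str) -> bool:
    """True if call_source is a call whose only argument is a multiline string."""
    call = ast.parse(call_source, mode="eval").body
    if not isinstance(call, ast.Call):
        return False
    args = list(call.args) + [kw.value for kw in call.keywords]
    return len(args) == 1 and is_multiline_str(args[0])


# One argument: the case this PR keeps inline.
ONE_ARG = 'dedent("""some\nmultiline string\n""")'
# Two arguments: still split one argument per line, as in the report above.
TWO_ARGS = '_print("""some\nmultiline string\n""", some_file)'

assert has_sole_multiline_string_arg(ONE_ARG)
assert not has_sole_multiline_string_arg(TWO_ARGS)
```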

aneeshusa added 2 commits December 22, 2020 09:26

Remove unused function remove_trailing_comma

697f938

Improve multiline string handling

8e412bf

aneeshusa mentioned this pull request Dec 22, 2020

Unnecessary line breaks in method call on multiline string #256

Closed

JelleZijlstra reviewed Dec 23, 2020

ichard26 reviewed Dec 25, 2020

ichard26 added the "F: strings" label Jul 16, 2021

lieryan mentioned this pull request Sep 26, 2021

Blacken source code python-rope/rope#404

Merged

ichard26 self-assigned this Apr 9, 2022

ichard26 added the "S: up for grabs (PR only)" and "help wanted" labels Aug 3, 2022

olivia-hong added 4 commits October 26, 2022 11:49

rebase to current main, fix some lint issues

30847e3

move logic under preview

00fb929

undo random newline removal

ed3dab0

fix/add a few comments

b46e997

olivia-hong and others added 4 commits October 31, 2022 12:33

fix self lint

a6f9f35

Merge branch 'main' into improve-multiline-string-handling

bc8dfe1

add documentation

f8ddbfb

also update comment

c20506d

lieryan mentioned this pull request Nov 28, 2022

PR for #551: fix minor flake8 complaints python-rope/rope#552

Merged

14 tasks

olivia-hong mentioned this pull request Dec 16, 2022

Setting the 2023 stable style #3407

Closed

cooperlees requested changes Dec 17, 2022

olivia-hong and others added 3 commits December 21, 2022 19:23

f-string test case

c1d7679

fix merge conflicts

ae12562

fix changes.md merge

9ce0e9b

cooperlees approved these changes Dec 23, 2022

JelleZijlstra approved these changes Dec 29, 2022

felix-hilden reviewed Jan 13, 2023

olivia-hong and others added 4 commits February 7, 2023 19:52

address review (dedent, more test cases, add more comments)

a6a6827

Merge branch 'main' into improve-multiline-string-handling

b49b424

tweak comments more

acb54c3

punctuation

909580c

add missing import

b2f7637

felix-hilden approved these changes Feb 8, 2023

Sheile mentioned this pull request Feb 14, 2023

replace create.py to install ckan command create table c-3lab/ckanext-feedback#6

Merged

JelleZijlstra merged commit 4a063a9 into psf:main Mar 7, 2023

lieryan mentioned this pull request Mar 9, 2023

Use upstream black python-rope/rope#684

Open

2 tasks

dahlia mentioned this pull request Mar 16, 2023

Let string splitters respect East_Asian_Width property #3445

Merged

3 tasks

Jackenmen mentioned this pull request Mar 24, 2023

Less whitespace when a function call contains a single list/dict/tuple/string #3621

Open

charliermarsh mentioned this pull request Aug 15, 2023

Formatter: Expands function call after multi-line string astral-sh/ruff#6500

Closed

hauntsaninja mentioned this pull request Nov 13, 2023

Setting the 2024 stable style #4042

Closed

konstin mentioned this pull request Nov 14, 2023

🏖️ Black 2024 Preview Style astral-sh/ruff#8678

Closed

28 tasks

MichaReiser mentioned this pull request Nov 29, 2023

Formatter: multiline_string_handling preview style astral-sh/ruff#8896

Closed

JelleZijlstra mentioned this pull request Jan 20, 2024

multiline_string_handling sometimes leads to worse formatting #4159

Open
