Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer splitting expressions with many child expressions. #1811

Closed
torotil opened this issue Nov 9, 2020 · 34 comments
Closed

Prefer splitting expressions with many child expressions. #1811

torotil opened this issue Nov 9, 2020 · 34 comments
Labels
F: linebreak How should we split up lines? T: style What do we want Blackened code to look like?

Comments

@torotil
Copy link

torotil commented Nov 9, 2020

Describe the style change

Currently black splits lines from the outermost expression to the innermost expression. This makes sense in most cases, but there is some cases where preferring the inner expression is more semantic. This is especially true if the only argument of a function is a dictionary.

Examples in the current Black style

f(
    {
        "foo": "This is a dict",
        "bar": "with values that are too long",
        "baz": "to fit on a line.",
    }
)

Desired style

f({
   "foo": "This is a dict",
   "bar": "with values that are too long",
   "baz": "to fit on a line.",
})

In this example the first break (putting the dict on it’s own line) doesn’t even make the lines any shorter.

Additional context

It’s hard to make a general rule out of this. Perhaps splitting expressions first that have many child-expresssions would be a reasonable heuristic, but I’m happy for suggestions.

@torotil torotil added the T: style What do we want Blackened code to look like? label Nov 9, 2020
@klahnakoski
Copy link

Similar issue here: #626

I made an experimental fork to solve this problem. It is a hack, and I am not actively maintaining it.

@codethief
Copy link

codethief commented Dec 19, 2020

I don't think #626 is that similar. The situation presented by @torotil is different: There is precisely one argument being passed to the function.

My proposal would therefore be to inline the list or dictionary brackets when the list / dictionary is the sole argument of a function (whether a positional or a keyword argument). That is, the following snippets would fulfill the Black formatting rules:

f(
    first_argument,
    {
        "foo": "This is a dict",
        "bar": "with values that are too long",
        "baz": "to fit on a line.",
    },
    third_argument,
)

(no change)

and

f({
    "foo": "This is a dict",
    "bar": "with values that are too long",
    "baz": "to fit on a line.",
})


f2(keyword_argument={
    "foo": "This is a dict",
    "bar": "with values that are too long",
    "baz": "to fit on a line.",
})

Furthermore, one might want to apply a similar rule to nested function calls, i.e. inline the inner function's name, when the inner function call is the sole argument of the outer function and, of course, as long as that function's name doesn't make us exceed the line length:

func3(func4(
     "very very very very long long long long argument",
))

but

this_is_a_very_very_loooooong_function_name(
    this_is_an_even_looooooooonger_function_name(
        "very very very very long long long long argument",
    )
)

As as an extreme case, consider:

func5(func6(func7(keyword_argument={
    "foo": "This is a dict dict dict dict dict dict dict dict dict dict ",
    "bar": "with values that are too long long long long long long",
    "baz": "to fit on a line line line line line line line line line.",
})))

With the above rules, this formatting would be valid. Currently, however, Black would reformat the code as

func5(
    func6(
        func7(
            keyword_argument={
                "foo": "This is a dict dict dict dict dict dict dict dict dict dict ",
                "bar": "with values that are too long long long long long long",
                "baz": "to fit on a line line line line line line line line line.",
            }
        )
    )
)

which is particularly annoying to read when the strings happen to be so long that, with the additional indentation, they exceed the maximum line length.

Note that there is really no value in so much indentation here as we're only passing a single argument to the outer functions func5 and func6 and, in the proposed formatting, the visual grouping of the closing parentheses as "one big parenthesis" makes this clear right from the start.

I admit, though, that Black's current formatting does look a bit better in my eyes when the keyword_argument above becomes a longer dictionary:

func5(func6(func7(keyword_argument={
    "foo": "This is a dict dict dict dict dict dict dict dict dict dict ",
    "bar": "with values that are too long long long long long long",
    "baz": "to fit on a line line line line line line line line line.",
    "fooo": "This is a dict dict dict dict dict dict dict dict dict dict ",
    "barr": "with values that are too long long long long long long",
    "bazz": "to fit on a line line line line line line line line line.",
    "foooo": "This is a dict dict dict dict dict dict dict dict dict dict ",
    "barrr": "with values that are too long long long long long long",
    "bazzz": "to fit on a line line line line line line line line line.",
    "fooooo": "This is a dict dict dict dict dict dict dict dict dict dict ",
    "barrrr": "with values that are too long long long long long long",
    "bazzzz": "to fit on a line line line line line line line line line.",
})))

vs.

func5(
    func6(
        func7(
            keyword_argument={
                "foo": "This is a dict dict dict dict dict dict dict dict dict dict ",
                "bar": "with values that are too long long long long long long",
                "baz": "to fit on a line line line line line line line line line.",
                "fooo": "This is a dict dict dict dict dict dict dict dict dict dict ",
                "barr": "with values that are too long long long long long long",
                "bazz": "to fit on a line line line line line line line line line.",
                "foooo": "This is a dict dict dict dict dict dict dict dict dict dict ",
                "barrr": "with values that are too long long long long long long",
                "bazzz": "to fit on a line line line line line line line line line.",
                "fooooo": "This is a dict dict dict dict dict dict dict dict dict dict ",
                "barrrr": "with values that are too long long long long long long",
                "bazzzz": "to fit on a line line line line line line line line line.",
            }
        )
    )
)

I'm not sure what to make of this, which is why I prefixed my suggestion regarding nested function calls with "one might want". The same thing goes for nested single-entry lists and dictionaries ([{ }], [[ ]] etc.).

@torotil
Copy link
Author

torotil commented Dec 19, 2020

@codethief Thanks for finding more examples of this. I’m not sure this is limited to single expressions — although optimizing them would already be a nice improvement!

We might want to consider cases like these as well:

func1("short", "args", "and", x="kwargs", long_dict={
    "foo": "This is a dict",
    "bar": "with values that are too long",
    "baz": "to fit on a line.",
})

I’m not sure yet how to generalize this. Maybe it’s about splitting the longest sub-expression first?

@klahnakoski
Copy link

@codethief

I don't think #626 is that similar. The situation presented by @torotil is different: There is precisely one argument being passed to the function.

My suggestion was meant to apply to function parameters too: inline any sole element in brackets of any kind. Maybe you are making the distinction between function parameters and list/tuple/dict elements? My guess is you are proposing inlining only sole function parameters, and leaving sole list/tuple/dict elements with existing Black formatting.

If I understand correctly, you prefer

f({
    "foo": "This is a dict",
    "bar": "with values that are too long",
    "baz": "to fit on a line.",
})

while still preferring

f = [
    {
        "foo": "This is a dict",
        "bar": "with values that are too long",
        "baz": "to fit on a line.",
    }
]

What about the following?

Full inline

f([{
    "foo": "This is a dict",
    "bar": "with values that are too long",
    "baz": "to fit on a line.",
}])

function inline

f([
    {
        "foo": "This is a dict",
        "bar": "with values that are too long",
        "baz": "to fit on a line.",
    }
])

no inline

f(
    [
        {
            "foo": "This is a dict",
            "bar": "with values that are too long",
            "baz": "to fit on a line.",
        }
    ]
)

other?

thanks

@codethief
Copy link

codethief commented Dec 21, 2020

@klahnakoski Thanks pointing this out explicitly! I deliberately excluded this from my proposal for the following reason:

For nested function calls like the ones I illustrated in my post, for the code that follows it doesn't matter how many functions are invoked in a row. What matters is the returned value (or, at the very least, the returned type). However, for nested lists and dictionaries (or combinations thereof), the amount of nesting does have an impact on the type of the overall expression. For this reason, being able to quickly determine the type of an expression is valuable and inlining list/dictionary brackets that appear in a row makes this much more difficult.

@codethief
Copy link

codethief commented Dec 21, 2020

@torotil I'm not entirely sure I would extend this to the case of multiple arguments but if it is desired, one could maybe go by the following rule: If we're looking at the very last argument of, say, N arguments and all other N-1 arguments fit on a single line, then inline those N-1 arguments. In addition, also inline any potential outer brackets (or functional call parentheses) of the N-th argument, provided we still don't exceed the maximum line length.

Again, I'm not entirely sure I would do this because it doesn't always affect readability for the better. (Basically, I would only add exceptions to the existing rules if it is clear that they always improve readability.)

@torotil
Copy link
Author

torotil commented Jan 7, 2021

I see that there is some trade-off between having exactly one “canonical black” code formatting for every particular file of code on the one side and giving developers control over the fine-tuning (eg. the trailing-comma).

In this case my problem is not so much that black’s autoformat output is not perfect in every edge case, but that it reformats hand-optimized code into something that’s less readable.

I'm not entirely sure I would do this because it doesn't always affect readability for the better.

In essence black is doing exactly that [affecting readability for the worse] by turning the OP’s desired formatting into the OP’s black formatting. Although both versions are PEP8 compliant.

@ichard26
Copy link
Collaborator

Some more context and discussion: #2071

@JelleZijlstra JelleZijlstra added the F: linebreak How should we split up lines? label May 30, 2021
@jab
Copy link

jab commented May 30, 2021

See also #1339 "Keep distinct bracket types together when sole inner expression exceeds one line"

Current format:

frozenset(
   {
       0,
       1,
       ...,   # (continues in sequence, omitted for brevity)
       23,
   }
)

Proposed format:

frozenset({                                                                              
   0,
   1,
   ...,
   23,
})

This is a very common construct. The distinct bracket types provide enough differentiation between the inner and the outer expression; the extra level of indentation is just noise.

Also seems worth noting that in practice the expressions on each line are often longer than in this example, so saving the extra level of indentation here can end up eliminating the need for the inner expressions to overflow into multiple lines, and other cascading effects that ultimately make it harder to read code that black is currently producing.

@felix-hilden
Copy link
Collaborator

I'll comment here as well, since a similar issue was closed in favor of this (#2071).

A remark on multiple arguments: removing the extra indentation becomes quite a bit messier. And I don't particularly care for it even when the arguments are earlier than the imploded parentheses.

# This I like
np.array([
    [0, 1, 2, 3, 4],
    [5, 6, 7, 8, 9],
])

# Waste of space in my opinion
np.array(
    [
        [0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9],
    ]
)

# Fine, but a bit weird
np.array([
    [0, 1, 2, 3, 4],
    [5, 6, 7, 8, 9],
], dtype=np.float32)

# Don't feel strongly about this, but would be more readable
np.array(
    [
        [0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9],
    ],
    dtype=np.float32
)

# Quite ugly
np.array([
    [0, 1, 2, 3, 4],
    [5, 6, 7, 8, 9],
], dtype=np.float32, copy=False, order="F")

# Still meh even though marginally better
np.array(dtype=np.float32, copy=False, order="F", data=[
    [0, 1, 2, 3, 4],
    [5, 6, 7, 8, 9],
])

# Better and consistent with current style
np.array(
    [
        [0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9],
    ],
    dtype=np.float32,
    copy=False,
    order="F"
)

So perhaps this rule could be applied only when the imploded item is the only one on that level. And I do think that if implemented, this should hold for all types of brackets. Thoughts?

@YodaEmbedding
Copy link

YodaEmbedding commented Jun 4, 2021

@felix-hilden The first variation is a no-brainer. However, issues arise when we want to format code which includes additional parameters/calls, which is likely why black has defaulted to the bulky-but-always-works variant.


One additional variation:

np.array([
    [0, 1, 2, 3, 4],
    [5, 6, 7, 8, 9],
],
    dtype=np.float32,
    copy=False,
    order="F",
)

...though it does come with the disadvantage that the closing parenthesis is no longer the only non-first line with no indent. But invalidating that property isn't so bad: the ], hardly resembles a function call or statement, and it the indentation of the following arguments is a quick indicator that they must be attached to a function call. The brain should become fairly good at ignoring the ], when scanning backwards to find the associated function call.

I also don't think the "quite ugly" variation is bad -- I've seen similar styles in different languages.

np.array([
    [0, 1, 2, 3, 4],
    [5, 6, 7, 8, 9],
], dtype=np.float32, copy=False, order="F")

It's quite compact and keeps the "less important" kwargs near the end. It also looks aesthetically balanced in comparison to the one where the positional argument is converted (!) into an unidiomatic data=[...] kwarg.

@rdaysky
Copy link

rdaysky commented Sep 14, 2021

If I may summarize the situation from a slightly different angle—

This is acceptable:

f(g(h(a, b)))

But this is somehow not acceptable:

f(g(h(
    a,
    b,
)))

It’s inconsistent that Black should accept the f(g(h( sequence in some cases but not in others. Is it concerned that it might be unclear which function a and b are the arguments of? This concern doesn’t prevent Black from producing this code:

foo(a=1, x=42).bar(
    b=2,
    c=3,
)

I believe that the root of the problem is in Black exploding expressions from the outside. If there’s an expression that Black can’t fit into a line, it switches into the mode where it puts everything on separate lines—but what if it started from the inside instead? If there’s a subexpression that can be exploded, making everything fit into the specified line length, then perhaps there’s no need to be exploding outer expressions all that zealously?

Finally, here’s another example that absolutely doesn’t benefit from an extra indentation level:

query_string = urlencode(dict(
    v=1,
    action="file_bug",
    package="black",
    key=settings.API_KEY,
))

@codethief
Copy link

@rdaysky You're right, this entire question is about whether to explode expressions from the outside or from the inside. However, judging from the discussion so far it seems that neither really hits the sweet spot. In some cases people (including myself) still prefer exploding the inner and the outer expression, even if the latter could be written more concisely.

@rdaysky
Copy link

rdaysky commented Sep 14, 2021

@codethief Unless I’m missing something, exploding more is easily forced with a trailing comma?

(Also I got here by following a chain of issues closed in favor of other issues, and I’ve realized only now that the original poster of this issue spoke explicitly of the choice between outer and inner.)

By the way, more inconsistencies of this kind. Acceptable:

x = cond and {
    1: 2,
}

Acceptable:

x = {
    1: 2,
} | more_params

Suddenly unacceptable:

x = cond and {
    1: 2,
} | more_params

Gets transformed into:

x = (
    cond
    and {
        1: 2,
    }
    | more_params
)

@felix-hilden
Copy link
Collaborator

Here are some relevant discussions: #2156 and the associated PR #2237.

@felix-hilden
Copy link
Collaborator

I'd be open to similar behavior when the collection is last in the call, as in #2724. But let's discuss this through more broadly with other maintainers first.

@francois-rozet
Copy link

francois-rozet commented Jan 30, 2022

Following @rdaysky, I think the number of indentation levels should be kept as small as possible and explosion should propagate outwards. Also I don't see why we should consider

np.array([
    [1, 2, 3],
    [4, 5, 6],
], dtype=np.float32, copy=False)

unacceptable if

np.array([
    [1, 2, 3],
    [4, 5, 6],
]).copy()

is accepted.

So maybe the rule could be "A bracketed expression is necessarily exploded if it has several exploded children." In that case, both pre and post arguments are possible and one can still enforce explosion by adding a trailing comma.

In this case, these would be acceptable

np.array([
    [1, 2, 3],
    [4, 5, 6],
], dtype=np.float32, copy=True)
map(lambda x: x**2, [
    1,
    2,
    3,
])
f(g(
    h(a=1, b=[
        2,
        3,
    ], c=4),
    d=[
        5,
        6,
    ]
), e=7)

And I do think that if implemented, this should hold for all types of brackets.

I agree, there should not be distinctions between (), [] and {}.

@YodaEmbedding
Copy link

YodaEmbedding commented Jan 31, 2022

From #2724:

Describe the style change

I'd like black use more terse formatting when it comes across dictionaries as params in function and method calls. Here's a django function based view example, function calls accepting payload dicts end up looking wrong

Example in the current Black style

def view(request):
    ...
    return render(
        request,
        'places/detail.html',
        {
            'place': place,
            'opening_times': times,
            'photos': photos,
            'staff': staff,
            'town': town,
            'segment': sector,
        },
    )

Desired style

def view(request):
    ...
    return render(request, 'places/detail.html', {
        'place': place,
        'opening_times': times,
        'photos': photos,
        'staff': staff,
        'town': town,
        'segment': sector
    })

I like terse "javascript" style for javascript. However, for black, I'm not so sure it's worth breaking the property that there are no non-primitive tokens (e.g. variables, strings) after a matching multiline ( ). However, I do think it's fine to break the property for "primitive" tokens like ()[]{}"', as was done in the topmost comment in this thread or in #1811 (comment).

@codethief
Copy link

codethief commented Feb 2, 2022

So maybe the rule could be "A bracketed expression is necessarily exploded if it has several exploded children." In that case, both pre and post arguments are possible and one can still enforce explosion by adding a trailing comma.

I really like @francois-rozet's idea! :)

@francois-rozet
Copy link

Anything new about this? This issue is the only thing that prevents me (and most Python devs I know) to use and recommend black. Nested bracketed expressions just take too much (vertical AND horizontal) space...

@yilei
Copy link
Contributor

yilei commented Oct 11, 2022

I read that there is overall agreement that for the specific single-argument case (i.e. the original example), we prefer it to not explode:

f({
   "foo": "This is a dict",
   "bar": "with values that are too long",
   "baz": "to fit on a line.",
})

There are still discussions about what to do for multiple arguments, outer function calls.

Can we first change the single-argument case? This alone affects a significant portion of our monorepo and would be a huge improvement. Happy to send a PR. (cc other maintainers for opinions from referenced issues @cooperlees @JelleZijlstra ?) Discussions about other cases can continue separately.

@Conchylicultor
Copy link

Conchylicultor commented Jan 9, 2023

+1 to @yilei

It seems there's agreement on:

f({
   "foo": "This is a dict",
   "bar": "with values that are too long",
   "baz": "to fit on a line.",
})


np.array([
    [0, 1, 2, 3, 4],
    [5, 6, 7, 8, 9],
])

So this could be implemented today. For all other forms (kwargs, multiple arguments, nested calls...), current style is good enough:

f(
    keyword_argument={
       "foo": "This is a dict",
       "bar": "with values that are too long",
       "baz": "to fit on a line.",
    },
)

np.array(
    [
        [0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9],
    ],
    dtype=np.float32,
)

func5(
    func6(
        func7(
            "very very very very long long long long argument"
        )
    )
)

@helderco
Copy link

I very much prefer the readability of:

f({
   "foo": "This is a dict",
   "bar": "with values that are too long",
   "baz": "to fit on a line.",
})

I actually get a bit annoyed when that gets exploded, but it's also important to me to reduce noise in diffs. So if you add another parameter there then it'll get reformatted differently.

That's why, even though I prefer the readability of the single-argument here, I accept the current rule due to git. It's the same reason there's a convention to always add a comma to the last element of a multi-line list like in this dict.

@jab
Copy link

jab commented Apr 20, 2023

But with many relatively common calls like frozenset, there’s only ever one argument.

@francois-rozet
Copy link

francois-rozet commented Apr 20, 2023

That's why, even though I prefer the readability of the single-argument here, I accept the current rule due to git.

I don't think it is the job of black to care about git diffs. If you apply black to an non-formatted repo, you would expect a lot of diffs. If you want black to only reformat code that has been modified, take a look at darker or melanin.

@JelleZijlstra
Copy link
Collaborator

It's definitely Black's job to care about git diffs for code that has already been formatted with Black. That's why we add the trailing comma, for example: that way a diff that adds a new item to a list becomes a one-line diff, not a two-line diff.

@francois-rozet
Copy link

francois-rozet commented Apr 20, 2023

Oh my bad, I thought @helderco was speaking about the diffs that feature would cause in unmodified code bases. If I understand correctly, you think that because

f({
   "foo": "This is a dict",
   "bar": "with values that are too long",
   "baz": "to fit on a line.",
})

would become

f(
    {
        "foo": "This is a dict",
        "bar": "with values that are too long",
        "baz": "to fit on a line.",
    },
    ...
)

if we add more arguments, the feature is not worth it? IMO, the feature is worth the price, but it is a question of priority. Also, there would still be the option to add a trailing comma to enforce explosion.

@rdaysky
Copy link

rdaysky commented Apr 20, 2023

But the current behavior already punishes changes like

f(
    [
        g(x)
        for x in xs
    ]
)

to

f(
    ", ".join(
        [
            g(x)
            for x in xs
        ]
    )
)

or insertions of random frozenset or list or OrderedDict or whatever.

Additionally, it’s debatable whether short extra arguments should even cause the ({ in @francois-rozet’s example above to split, introducing a new indentation level and raising the entire issue of diffs being clean or not quite. For example:

return template.render({
    "salutation": "hello",
    "user_name": user.name,
}, context=context)

A strong argument in favor of this compact form is that Black is perfectly fine with

return template.render(
    salutation="hello",
    user_name=user.name,
).with_context(context)

which is pretty much the same thing.

Another perfectly readable compact format that Black doesn’t like occurs often when generating API responses:

return [{
    "x": p[0],
    "y": p[1],
} for p in ps]

@torotil
Copy link
Author

torotil commented Apr 25, 2023

I also like like @francois-rozet ’s suggestions a lot (link). I think it covers well how I would like code to look. Together with the trailing comma functionality this would give developers quite a bit more control over which part of the expression is split, which has both advantages and disadvantages.

… explosion should propagate outwards.

This doesn’t fully describe what should happen as the code first needs to decide where to start splitting before propagating outwards. This is where my original proposal to start with the longest sub-expression could be useful. So the algorithm could be somewhat like:

If an expression needs splitting because of length then:

  1. Look whether there is a trailing comma. If so split → continue with evaluating the child expressions.
  2. If no sub-expression is split yet: Try splitting the longest sub-expression (recursively) if this achieves the desired length → done.
  3. Otherwise (multiple child expressions need splitting) split the expression → continue with evaluating the child expressions.
# Sub-expressions are split first.
# input
f(["this", "is", "too", "long"])
# output
f([
  "this",
  "is",
  "too",
  "long",
])

# A split can always be forced using the trailing comma.
# input
f(["this", "is", "too", "long"],)  # note the trailing comma
# output
f(
    [
        "this", 
        "is",
        "too",
        "long",
    ],
)


# The longest sub-expression is split first
# input
f(["this", "fits"], ["this", "is", "too", "long"])
# output
f(["this", "fits"], [
     "this",
     "is",
     "too",
     "long",
])

# Split if multiple sub-expressions are split.
# input
f(["too", "long"], ["too", "long"])
# output
f(
    [
        "too",
        "long",
    ],
    [
        "too",
        "long",
    ],
)

@torotil
Copy link
Author

torotil commented Apr 25, 2023

To everyone who sees this as blocker for introducing black in their team. In my experience the current style most often already leads to better code because it’s not what developers want the code to look like: They then tend to assign the sub-expression to variables before using them which (if properly named) can increase the readability of the code.

Example:

# Instead of:
f(["too", "long"])
# People write:
words = [
    "too",
    "long",
]
f(words)

@rdaysky
Copy link

rdaysky commented Apr 25, 2023

@torotil Surely your last comment doesn’t apply to many examples shown above, like np.array, frozenset etc.

@oprypin
Copy link

oprypin commented May 29, 2023

Big +1 to this.

In unittests, I find myself making helper functions that accept (*args) for no reason other than avoiding another level of indentation with Black, even though simply accepting a list would have been much more natural.

Indeed, changing just the single-argument case would already be a huge improvement.

@henriholopainen
Copy link
Contributor

This and this (now in main --preview) should cover the single-argument case. Could we even go as far as close this issue?

@JelleZijlstra
Copy link
Collaborator

Agree. Anyone wanting a specific change that isn't covered can open a new, focused issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F: linebreak How should we split up lines? T: style What do we want Blackened code to look like?
Projects
None yet
Development

No branches or pull requests