Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of lambda processing #12809

Closed
dreis2211 opened this issue Mar 9, 2023 · 18 comments 路 Fixed by #12814
Closed

Improve performance of lambda processing #12809

dreis2211 opened this issue Mar 9, 2023 · 18 comments 路 Fixed by #12814
Assignees
Milestone

Comments

@dreis2211
Copy link

dreis2211 commented Mar 9, 2023

Hi 馃憢

I've been profiling checkstyle lately and noticed that the processing of lambdas is the most expensive part of the grammar in a project of us (that I unfortunately can't share). Funny enough, I noticed that our checkstyle tasks for the test code seems to take longer than the main source although there are only ~400 tests vs ~2000 main sources. (With 65.000 vs. 130.000 lines of code respectively). A common pattern in the slowest test files is this sort of pseudo-code that creates the test scenarios.

buildScenario(s -> s
	.withPrerequisite(
		prerequisite -> prerequisite.ref(prerequisiteRef)
	)
	.withUser(user -> user
		.sessionRef(userRef)
		.withUserItem(itemId, item -> item
			.withAmount(0)
		)
	)
);

The thing that stands out in the tests is the usage of (nested) functional interfaces as shown above. ~7000 in tests vs. ~2500 in the main sources.

image

I've originally opened antlr/antlr4#4164 but they sent me here to fix/optimize the grammar. I've been playing around with the grammar file and came up with this, which would essentially inline what lambdaExpression does - with the addition to use expr rather than expression:

--- a/src/main/resources/com/puppycrawl/tools/checkstyle/grammar/java/JavaLanguageParser.g4
+++ b/src/main/resources/com/puppycrawl/tools/checkstyle/grammar/java/JavaLanguageParser.g4
@@ -720,7 +720,7 @@ expr
         | BAND_ASSIGN | BOR_ASSIGN | BXOR_ASSIGN | SR_ASSIGN | BSR_ASSIGN
         | SL_ASSIGN | MOD_ASSIGN)
       expr                                                                 #binOp
-    | lambdaExpression                                                     #lambdaExp
+    | lambdaParameters LAMBDA (expr | block)                               #lambdaExp
     ;

With this I get 100% performance improvement, but obviously certain checks fail with these changes because I haven't adjusted them to the new reality locally....

Since the grammar is not really intuitive I don't know if I did something wrong. For example I don't understand why expression is used originally instead of expr in lambdaBody.

It would be great if you could take this over and check if that's a viable solution. Or if there are any other ways to optimize the lambda processing.

Cheers,
Christoph

@romani
Copy link
Member

romani commented Mar 10, 2023

@nrmancuso , looks like you need to take look.

@dreis2211, grammar unfortunately results in AST, and such tree is our API, we need to be accurate in changing it. Better to improve performance without ast changes

@nrmancuso
Copy link
Member

@dreis2211 I am willing to explore optimization in this area of the grammar, are you willing to test out a jar?

@dreis2211
Copy link
Author

dreis2211 commented Mar 10, 2023

@romani That was my thinking as well - hence the initial creation of an issue on antlr side. But as said, they sent me here.

@nrmancuso More than happy to test, if you have anything. But you can also sent me a patch file. I've checkstyle checked out locally already

@nrmancuso
Copy link
Member

Approving this issue, as fix allows us to reduce the number of method calls in both the grammar and JavaAstVisitor.

@nrmancuso
Copy link
Member

nrmancuso commented Mar 10, 2023

More than happy to test, if you have anything. But you can also sent me a patch file. I've checkstyle checked out locally already

PR is at #12814, @dreis2211 please build a jar from this branch and try it out by mvn clean package -Passembly

@dreis2211
Copy link
Author

@nrmancuso No improvement noticeable unless I change expression to expr...But that also breaks a couple of tests...

@romani
Copy link
Member

romani commented Mar 10, 2023

are you willing to test out a jar?

No :), you change code, you do full testing.

@nrmancuso
Copy link
Member

nrmancuso commented Mar 10, 2023

are you willing to test out a jar?

No :), you change code, you do full testing.

I do not have access to @dreis2211 鈥榮 codebase, so I cannot test the performance as it relates to his code. The changes I have proposed in my PR are enough to save us some method calls on their own and stand alone as an improvement on our side. As always, I would provide full check regression testing and ANTLR regression testing for any grammar changes.

@romani
Copy link
Member

romani commented Mar 10, 2023

Ups, sorry @nrmancuso, I thought it is comment of issue author.

@dreis2211
Copy link
Author

dreis2211 commented Mar 10, 2023

@romani If you don't mind me saying, I don't find your comment helpful when I already agreed to help with testing. It's fine for me. I want performance of my project improved and I can't share it, so I find it reasonable to ask for testing help and to see if things are improved on my side that the particular maintainer can't do.

EDIT: You just wrote in the same second that you misunderstood things, so all good.

nrmancuso added a commit to nrmancuso/checkstyle that referenced this issue Mar 12, 2023
@nrmancuso
Copy link
Member

@dreis2211 just pushed changes to #12814, please try again.

@dreis2211
Copy link
Author

dreis2211 commented Mar 12, 2023

@nrmancuso I would call this a success! Tests pass as well ;-)

Old

$ /usr/bin/time -l java -jar checkstyle-10.8.0-all.jar -c config/checkstyle/checkstyle.xml path/to/tests
Starting audit...
Audit done.
       16,30 real        35,77 user         1,51 sys

New

$ /usr/bin/time -l java -jar checkstyle-10.8.1-SNAPSHOT-all.jar -c config/checkstyle/checkstyle.xml path/to/tests
Starting audit...
Audit done.
       9,47 real        28,40 user         1,41 sys

Could you elaborate why the suggested changes make a difference? I haven't really understood why - given that we only removed a level of abstraction in the grammar and calling some code manually. E.g. the changes still make this an expression rather than an expr. But the explanation would be just a bonus.

Thanks a lot :)

nrmancuso added a commit to nrmancuso/checkstyle that referenced this issue Mar 12, 2023
@nrmancuso
Copy link
Member

nrmancuso commented Mar 12, 2023

@dreis2211 here is why:

When we visit the parse tree node that corresponds to the expression production rule, we create an imaginary EXPR node. This imaginary node make it easier to find expressions for analysis in checks. When we visit the parse tree node that corresponds to any of the expr alternatives, we do not create another imaginary EXPR node.

So, you can think about the difference between expression and expr like this; expression is just an entryway to all of the expression rules to eliminate a bunch of nested EXPR nodes. However, for historical reasons, expressions nested in lambda expressions DO have a bunch of nested EXPR nodes. This is why I originally wrote the grammar to just reuse the expression rule, which avoided some extra code in the visitor (as you can see now exists in the PR).

Now, it is important to understand that the expr rule is the most performance sensitive part of our grammar; this is for a few reasons:

  1. Expressions can be very deeply nested
  2. Long lookahead can be required to determine which rule to enter
  3. Operator precedence
  4. right associativity in some rules (ANTLR is normally left associative)

For this reason, I used ANTLR's left recursion to write the expr rule, since it helps to condense all of the expression rules, reduce the number of method calls, and helps with overhead in general. This rule can be a bit less intuitive than the rest of the grammar if you aren't familiar with this concept.

Anyway, I suspect that leaving and re-entering the expr rule was what caused your performance issue, since it basically avoided the left recursion that helps the performance in the expr rule in the first place.

@dreis2211
Copy link
Author

@nrmancuso Thanks for the explanation.

And more importantly. Thanks for the improvement in general and the swift reaction. I've already tested your newest update without Optional. Looks good performance wise!

nrmancuso added a commit to nrmancuso/checkstyle that referenced this issue Mar 12, 2023
nrmancuso added a commit to nrmancuso/checkstyle that referenced this issue Mar 16, 2023
nrmancuso added a commit to nrmancuso/checkstyle that referenced this issue Mar 17, 2023
nrmancuso added a commit to nrmancuso/checkstyle that referenced this issue Mar 17, 2023
nrmancuso added a commit to nrmancuso/checkstyle that referenced this issue Mar 17, 2023
@romani romani added the bug label Mar 18, 2023
@github-actions github-actions bot added this to the 10.9.2 milestone Mar 18, 2023
@romani
Copy link
Member

romani commented Mar 18, 2023

@dreis2211, fix is released please test on your source code

@dreis2211
Copy link
Author

Already did - thanks 馃槉

@romani
Copy link
Member

romani commented Mar 18, 2023

Please share timing how it was and how it becomes

@dreis2211
Copy link
Author

dreis2211 commented Mar 18, 2023

Already did in this issue. Hasn't drastically changed (plus/minus a second on either side). See #12809 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants