
Performance: Token Type Utility Analyzer: Avoid allocations #6785

Merged

Conversation

martin-strecker-sonarsource
Contributor

The current implementation of the TokenTypeAnalyzerBase has the following problems:

  • It allocates a new TokenClassifierBase per Token
  • It does an "OrderBy" after all classifications are collected
  • It does a span-based text lookup in the tree for each classification

This PR changes how the TokenClassifierBase operates.

Instead of creating a new TokenClassifierBase per token, the classifier is now created once per document, and each token is passed to ClassifyToken. ClassifyToken was changed

  • from adding TokenInfo into the collecting list
  • to returning the TokenInfo to the caller

The trivia handling was extracted into TriviaClassifierBase so that TokenClassifierBase.ClassifyToken() can return a single TokenInfo instead of a list. Trivia is now processed before (LeadingTrivia) and after (TrailingTrivia) the token, so the ordering of the results is guaranteed and the final OrderBy is no longer needed.
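For illustration, here is a minimal sketch of that flow, assuming simplified type shapes (the placeholder TokenInfo record and the abstract classes below are not the actual sonar-dotnet signatures): one classifier instance per document, ClassifyToken returning a single TokenInfo, and trivia classified around each token so that the output is already in source order.

```csharp
// Sketch only: the type shapes here are assumptions for illustration,
// not the actual sonar-dotnet API.
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

public sealed record TokenInfo(TextSpan Span, string TokenType);   // placeholder shape

public abstract class TokenClassifierBase
{
    // Created once per document; returns one TokenInfo (or null) per token.
    public abstract TokenInfo ClassifyToken(SyntaxToken token);
}

public abstract class TriviaClassifierBase
{
    // Handles comments and other trivia attached to a token.
    public abstract IEnumerable<TokenInfo> ClassifyTrivia(SyntaxTriviaList trivia);
}

public static class TokenTypeWalker
{
    // Leading trivia, then the token, then trailing trivia: results come out in
    // source order, so no OrderBy over the collected classifications is needed.
    public static IEnumerable<TokenInfo> Classify(SyntaxNode root,
        TokenClassifierBase tokenClassifier, TriviaClassifierBase triviaClassifier)
    {
        foreach (var token in root.DescendantTokens())
        {
            foreach (var info in triviaClassifier.ClassifyTrivia(token.LeadingTrivia))
            {
                yield return info;
            }

            if (tokenClassifier.ClassifyToken(token) is { } tokenInfo)
            {
                yield return tokenInfo;
            }

            foreach (var info in triviaClassifier.ClassifyTrivia(token.TrailingTrivia))
            {
                yield return info;
            }
        }
    }
}
```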

The calculation of the TokenInfo.TextRange was simplified, and the GetSubText call in string.IsNullOrWhiteSpace(token.SyntaxTree.GetText().GetSubText(span).ToString()) check could be removed.
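As a hedged illustration of why the sub-text lookup is unnecessary (the helpers below are hypothetical and not the code from this PR): the token already exposes its own text and its line positions, so a whitespace check or a text range does not require materializing a SubText of the whole document.

```csharp
// Hypothetical helpers for illustration; not the actual sonar-dotnet implementation.
using Microsoft.CodeAnalysis;

public static class TokenTextHelpers
{
    // token.Text is already materialized on the token, so no
    // SourceText.GetSubText(...) allocation is needed for a whitespace check.
    public static bool IsWhitespaceOnly(SyntaxToken token) =>
        string.IsNullOrWhiteSpace(token.Text);

    // Line/column positions come straight from the token's location.
    public static (int StartLine, int StartChar, int EndLine, int EndChar) TextRange(SyntaxToken token)
    {
        var lineSpan = token.GetLocation().GetLineSpan();
        return (lineSpan.StartLinePosition.Line, lineSpan.StartLinePosition.Character,
                lineSpan.EndLinePosition.Line, lineSpan.EndLinePosition.Character);
    }
}
```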

This PR is the groundwork needed to tackle ClassifyIdentifier next, which can be improved by avoiding querying the semantic model in many cases (e.g., an identifier in a parameter syntax is known to be an IParameterSymbol and there is no need to look up the symbol).

Contributor

@Tim-Pohlmann left a comment


Do we even need the inner class here at all? I think getting rid of it makes the whole thing much easier to understand and extend.

@github-actions bot moved this from Review in progress to In progress in Best Kanban Feb 27, 2023
@github-actions bot moved this from In progress to Review in progress in Best Kanban Feb 27, 2023
@github-actions bot moved this from Review in progress to Review approved in Best Kanban Feb 28, 2023
@martin-strecker-sonarsource
Contributor Author

@pavel-mikula-sonarsource can you please have a second look and merge if appropriate? The reasoning for the change is given in the PR description.

@github-actions bot moved this from Review approved to Review in progress in Best Kanban Feb 28, 2023
@pavel-mikula-sonarsource
Contributor

@martin-strecker-sonarsource can you run the full analysis for Akka.Net and Lucene.Net projects to measure the impact?

@martin-strecker-sonarsource
Contributor Author

This PR is mostly a refactoring and a precondition to improving ClassifyIdentifier, which queries the semantic model twice for each identifier token (we can avoid this with a syntactic check in a lot of cases). I only expect this PR to save some allocations, because we previously created a classifier class per token. The runtime results are within the margin of error.

The results are:
Allocations

          Master            PR                Diff
Akka      8,191.27818 MB    7,672.96046 MB    -6.33%
Lucene    11,316.58781 MB   9,753.20441 MB    -13.81%

Runtime

          Master     PR         Diff
Akka      49.745s    53.301s    +7.12%
Lucene    57.709s    54.049s    -6.5%

A second run of Akka for the PR gave a runtime of 47s (slightly better than the master) with 7,688.16615 MB allocations.

dotnet AnalyzerRunner.dll C:\Projects\sonar-dotnet\analyzers\packaging\binaries C:\Projects\sonar-dotnet\analyzers\its\sources\akka.net\src\Akka.sln /a TokenTypeAnalyzer

dotnet AnalyzerRunner.dll C:\Projects\sonar-dotnet\analyzers\packaging\binaries C:\Projects\DeepSast\lucenenet\Lucene.Net.sln /a TokenTypeAnalyzer

Master
Akka.Net
Found 0 diagnostics in 82145ms (8589177704 bytes allocated)
Statistics for TokenTypeAnalyzer:
Concurrent:                     False
Execution time (ms):            49745,2454

Lucene.Net
Found 0 diagnostics in 97372ms (11866302384 bytes allocated)
Statistics for TokenTypeAnalyzer:
Concurrent:                     False
Execution time (ms):            57709,4617

PR (https://github.com/SonarSource/sonar-dotnet/pull/6785)

Akka.Net
Found 0 diagnostics in 91173ms (8045682184 bytes allocated)
Statistics for TokenTypeAnalyzer:
Concurrent:                     False
Execution time (ms):            53301,8113

Lucene.Net
Found 0 diagnostics in 95608ms (10226976072 bytes allocated)
Statistics for TokenTypeAnalyzer:
Concurrent:                     False
Execution time (ms):            54049,4831

Contributor

@pavel-mikula-sonarsource left a comment


LGTM, that's a very nice change!

@github-actions bot moved this from Review in progress to Review approved in Best Kanban Feb 28, 2023
@pavel-mikula-sonarsource
Contributor

Please run it without AnalyzerRunner to have a more realistic view?
What I usually do:

  • run the S4NET begin step
  • change SonarAnalysisConfig.xml to point to the local "release" DLLs
  • run the build 3x to gather the times

@martin-strecker-sonarsource
Contributor Author

> Please run it without AnalyzerRunner to have a more realistic view?

There shouldn't be any difference in runtime as the work done is the same as before. We avoid allocations in the hot path, which just reduces the pressure on the garbage collector. The analyzer runner also measures allocations, and we see some reduction there (the allocation measure doesn't fluctuate as much as the runtime does, so the ~10% reduction there is real).
I can still do the S4NET approach, but I don't think we will see significant changes until we get rid of the GetDeclaredSymbol and GetSymbolInfo calls in ClassifyIdentifier. For instance:

MyClass(List<String> list) // constructor

Causes two semantic model calls for each identifier token:

  • MyClass
  • List
  • String
  • list

We can easily avoid these in this case by checking the parent:

  • MyClass -> token == (token.Parent as ConstructorDeclarationSyntax).Identifier -> Classify as type
  • List -> token == (token.Parent as GenericNameSyntax).Identifier -> Must be a type
  • String -> token.Parent is IdentifierName { Parent: TypeArgumentListSyntax } -> Must be a type
  • list -> token == (token.Parent as ParameterSyntax).Identifier -> It's a parameter name. No classification

That alone saves 2x4 = 8 semantic model calls for this single line (see the sketch below).
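A minimal sketch of that idea, assuming a placeholder TokenType enum and a hypothetical semantic-model fallback (the real ClassifyIdentifier differs): classify from the parent syntax where the shape already decides the answer, and pay for the semantic model only in the remaining cases.

```csharp
// Sketch of the syntactic fast path; TokenType and the fallback are placeholders.
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public enum TokenType { TypeName, Unknown }

public static class IdentifierClassifier
{
    public static TokenType? Classify(SyntaxToken token, SemanticModel model) =>
        token.Parent switch
        {
            // MyClass: the identifier of a constructor declaration names the type.
            ConstructorDeclarationSyntax ctor when ctor.Identifier == token => TokenType.TypeName,
            // List: the identifier of a generic name must be a type.
            GenericNameSyntax generic when generic.Identifier == token => TokenType.TypeName,
            // String: an identifier name inside a type argument list must be a type.
            IdentifierNameSyntax { Parent: TypeArgumentListSyntax } => TokenType.TypeName,
            // list: a parameter name gets no classification.
            ParameterSyntax parameter when parameter.Identifier == token => null,
            // Everything else still pays for the semantic model lookup.
            _ => ClassifyWithSemanticModel(token, model),
        };

    private static TokenType? ClassifyWithSemanticModel(SyntaxToken token, SemanticModel model) =>
        token.Parent is { } parent && model.GetSymbolInfo(parent).Symbol is ITypeSymbol
            ? TokenType.TypeName
            : TokenType.Unknown;
}
```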

@pavel-mikula-sonarsource
Contributor

While the memory footprint is good to know too, the purpose is to have realistic data in terms of user experience.

sonarcloud bot commented Mar 1, 2023

SonarCloud Quality Gate failed.

  • 0 Bugs (rating A)
  • 0 Vulnerabilities (rating A)
  • 0 Security Hotspots (rating A)
  • 0 Code Smells (rating A)
  • 96.4% Coverage
  • 11.0% Duplication

sonarcloud bot commented Mar 1, 2023

Kudos, SonarCloud Quality Gate passed!

  • 0 Bugs (rating A)
  • 0 Vulnerabilities (rating A)
  • 0 Security Hotspots (rating A)
  • 0 Code Smells (rating A)
  • No Coverage information
  • No Duplication information

@martin-strecker-sonarsource
Contributor Author

This is just a single run for Lucene only, but it shows no difference in performance for the TokenTypeAnalyzer. As long as the old SE engine is active, we will not see any impact here at all, as it accounts for 75% of the overall analysis time.
[screenshots of the measurement results omitted]

@martin-strecker-sonarsource merged commit 2eb0a0c into master Mar 1, 2023
22 checks passed
Best Kanban automation moved this from Review approved to Validate Peach Mar 1, 2023
@martin-strecker-sonarsource deleted the Martin/TokenTypeAnalyzer_AvoidAllocations branch March 1, 2023 08:39
@martin-strecker-sonarsource
Contributor Author

I looked into the code colorization on peach for some projects. I could not find anything unusual.

@martin-strecker-sonarsource moved this from Validate Peach to Done in Best Kanban Mar 2, 2023
@pavel-mikula-sonarsource
Contributor

Peach deployment failed and this was not deployed on Peach yet

@pavel-mikula-sonarsource moved this from Done to Validate Peach in Best Kanban Mar 2, 2023
@martin-strecker-sonarsource moved this from Validate Peach to Done in Best Kanban Mar 3, 2023
@andrei-epure-sonarsource added this to the 8.54 milestone Mar 10, 2023