Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving from SHA1 to SHA256 for Hash task #8812

Merged
merged 5 commits into from
Jul 11, 2023

Conversation

JanKrivanek
Copy link
Member

Fixes (internal) https://dev.azure.com/devdiv/DevDiv/_workitems/edit/1645155/

Context

Moving away from crypthographicaly weak SHA1 to SHA256 for Hash task

Changes Made

SHA1 - SHA256

Testing

Existing tests

Copy link
Member

@Forgind Forgind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted with the test changes to expectedHash, this is a breaking change if someone uses the Hash task and verifies that the output matches some fixed value, but I'm still in favor of moving to the more secure SHA256 implementation. Also, I noticed we have a GetFileHash task competing with this one with fairly small differences, and that one seems to already use SHA256, so it feels weird that this one was overlooked...

@baronfel
Copy link
Member

We probably need a breaking change notice for this one, yeah?

@JanKrivanek
Copy link
Member Author

We probably need a breaking change notice for this one, yeah?

Thanks for bringing this up @baronfel

I haven't find any learn.microsoft.com documentation for this task, so probably only source serves as a documentation and it makes it clear this is subject to change: https://github.com/dotnet/msbuild/blob/main/src/Tasks/Hash.cs#L17

So it might not be necessary - though it certainly would not hurt

@JanKrivanek JanKrivanek added this to the VS 17.8 milestone May 31, 2023
@rainersigwald
Copy link
Member

We should document the task with the explicit "the hash function is an implementation detail and may vary between MSBuild engine versions" caveat. But I'd say we should also call it out as an info-level "breaking" change.

@JanKrivanek
Copy link
Member Author

@rainersigwald - I wasn't able to find documentation for the HashTask - only the GH code comments - those explicitly mention that hash algorithm is subject to change between releases. I agree with the breaking change - @baronfel I believe you planned to mention it, correct?
Chet as well adviced to merge it only into 17.8 (and behind changewave), to prevent mid-cycle (sdk) breaking change - so I'm keeping this open now

@rokonec
Copy link
Member

rokonec commented Jun 12, 2023

SHA256 is 15% - 25% slower than SHA1. Given the intended use case of HashTask which is "to measure only uniqueness between build executions" this task does not required to use "Strong cryptography". I am not sure if this pose real possibility of perf regression as it hashes only itemspec, but even then, I am not convinced it is actually needed.

@rainersigwald, @JanKrivanek Can we please reconsider it?

@JanKrivanek
Copy link
Member Author

SHA256 is 15% - 25% slower than SHA1. Given the intended use case of HashTask which is "to measure only uniqueness between build executions" this task does not required to use "Strong cryptography". I am not sure if this pose real possibility of perf regression as it hashes only itemspec, but even then, I am not convinced it is actually needed.

@rainersigwald, @JanKrivanek Can we please reconsider it?

Staying with SHA1 is not an option

We can pull in System.IO.Hashing - then we'd be able to use some non-cryptographic hash algo for an improved perf. For that I however don't have full context of what's needed to insert new dep(s) into VS (I believe I can get some inspiration in PR like this [internal], but still I would like to fully understand the requirements and consequences).

A dummy search seems to yield some usages within our repos and some in the wild, not much though - so priority might be bit questionable.

That being said - if anybody is willing to give me quick crash course (or point me to some PR (set of PRs) containing all required changes) I'm up for attempting to rewrite this with System.IO.Hashing

@JanKrivanek
Copy link
Member Author

In case we'll choose to leverage System.IO.Hashing - than this is related: dotnet/roslyn#68700

@CyrusNajmabadi
Copy link
Member

In case we'll choose to leverage System.IO.Hashing - than this is related: dotnet/roslyn#68700

I'm a little confused here about the roslyn side. The hash algorithm we use it not an implementation detail. It's a contract we have (including on the PDB side, where we use these hashes to match things up). I'm not understanding the idea here in terms of changing things out.

Note: i would have been happy to stay with sha1 (since we don't use this for cryptographic purposes). But my recollectino was that it was a mandatory MS thing (with no exceptions allowed whatsoever) to maintain support for sha1 without sha256 support.

@JanKrivanek
Copy link
Member Author

The Prism measurements of current impact: [internal link] https://prism.vsdata.io/?query=bf%3DMicrosoft.Build.Tasks.Core.dll!Microsoft.Build.Tasks.Hash.Execute&eventType=cpu

0.12% of CPU used by the Hash task in measured case. It sounds like the ~20% degradation would be negligible

@JanKrivanek JanKrivanek force-pushed the proto/hash-task-sha256 branch from d4b6f38 to e9cefe2 Compare July 10, 2023 07:52
@JanKrivanek
Copy link
Member Author

Perf impact measurements

Conclusion - for a OrchardCore build from command line and from VS the SHA256 backed Hash tasks added ~80ms to build time (both full rebuild and incremental rebuild) - this has more weight on no-change-incremental build (which is faster), but still the slowdown is on the edge 0.2% of the entire build time.

Below are samples of measurement for command line build

Full rebuild

Comparison of Orchard Core full rebuild via msbuild.exe orchardcore.sln (the entire build took ~90 seconds for both scenarios):

Hash task backed by SHA1

CPU sampling:
image
MSBuild Task events:
image

Hash task backed by SHA256

CPU sampling:
image
MSBuild Task events:
image

Incremental build

Comparison of Orchard Core incremental rebuild via msbuild.exe orchardcore.sln with no changes (the entire build took ~30 seconds for both scenarios):

Hash task backed by SHA1

CPU sampling:
image
MSBuild Task events:
image

Hash task backed by SHA256

CPU sampling:
image
MSBuild Task events:
image

Copy link
Member

@rokonec rokonec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets go for it - If performance measurements are acceptable .

Minor notes:

  1. We are using crypto-strong hashing algos when three is no need for it. It could be that thanks for HW optimization and stuff it is faster than crypto-weak hashing, or maybe loading another dependency on system.hashing can cancel perf gains, or we strategically do not want to introduce another dependency ... whatever is the reason we shall document it in code for future us.

  2. SUPER NIT: UnitTest HashTaskDifferentInputSizesTest is testing implementation, basically duplicates hashing code which makes it hard to maintain. I would probably just verify that it produces no conflicts by hashset or something.

@JanKrivanek JanKrivanek enabled auto-merge July 11, 2023 11:33
@JanKrivanek JanKrivanek merged commit 5ab64ed into dotnet:main Jul 11, 2023
@JanKrivanek JanKrivanek deleted the proto/hash-task-sha256 branch July 11, 2023 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants