Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Vector64.Create for constants #101662

Merged
merged 3 commits into from May 17, 2024
Merged

Optimize Vector64.Create for constants #101662

merged 3 commits into from May 17, 2024

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Apr 28, 2024

Closes #101661

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 28, 2024
@EgorBo EgorBo marked this pull request as ready for review April 28, 2024 19:25
@EgorBo
Copy link
Member Author

EgorBo commented Apr 29, 2024

PTAL @kunalspathak @tannergooding cc @dotnet/jit-contrib

Empy size diffs, 283 contexts with text diffs (becuase size is the same)

@@ -2406,14 +2406,36 @@ void CodeGen::genSetRegToConst(regNumber targetReg, var_types targetType, GenTre
}
else
{
// Get a temp integer register to compute long address.
regNumber addrReg = tree->GetSingleTempReg();
simd8_t val = vecCon->gtSimd8Val;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same handling should presumably be added for TYP_SIMD16 as well, yes?

It should also be fine to expose for TYP_SIMD12, since any operations are already required to mask out the unused bit when its impactful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if this is missing some cases or potentially conflicting with the IsValidConstForMovImm in LowerHWIntrinsicCreate handling

That check is supposed to recognize and catch this case, lowering it instead to Arm64.DuplicateToVectorXXX, which is itself supposed to emit the more optimized mov instruction

It might be a case where we need both since sometimes we'll have Create(...) and sometimes we'll have CNS_VEC isntead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good point, I'll check what we generate for other VectorX.Create

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added Vector128 as well. I think I'll leave SIMD12 unchanged since I think we probably don't want to have a garbage value in the last 4 bytes of it and prefer it being zero.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with not doing TYP_SIMD12 in this PR, but I think we probably want to handle it as well.

The general logic for TYP_SIMD12 is explicitly setup to allow for the last element to be undefined for perf reasons. Most places it doesn't matter and the places where it does, values like 0 are often just as bad as denormals or other random values (as it frequently causes NaN to be produced).

x64, for example, will optimize it to a broadcast in many cases so that the denser codegen can be emitted

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok if you think it's fine I pushed the SIMD12 as well - it allowed to unify them all

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes as is look correct, there's just likely more we could do here to ensure other cases are handled

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ARM64: suboptimal codegen for Vector64.Create(1)
2 participants