New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Vector64.Create for constants #101662
Conversation
PTAL @kunalspathak @tannergooding cc @dotnet/jit-contrib Empy size diffs, 283 contexts with text diffs (becuase size is the same) |
src/coreclr/jit/codegenarm64.cpp
Outdated
@@ -2406,14 +2406,36 @@ void CodeGen::genSetRegToConst(regNumber targetReg, var_types targetType, GenTre | |||
} | |||
else | |||
{ | |||
// Get a temp integer register to compute long address. | |||
regNumber addrReg = tree->GetSingleTempReg(); | |||
simd8_t val = vecCon->gtSimd8Val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same handling should presumably be added for TYP_SIMD16
as well, yes?
It should also be fine to expose for TYP_SIMD12
, since any operations are already required to mask out the unused bit when its impactful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder if this is missing some cases or potentially conflicting with the IsValidConstForMovImm
in LowerHWIntrinsicCreate
handling
That check is supposed to recognize and catch this case, lowering it instead to Arm64.DuplicateToVectorXXX
, which is itself supposed to emit the more optimized mov
instruction
It might be a case where we need both since sometimes we'll have Create(...)
and sometimes we'll have CNS_VEC isntead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good point, I'll check what we generate for other VectorX.Create
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added Vector128 as well. I think I'll leave SIMD12
unchanged since I think we probably don't want to have a garbage value in the last 4 bytes of it and prefer it being zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with not doing TYP_SIMD12 in this PR, but I think we probably want to handle it as well.
The general logic for TYP_SIMD12 is explicitly setup to allow for the last element to be undefined for perf reasons. Most places it doesn't matter and the places where it does, values like 0
are often just as bad as denormals or other random values (as it frequently causes NaN to be produced).
x64, for example, will optimize it to a broadcast in many cases so that the denser codegen can be emitted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok if you think it's fine I pushed the SIMD12 as well - it allowed to unify them all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes as is look correct, there's just likely more we could do here to ensure other cases are handled
Closes #101661