Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SimplifyLibCalls: Don't require ldexp to emit intrinsic in exp2 combine #92707

Merged

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented May 19, 2024

When folding exp2(itofp(x)) to ldexp(1, x), don't require an ldexp libcall to emit the intrinsic.

The intrinsic needs to be handled regardless of whether the system has a libcall, and we have an inline implementation of ldexp already. This fixes the instance in the exp2->ldexp fold. Another instance exists for the pow(2) -> ldexp case

The LTO test change isn't ideal, since it's just moving the problem to another instance where we're relying on implied libm behavior for an intrinsic transform. Use exp10 since that's a harder case to solve in the libcall house of cards we have.

@arsenm arsenm added the floating-point Floating-point math label May 19, 2024
@llvmbot llvmbot added LTO Link time optimization (regular/full LTO or ThinLTO) llvm:transforms labels May 19, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented May 19, 2024

@llvm/pr-subscribers-lto

@llvm/pr-subscribers-llvm-transforms

Author: Matt Arsenault (arsenm)

Changes

When folding exp2(itofp(x)) to ldexp(1, x), don't require an ldexp libcall to emit the intrinsic.

The intrinsic needs to be handled regardless of whether the system has a libcall, and we have an inline implementation of ldexp already. This fixes the instance in the exp2->ldexp fold. Another instance exists for the pow(2) -> ldexp case

The LTO test change isn't ideal, since it's just moving the problem to another instance where we're relying on implied libm behavior for an intrinsic transform. Use exp10 since that's a harder case to solve in the libcall house of cards we have.


Full diff: https://github.com/llvm/llvm-project/pull/92707.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp (+2-3)
  • (modified) llvm/test/LTO/X86/triple-init2.ll (+5-6)
  • (modified) llvm/test/Transforms/InstCombine/exp2-1.ll (+17-12)
  • (modified) llvm/test/Transforms/InstCombine/exp2-to-ldexp.ll (+39-81)
diff --git a/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp b/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
index eb1224abf00e2..df89e1775a319 100644
--- a/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
@@ -2386,12 +2386,11 @@ Value *LibCallSimplifier::optimizeExp2(CallInst *CI, IRBuilderBase &B) {
   // exp2(uitofp(x)) -> ldexp(1.0, zext(x))  if sizeof(x) < IntSize
   Value *Op = CI->getArgOperand(0);
   if ((isa<SIToFPInst>(Op) || isa<UIToFPInst>(Op)) &&
-      hasFloatFn(M, TLI, Ty, LibFunc_ldexp, LibFunc_ldexpf, LibFunc_ldexpl)) {
+      (UseIntrinsic ||
+       hasFloatFn(M, TLI, Ty, LibFunc_ldexp, LibFunc_ldexpf, LibFunc_ldexpl))) {
     if (Value *Exp = getIntToFPVal(Op, B, TLI->getIntSize())) {
       Constant *One = ConstantFP::get(Ty, 1.0);
 
-      // TODO: Emitting the intrinsic should not depend on whether the libcall
-      // is available.
       if (UseIntrinsic) {
         return copyFlags(*CI, B.CreateIntrinsic(Intrinsic::ldexp,
                                                 {Ty, Exp->getType()},
diff --git a/llvm/test/LTO/X86/triple-init2.ll b/llvm/test/LTO/X86/triple-init2.ll
index 2638180ef33c6..bc5ecf9785a28 100644
--- a/llvm/test/LTO/X86/triple-init2.ll
+++ b/llvm/test/LTO/X86/triple-init2.ll
@@ -11,21 +11,20 @@
 ; RUN: llvm-lto2 run -r %t1,main,plx -o %t2 %t1
 ; RUN: llvm-nm %t2.1 | FileCheck %s
 
-; We check that LTO will be aware of target triple and prevent exp2 to ldexpf
+; We check that LTO will be aware of target triple and prevent pow to exp10
 ; transformation on Windows.
-; CHECK: U exp2f
+; CHECK: U powf
 
 target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-pc-windows-msvc19.11.0"
 
+declare float @llvm.pow.f32(float, float)
+
 define dso_local i32 @main(i32 %argc, ptr nocapture readnone %argv) local_unnamed_addr {
 entry:
   %conv = sitofp i32 %argc to float
-  %exp2 = tail call float @llvm.exp2.f32(float %conv)
+  %exp2 = tail call float @llvm.pow.f32(float 10.0, float %conv)
   %conv1 = fptosi float %exp2 to i32
   ret i32 %conv1
 }
 
-; Function Attrs: nounwind readnone speculatable
-declare float @llvm.exp2.f32(float)
-
diff --git a/llvm/test/Transforms/InstCombine/exp2-1.ll b/llvm/test/Transforms/InstCombine/exp2-1.ll
index 2dff0b08ecf97..d8bd0a4d8159d 100644
--- a/llvm/test/Transforms/InstCombine/exp2-1.ll
+++ b/llvm/test/Transforms/InstCombine/exp2-1.ll
@@ -242,8 +242,8 @@ define double @test_simplify9(i8 zeroext %x) {
 ; NOLDEXPF-NEXT:    ret double [[RET]]
 ;
 ; NOLDEXP-LABEL: @test_simplify9(
-; NOLDEXP-NEXT:    [[CONV:%.*]] = uitofp i8 [[X:%.*]] to double
-; NOLDEXP-NEXT:    [[RET:%.*]] = call double @llvm.exp2.f64(double [[CONV]])
+; NOLDEXP-NEXT:    [[TMP1:%.*]] = zext i8 [[X:%.*]] to i32
+; NOLDEXP-NEXT:    [[RET:%.*]] = call double @llvm.ldexp.f64.i32(double 1.000000e+00, i32 [[TMP1]])
 ; NOLDEXP-NEXT:    ret double [[RET]]
 ;
   %conv = uitofp i8 %x to double
@@ -263,13 +263,13 @@ define float @test_simplify10(i8 zeroext %x) {
 ; LDEXP16-NEXT:    ret float [[RET]]
 ;
 ; NOLDEXPF-LABEL: @test_simplify10(
-; NOLDEXPF-NEXT:    [[CONV:%.*]] = uitofp i8 [[X:%.*]] to float
-; NOLDEXPF-NEXT:    [[RET:%.*]] = call float @llvm.exp2.f32(float [[CONV]])
+; NOLDEXPF-NEXT:    [[TMP1:%.*]] = zext i8 [[X:%.*]] to i32
+; NOLDEXPF-NEXT:    [[RET:%.*]] = call float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
 ; NOLDEXPF-NEXT:    ret float [[RET]]
 ;
 ; NOLDEXP-LABEL: @test_simplify10(
-; NOLDEXP-NEXT:    [[CONV:%.*]] = uitofp i8 [[X:%.*]] to float
-; NOLDEXP-NEXT:    [[RET:%.*]] = call float @llvm.exp2.f32(float [[CONV]])
+; NOLDEXP-NEXT:    [[TMP1:%.*]] = zext i8 [[X:%.*]] to i32
+; NOLDEXP-NEXT:    [[RET:%.*]] = call float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
 ; NOLDEXP-NEXT:    ret float [[RET]]
 ;
   %conv = uitofp i8 %x to float
@@ -289,13 +289,13 @@ define float @sitofp_scalar_intrinsic_with_FMF(i8 %x) {
 ; LDEXP16-NEXT:    ret float [[R]]
 ;
 ; NOLDEXPF-LABEL: @sitofp_scalar_intrinsic_with_FMF(
-; NOLDEXPF-NEXT:    [[S:%.*]] = sitofp i8 [[X:%.*]] to float
-; NOLDEXPF-NEXT:    [[R:%.*]] = tail call nnan float @llvm.exp2.f32(float [[S]])
+; NOLDEXPF-NEXT:    [[TMP1:%.*]] = sext i8 [[X:%.*]] to i32
+; NOLDEXPF-NEXT:    [[R:%.*]] = tail call nnan float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
 ; NOLDEXPF-NEXT:    ret float [[R]]
 ;
 ; NOLDEXP-LABEL: @sitofp_scalar_intrinsic_with_FMF(
-; NOLDEXP-NEXT:    [[S:%.*]] = sitofp i8 [[X:%.*]] to float
-; NOLDEXP-NEXT:    [[R:%.*]] = tail call nnan float @llvm.exp2.f32(float [[S]])
+; NOLDEXP-NEXT:    [[TMP1:%.*]] = sext i8 [[X:%.*]] to i32
+; NOLDEXP-NEXT:    [[R:%.*]] = tail call nnan float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
 ; NOLDEXP-NEXT:    ret float [[R]]
 ;
   %s = sitofp i8 %x to float
@@ -317,9 +317,14 @@ define <2 x float> @sitofp_vector_intrinsic_with_FMF(<2 x i8> %x) {
 ; LDEXP16-NEXT:    [[R:%.*]] = call nnan <2 x float> @llvm.ldexp.v2f32.v2i16(<2 x float> <float 1.000000e+00, float 1.000000e+00>, <2 x i16> [[TMP1]])
 ; LDEXP16-NEXT:    ret <2 x float> [[R]]
 ;
+; NOLDEXPF-LABEL: @sitofp_vector_intrinsic_with_FMF(
+; NOLDEXPF-NEXT:    [[TMP1:%.*]] = sext <2 x i8> [[X:%.*]] to <2 x i32>
+; NOLDEXPF-NEXT:    [[R:%.*]] = call nnan <2 x float> @llvm.ldexp.v2f32.v2i32(<2 x float> <float 1.000000e+00, float 1.000000e+00>, <2 x i32> [[TMP1]])
+; NOLDEXPF-NEXT:    ret <2 x float> [[R]]
+;
 ; NOLDEXP-LABEL: @sitofp_vector_intrinsic_with_FMF(
-; NOLDEXP-NEXT:    [[S:%.*]] = sitofp <2 x i8> [[X:%.*]] to <2 x float>
-; NOLDEXP-NEXT:    [[R:%.*]] = call nnan <2 x float> @llvm.exp2.v2f32(<2 x float> [[S]])
+; NOLDEXP-NEXT:    [[TMP1:%.*]] = sext <2 x i8> [[X:%.*]] to <2 x i32>
+; NOLDEXP-NEXT:    [[R:%.*]] = call nnan <2 x float> @llvm.ldexp.v2f32.v2i32(<2 x float> <float 1.000000e+00, float 1.000000e+00>, <2 x i32> [[TMP1]])
 ; NOLDEXP-NEXT:    ret <2 x float> [[R]]
 ;
   %s = sitofp <2 x i8> %x to <2 x float>
diff --git a/llvm/test/Transforms/InstCombine/exp2-to-ldexp.ll b/llvm/test/Transforms/InstCombine/exp2-to-ldexp.ll
index 6e5be5a19d6da..969020140cb33 100644
--- a/llvm/test/Transforms/InstCombine/exp2-to-ldexp.ll
+++ b/llvm/test/Transforms/InstCombine/exp2-to-ldexp.ll
@@ -1,19 +1,13 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
-; RUN: opt -S -passes=instcombine %s | FileCheck -check-prefixes=CHECK,LDEXP %s
-; RUN: opt -S -passes=instcombine -disable-builtin=ldexpf -disable-builtin=ldexp -disable-builtin=ldexpl %s | FileCheck -check-prefixes=CHECK,NOLDEXP %s
+; RUN: opt -S -passes=instcombine %s | FileCheck %s
+; RUN: opt -S -passes=instcombine -disable-builtin=ldexpf -disable-builtin=ldexp -disable-builtin=ldexpl %s | FileCheck %s
 
 define float @exp2_f32_sitofp_i8(i8 %x) {
-; LDEXP-LABEL: define float @exp2_f32_sitofp_i8(
-; LDEXP-SAME: i8 [[X:%.*]]) {
-; LDEXP-NEXT:    [[TMP1:%.*]] = sext i8 [[X]] to i32
-; LDEXP-NEXT:    [[LDEXPF:%.*]] = call float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
-; LDEXP-NEXT:    ret float [[LDEXPF]]
-;
-; NOLDEXP-LABEL: define float @exp2_f32_sitofp_i8(
-; NOLDEXP-SAME: i8 [[X:%.*]]) {
-; NOLDEXP-NEXT:    [[ITOFP:%.*]] = sitofp i8 [[X]] to float
-; NOLDEXP-NEXT:    [[EXP2:%.*]] = call float @llvm.exp2.f32(float [[ITOFP]])
-; NOLDEXP-NEXT:    ret float [[EXP2]]
+; CHECK-LABEL: define float @exp2_f32_sitofp_i8(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i8 [[X]] to i32
+; CHECK-NEXT:    [[EXP2:%.*]] = call float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
+; CHECK-NEXT:    ret float [[EXP2]]
 ;
   %itofp = sitofp i8 %x to float
   %exp2 = call float @llvm.exp2.f32(float %itofp)
@@ -21,17 +15,11 @@ define float @exp2_f32_sitofp_i8(i8 %x) {
 }
 
 define float @exp2_f32_sitofp_i8_flags(i8 %x) {
-; LDEXP-LABEL: define float @exp2_f32_sitofp_i8_flags(
-; LDEXP-SAME: i8 [[X:%.*]]) {
-; LDEXP-NEXT:    [[TMP1:%.*]] = sext i8 [[X]] to i32
-; LDEXP-NEXT:    [[LDEXPF:%.*]] = call nnan ninf float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
-; LDEXP-NEXT:    ret float [[LDEXPF]]
-;
-; NOLDEXP-LABEL: define float @exp2_f32_sitofp_i8_flags(
-; NOLDEXP-SAME: i8 [[X:%.*]]) {
-; NOLDEXP-NEXT:    [[ITOFP:%.*]] = sitofp i8 [[X]] to float
-; NOLDEXP-NEXT:    [[EXP2:%.*]] = call nnan ninf float @llvm.exp2.f32(float [[ITOFP]])
-; NOLDEXP-NEXT:    ret float [[EXP2]]
+; CHECK-LABEL: define float @exp2_f32_sitofp_i8_flags(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i8 [[X]] to i32
+; CHECK-NEXT:    [[EXP2:%.*]] = call nnan ninf float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
+; CHECK-NEXT:    ret float [[EXP2]]
 ;
   %itofp = sitofp i8 %x to float
   %exp2 = call nnan ninf float @llvm.exp2.f32(float %itofp)
@@ -39,17 +27,11 @@ define float @exp2_f32_sitofp_i8_flags(i8 %x) {
 }
 
 define <2 x float> @exp2_v2f32_sitofp_v2i8(<2 x i8> %x) {
-; LDEXP-LABEL: define <2 x float> @exp2_v2f32_sitofp_v2i8(
-; LDEXP-SAME: <2 x i8> [[X:%.*]]) {
-; LDEXP-NEXT:    [[TMP1:%.*]] = sext <2 x i8> [[X]] to <2 x i32>
-; LDEXP-NEXT:    [[EXP2:%.*]] = call <2 x float> @llvm.ldexp.v2f32.v2i32(<2 x float> <float 1.000000e+00, float 1.000000e+00>, <2 x i32> [[TMP1]])
-; LDEXP-NEXT:    ret <2 x float> [[EXP2]]
-;
-; NOLDEXP-LABEL: define <2 x float> @exp2_v2f32_sitofp_v2i8(
-; NOLDEXP-SAME: <2 x i8> [[X:%.*]]) {
-; NOLDEXP-NEXT:    [[ITOFP:%.*]] = sitofp <2 x i8> [[X]] to <2 x float>
-; NOLDEXP-NEXT:    [[EXP2:%.*]] = call <2 x float> @llvm.exp2.v2f32(<2 x float> [[ITOFP]])
-; NOLDEXP-NEXT:    ret <2 x float> [[EXP2]]
+; CHECK-LABEL: define <2 x float> @exp2_v2f32_sitofp_v2i8(
+; CHECK-SAME: <2 x i8> [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = sext <2 x i8> [[X]] to <2 x i32>
+; CHECK-NEXT:    [[EXP2:%.*]] = call <2 x float> @llvm.ldexp.v2f32.v2i32(<2 x float> <float 1.000000e+00, float 1.000000e+00>, <2 x i32> [[TMP1]])
+; CHECK-NEXT:    ret <2 x float> [[EXP2]]
 ;
   %itofp = sitofp <2 x i8> %x to <2 x float>
   %exp2 = call <2 x float> @llvm.exp2.v2f32(<2 x float> %itofp)
@@ -57,17 +39,11 @@ define <2 x float> @exp2_v2f32_sitofp_v2i8(<2 x i8> %x) {
 }
 
 define float @exp2_f32_uitofp_i8(i8 %x) {
-; LDEXP-LABEL: define float @exp2_f32_uitofp_i8(
-; LDEXP-SAME: i8 [[X:%.*]]) {
-; LDEXP-NEXT:    [[TMP1:%.*]] = zext i8 [[X]] to i32
-; LDEXP-NEXT:    [[LDEXPF:%.*]] = call float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
-; LDEXP-NEXT:    ret float [[LDEXPF]]
-;
-; NOLDEXP-LABEL: define float @exp2_f32_uitofp_i8(
-; NOLDEXP-SAME: i8 [[X:%.*]]) {
-; NOLDEXP-NEXT:    [[ITOFP:%.*]] = uitofp i8 [[X]] to float
-; NOLDEXP-NEXT:    [[EXP2:%.*]] = call float @llvm.exp2.f32(float [[ITOFP]])
-; NOLDEXP-NEXT:    ret float [[EXP2]]
+; CHECK-LABEL: define float @exp2_f32_uitofp_i8(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = zext i8 [[X]] to i32
+; CHECK-NEXT:    [[EXP2:%.*]] = call float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
+; CHECK-NEXT:    ret float [[EXP2]]
 ;
   %itofp = uitofp i8 %x to float
   %exp2 = call float @llvm.exp2.f32(float %itofp)
@@ -77,8 +53,8 @@ define float @exp2_f32_uitofp_i8(i8 %x) {
 define half @exp2_f16_sitofp_i8(i8 %x) {
 ; CHECK-LABEL: define half @exp2_f16_sitofp_i8(
 ; CHECK-SAME: i8 [[X:%.*]]) {
-; CHECK-NEXT:    [[ITOFP:%.*]] = sitofp i8 [[X]] to half
-; CHECK-NEXT:    [[EXP2:%.*]] = call half @llvm.exp2.f16(half [[ITOFP]])
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i8 [[X]] to i32
+; CHECK-NEXT:    [[EXP2:%.*]] = call half @llvm.ldexp.f16.i32(half 0xH3C00, i32 [[TMP1]])
 ; CHECK-NEXT:    ret half [[EXP2]]
 ;
   %itofp = sitofp i8 %x to half
@@ -87,17 +63,11 @@ define half @exp2_f16_sitofp_i8(i8 %x) {
 }
 
 define double @exp2_f64_sitofp_i8(i8 %x) {
-; LDEXP-LABEL: define double @exp2_f64_sitofp_i8(
-; LDEXP-SAME: i8 [[X:%.*]]) {
-; LDEXP-NEXT:    [[TMP1:%.*]] = sext i8 [[X]] to i32
-; LDEXP-NEXT:    [[LDEXP:%.*]] = call double @llvm.ldexp.f64.i32(double 1.000000e+00, i32 [[TMP1]])
-; LDEXP-NEXT:    ret double [[LDEXP]]
-;
-; NOLDEXP-LABEL: define double @exp2_f64_sitofp_i8(
-; NOLDEXP-SAME: i8 [[X:%.*]]) {
-; NOLDEXP-NEXT:    [[ITOFP:%.*]] = sitofp i8 [[X]] to double
-; NOLDEXP-NEXT:    [[EXP2:%.*]] = call double @llvm.exp2.f64(double [[ITOFP]])
-; NOLDEXP-NEXT:    ret double [[EXP2]]
+; CHECK-LABEL: define double @exp2_f64_sitofp_i8(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i8 [[X]] to i32
+; CHECK-NEXT:    [[EXP2:%.*]] = call double @llvm.ldexp.f64.i32(double 1.000000e+00, i32 [[TMP1]])
+; CHECK-NEXT:    ret double [[EXP2]]
 ;
   %itofp = sitofp i8 %x to double
   %exp2 = call double @llvm.exp2.f64(double %itofp)
@@ -105,17 +75,11 @@ define double @exp2_f64_sitofp_i8(i8 %x) {
 }
 
 define fp128 @exp2_fp128_sitofp_i8(i8 %x) {
-; LDEXP-LABEL: define fp128 @exp2_fp128_sitofp_i8(
-; LDEXP-SAME: i8 [[X:%.*]]) {
-; LDEXP-NEXT:    [[TMP1:%.*]] = sext i8 [[X]] to i32
-; LDEXP-NEXT:    [[LDEXPL:%.*]] = call fp128 @llvm.ldexp.f128.i32(fp128 0xL00000000000000003FFF000000000000, i32 [[TMP1]])
-; LDEXP-NEXT:    ret fp128 [[LDEXPL]]
-;
-; NOLDEXP-LABEL: define fp128 @exp2_fp128_sitofp_i8(
-; NOLDEXP-SAME: i8 [[X:%.*]]) {
-; NOLDEXP-NEXT:    [[ITOFP:%.*]] = sitofp i8 [[X]] to fp128
-; NOLDEXP-NEXT:    [[EXP2:%.*]] = call fp128 @llvm.exp2.f128(fp128 [[ITOFP]])
-; NOLDEXP-NEXT:    ret fp128 [[EXP2]]
+; CHECK-LABEL: define fp128 @exp2_fp128_sitofp_i8(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i8 [[X]] to i32
+; CHECK-NEXT:    [[EXP2:%.*]] = call fp128 @llvm.ldexp.f128.i32(fp128 0xL00000000000000003FFF000000000000, i32 [[TMP1]])
+; CHECK-NEXT:    ret fp128 [[EXP2]]
 ;
   %itofp = sitofp i8 %x to fp128
   %exp2 = call fp128 @llvm.exp2.fp128(fp128 %itofp)
@@ -123,17 +87,11 @@ define fp128 @exp2_fp128_sitofp_i8(i8 %x) {
 }
 
 define <vscale x 4 x float> @exp2_nxv4f32_sitofp_i8(<vscale x 4 x i8> %x) {
-; LDEXP-LABEL: define <vscale x 4 x float> @exp2_nxv4f32_sitofp_i8(
-; LDEXP-SAME: <vscale x 4 x i8> [[X:%.*]]) {
-; LDEXP-NEXT:    [[TMP1:%.*]] = sext <vscale x 4 x i8> [[X]] to <vscale x 4 x i32>
-; LDEXP-NEXT:    [[EXP2:%.*]] = call <vscale x 4 x float> @llvm.ldexp.nxv4f32.nxv4i32(<vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 1.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> [[TMP1]])
-; LDEXP-NEXT:    ret <vscale x 4 x float> [[EXP2]]
-;
-; NOLDEXP-LABEL: define <vscale x 4 x float> @exp2_nxv4f32_sitofp_i8(
-; NOLDEXP-SAME: <vscale x 4 x i8> [[X:%.*]]) {
-; NOLDEXP-NEXT:    [[ITOFP:%.*]] = sitofp <vscale x 4 x i8> [[X]] to <vscale x 4 x float>
-; NOLDEXP-NEXT:    [[EXP2:%.*]] = call <vscale x 4 x float> @llvm.exp2.nxv4f32(<vscale x 4 x float> [[ITOFP]])
-; NOLDEXP-NEXT:    ret <vscale x 4 x float> [[EXP2]]
+; CHECK-LABEL: define <vscale x 4 x float> @exp2_nxv4f32_sitofp_i8(
+; CHECK-SAME: <vscale x 4 x i8> [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = sext <vscale x 4 x i8> [[X]] to <vscale x 4 x i32>
+; CHECK-NEXT:    [[EXP2:%.*]] = call <vscale x 4 x float> @llvm.ldexp.nxv4f32.nxv4i32(<vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 1.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 4 x float> [[EXP2]]
 ;
   %itofp = sitofp <vscale x 4 x i8> %x to <vscale x 4 x float>
   %exp2 = call <vscale x 4 x float> @llvm.exp2.nxv4f32(<vscale x 4 x float> %itofp)

@efriedma-quic
Copy link
Collaborator

It looks like the lowering for llvm.ldexp currently doesn't respect "no-builtin-ldexp" etc.; we should probably fix that first.

Long-term, we really need a better approach for managing the codesize explosion that comes from expanding everything inline.

@arsenm
Copy link
Contributor Author

arsenm commented May 20, 2024

It looks like the lowering for llvm.ldexp currently doesn't respect "no-builtin-ldexp" etc.; we should probably fix that first.

When I tried to consider no-builtin* for expanding memcpy instead of emitting a call, I got complaints (and had to implement c8cac15) because for some reason there was the expectation of this not changing legalization emitted libcalls. We have parallel tables, TargetLibraryInfo and RuntimeLibcalls, and it's not clear if no-builtin is supposed to change the RuntimeLibcall behavior

@efriedma-quic
Copy link
Collaborator

If we reserve the right to call ldexp anywhere, under any set of flags, how is anyone supposed to implement a C library? I guess this particular combine is unlikely to cause issues, but if you're planning to rip out all checks for whether a functions exists, it seems like something we need to address.

For memcpy/memmove/memset specifically, we currently have a special-case rule: in no-builtins mode: existing calls to the intrinsics lower to the corresponding calls, but optimizations don't generate any new calls to those functions. This is sort of hard to formally justify, but in practice, it works pretty well for people doing stuff with -ffreestanding. The special case doesn't really extend beyond those specific functions.

@arsenm
Copy link
Contributor Author

arsenm commented May 20, 2024

If we reserve the right to call ldexp anywhere, under any set of flags, how is anyone supposed to implement a C library?

This is already not possible, but is a problem to solve in codegen for when the intrinsic implementation is emitted. We need some mechanism to disable RuntimeLibcalls when building a libc.

IMO it was wrong to provide any math intrinsics to expose the usage of math functions. These intrinsics should be providing the target implementation of the libm functions, not the other way around. If we can't codegen the intrinsic without the libcall, we probably shouldn't have the intrinsic.

Treating the intrinsics as aliases to the library calls isn't really useful. We have to maintain parallel optimization paths for the intrinsic and call anyway, then have a lot of machinery to turn the calls into intrinsics, and then the intrinsics back into libcalls.

Alternative to this, I could fix it to preserve libcall->libcall behavior and intrinsic->intrinsic behavior. The main problem I'm trying to solve is gating intrinsic->intrinsic gated on the libcall availability

I guess this particular combine is unlikely to cause issues, but if you're planning to rip out all checks for whether a functions exists, it seems like something we need to address.

For today, I just want to fix ldexp. We need to come up with a RuntimeLibcalls solution eventually (e.g. see the inline asm in llvm-libc to use fma instead of calling the fma intrinsic)

Long-term, we really need a better approach for managing the codesize explosion that comes from expanding everything inline.

We could also undo this optimization in codegen if ldexp isn't available (which in practice just means windows) and call pow

@efriedma-quic
Copy link
Collaborator

IMO it was wrong to provide any math intrinsics to expose the usage of math functions. These intrinsics should be providing the target implementation of the libm functions, not the other way around. If we can't codegen the intrinsic without the libcall, we probably shouldn't have the intrinsic.

I see where you're coming from with this... see also discussion on https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 . If you're willing to push forward something to break apart the current mess, I'd be in favor.

Alternative to this, I could fix it to preserve libcall->libcall behavior and intrinsic->intrinsic behavior. The main problem I'm trying to solve is gating intrinsic->intrinsic gated on the libcall availability

That still technically breaks -fno-builtin-ldexp... but the way it breaks is much less likely to matter, so I think I'm okay with this.

Long-term, we really need a better approach for managing the codesize explosion that comes from expanding everything inline.

We could also undo this optimization in codegen if ldexp isn't available (which in practice just means windows) and call pow

On Windows specifically, we have ldexp; the issue is just that we don't have ldexpf.

@andykaylor
Copy link
Contributor

We definitely have a mess with regard to our handling of math library functions. We're nowhere near consistent behavior.

I'm not sure if this is what Matt was alluding to, but the gcc documentation for fno-builtin-* says, "Don’t recognize built-in functions that do not begin with ‘_builtin’ as prefix." (emphasis added) The result of this is that if you call __builtin_ldexp even when compiling with -fno-builtin-ldexp, we will happily translate it to a call to the llvm.ldexp intrinsic, and on X86 that gets lowered to a call to ldexp() (which may or may not be the library call).

https://godbolt.org/z/95Kb6PjnW

You can make a case that this is user error -- if they didn't want the library function to be called, they shouldn't have used the builtin. It would be much better if we lowered the intrinsics to calls to a compiler runtime library function that we could be certain existed and had control over the accuracy of. This would allow us to actual implement things the way the LLVM IR definition says we will (i.e. not setting errno).

But that's not the current situation, and the example above is different from the front end replacing one call with another on the user's behalf. If the user has passed -fno-builtin-ldexp and then called exp2((double)n) it seems like a very good bet that they really don't want the ldexp library function to be called.

@efriedma-quic
Copy link
Collaborator

IMO it was wrong to provide any math intrinsics to expose the usage of math functions. These intrinsics should be providing the target implementation of the libm functions, not the other way around. If we can't codegen the intrinsic without the libcall, we probably shouldn't have the intrinsic.

I see where you're coming from with this... see also discussion on https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 . If you're willing to push forward something to break apart the current mess, I'd be in favor.

Maybe the right thing to do here is just use TTI: we ask the target if it can handle the intrinsic, it says yes if either the libcall is legal, or it has some other implementation it can use.

When folding exp2(itofp(x)) to ldexp(1, x), don't require an ldexp libcall
to emit the intrinsic.

The intrinsic needs to be handled regardless of whether the system
has a libcall, and we have an inline implementation of ldexp already.
This fixes the instance in the exp2->ldexp fold. Another instance exists
for the pow(2) -> ldexp case

The LTO test change isn't ideal, since it's just moving the problem to
another instance where we're relying on implied libm behavior for an
intrinsic transform. Use exp10 since that's a harder case to solve in
the libcall house of cards we have.
@arsenm
Copy link
Contributor Author

arsenm commented Jun 7, 2024

On Windows specifically, we have ldexp; the issue is just that we don't have ldexpf.

Can we handle this by just promoting float to double in codegen?

@arsenm
Copy link
Contributor Author

arsenm commented Jun 7, 2024

and on X86 that gets lowered to a call to ldexp() (which may or may not be the library call).

Assuming the runtime libcalls are set up accuratly, this should hit the inline expansion we get today and nothing will break. We do not just unconditionally lower the intrinsic to the libcall, although the control-flow free expansion might not be what you want

@arsenm arsenm force-pushed the simplifylibcalls-no-require-ldexp-libcall-exp2 branch from 43b490e to 1b4e6be Compare June 7, 2024 16:14
@efriedma-quic
Copy link
Collaborator

On Windows specifically, we have ldexp; the issue is just that we don't have ldexpf.

Can we handle this by just promoting float to double in codegen?

Yes, that's fine; the Windows math.h does exactly the same thing.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@arsenm arsenm merged commit 8788b66 into llvm:main Jun 10, 2024
5 of 6 checks passed
@arsenm arsenm deleted the simplifylibcalls-no-require-ldexp-libcall-exp2 branch June 10, 2024 16:21
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Jun 12, 2024
…ne (llvm#92707)

When folding exp2(itofp(x)) to ldexp(1, x), don't require an ldexp
libcall to emit the intrinsic.

The intrinsic needs to be handled regardless of whether the system has a
libcall, and we have an inline implementation of ldexp already. This
fixes the instance in the exp2->ldexp fold. Another instance exists for
the pow(2) -> ldexp case

The LTO test change isn't ideal, since it's just moving the problem to
another instance where we're relying on implied libm behavior for an
intrinsic transform. Use exp10 since that's a harder case to solve in
the libcall house of cards we have.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
floating-point Floating-point math llvm:instcombine llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants