Clang generates worse code than GCC for a simple case #92649

Open
KanRobert opened this issue May 18, 2024 · 4 comments

Comments

@KanRobert
Contributor

int f(int *a) {
  if (*a & 1234)
    return 0;
  return 1;
}
bash$ gcc -O2 -S 1.c -o -
        .file   "1.c"
        .text
        .p2align 4
        .globl  f
        .type   f, @function
f:
.LFB0:
        .cfi_startproc
        xorl    %eax, %eax
        testl   $1234, (%rdi)
        sete    %al
        ret
bash$ clang -O2 -S 1.c -o -
        .text
        .file   "1.c"
        .globl  f                               # -- Begin function f
        .p2align        4, 0x90
        .type   f,@function
f:                                      # @f
        .cfi_startproc
# %bb.0:                                # %entry
        movzwl  (%rdi), %ecx
        xorl    %eax, %eax
        testl   $1234, %ecx                     # imm = 0x4D2
        sete    %al
        retq

https://www.godbolt.org/z/he4cj4a8G

@llvmbot
Collaborator

llvmbot commented May 18, 2024

@llvm/issue-subscribers-backend-x86

Author: Shengchen Kan (KanRobert)


@KanRobert
Contributor Author

CC @phoebewang @RKSimon @topperc because I'm not sure if it's by design.

@phoebewang
Contributor

Maybe similar to #92251

@topperc
Collaborator

topperc commented May 18, 2024

I don't think this is intentional. It looks like TargetLowering::SimplifySetCC is narrowing the setcc+and+load to i16 here.

    // If the LHS is '(and load, const)', the RHS is 0, the test is for
    // equality or unsigned, and all 1 bits of the const are in the same
    // partial word, see if we can shorten the load.
    if (DCI.isBeforeLegalize() &&
        !ISD::isSignedIntSetCC(Cond) &&
        N0.getOpcode() == ISD::AND && C1 == 0 &&
        N0.getNode()->hasOneUse() &&
        isa<LoadSDNode>(N0.getOperand(0)) &&
        N0.getOperand(0).getNode()->hasOneUse() &&
        isa<ConstantSDNode>(N0.getOperand(1))) {

The AND gets promoted to i32 later because of isTypeDesirableForOp. That creates an anyext load, which later becomes a zextload.

We could probably add a DAGCombine to promote the load back to i32 for the AND based on alignment.

@KanRobert KanRobert self-assigned this May 19, 2024