
[LoopVectorize] LLVM fails to vectorise loops with multi-exit #92633

Open
vfdff opened this issue May 18, 2024 · 4 comments

@vfdff
Contributor

vfdff commented May 18, 2024

  • Progress on the RFC has been slow, so I am using a simple case to track the status: https://gcc.godbolt.org/z/r6KGKMYMd

    This case from the RFC above is expected to be a basic one, yet neither GCC nor LLVM vectorizes it today.

int foo(float *a, int n, int M){
  int i;
  for (i=0; i<n; i++){
    if (a[i] == 0) break;
  }
  return i;
}
  • Loop kernel generated by clang -march=armv8.2-a+sve -O3:
.LBB0_2:                                // =>This Inner Loop Header: Depth=1
        ldr     s0, [x8, x0, lsl #2]
        fcmp    s0, #0.0
        b.eq    .LBB0_5
        add     x0, x0, #1
        cmp     x9, x0
        b.ne    .LBB0_2
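
For reference, the usual way to vectorize a search loop like this is to test a whole block of elements for a match and only then locate the exact lane. The C sketch below shows that shape with a hypothetical block size VF = 4 and hypothetical function names; it is not LLVM's implementation. The block test reads elements past the first zero, so it is only safe when all of a[0..n-1] is readable, which the scalar loop does not require.

```c
#define VF 4  /* hypothetical vectorization factor (four float lanes) */

/* Scalar reference: the loop from the issue (unused M dropped). */
static int first_zero_scalar(const float *a, int n) {
  int i;
  for (i = 0; i < n; i++) {
    if (a[i] == 0) break;
  }
  return i;
}

/* Strip-mined early exit. ASSUMPTION: all of a[0..n-1] is readable,
   because a block may read elements after the first zero. */
static int first_zero_blocked(const float *a, int n) {
  int i = 0;
  for (; n - i >= VF; i += VF) {   /* n - i avoids overflow of i + VF */
    int any = 0;
    for (int l = 0; l < VF; l++)   /* "any lane == 0", like fcmeq + ptest */
      any |= (a[i + l] == 0);
    if (any)
      break;                       /* the exit lies inside this block */
  }
  /* Scalar epilogue: finds the exact lane, or handles the n % VF tail. */
  for (; i < n; i++) {
    if (a[i] == 0) break;
  }
  return i;
}
```

Calling first_zero_blocked on {1, 2, 3, 4, 5, 0, 7, 8, 9, 10} with n = 10 returns 5, the same as the scalar loop. The unconditional block reads are exactly the part that needs first-fault support to be legal for an arbitrary pointer.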
@vfdff
Contributor Author

vfdff commented May 18, 2024

This case is the first scenario mentioned in the RFC above:

int foo(float *a, int n, int M){
  int i;
  for (i=0; i<n; i++){
    if (i > M) break;
    a[i] = i;
  }
  return i;
}

@vfdff
Contributor Author

vfdff commented May 18, 2024

static float sa[100];
int foo(float *a, int n, int M){
  int i;
  for (i=0; i<n; i++){
    if (sa[i] == 0) return 0;
  }
  return 1;
}

LLVM just generates a scalar loop:

.LBB0_2:                                // =>This Inner Loop Header: Depth=1
        ldr     s0, [x9]
        fcmp    s0, #0.0
        b.eq    .LBB0_5
        subs    x8, x8, #1
        add     x9, x9, #4
        b.ne    .LBB0_2

GCC generates an SVE loop (gcc -march=armv8.2-a+sve -O3):

.L4:
        add     x0, x0, x3
        whilelo p7.s, w0, w1
        b.none  .L7
.L5:
        ld1w    z29.s, p7/z, [x2, x0, lsl 2]
        mov     z31.d, z30.d
        fcmeq   p7.s, p7/z, z29.s, #0.0
        incw    z30.s
        ptest   p15, p7.b
        b.none  .L4
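
The GCC loop can be read as the following C emulation. VL = 4 is a hypothetical lane count (the real SVE vector length is only known at run time), and the function names are hypothetical. The mask built from i + l < n plays the role of whilelo, so the final partial vector needs no scalar tail and inactive lanes never touch memory. Presumably GCC can afford these loads because sa is a static array of known size: any n that keeps the scalar loop well defined is at most 100, so every active lane reads inside sa. That is an assumption about GCC's reasoning, not confirmed here.

```c
#define VL 4  /* hypothetical vector length in float lanes */

static float sa[100];

/* Scalar reference: the third case from the issue (unused a and M dropped). */
static int no_zero_scalar(int n) {
  for (int i = 0; i < n; i++)
    if (sa[i] == 0) return 0;
  return 1;
}

/* Predicated emulation of the GCC loop. */
static int no_zero_predicated(int n) {
  for (int i = 0; i < n; i += VL) {
    int hit = 0;
    for (int l = 0; l < VL; l++) {
      int active = (i + l < n);          /* whilelo p7.s, w0, w1 */
      hit |= active && sa[i + l] == 0;   /* ld1w + fcmeq; inactive lanes skip the load */
    }
    if (hit) return 0;                   /* ptest found a matching lane */
  }
  return 1;                              /* whilelo produced no active lanes */
}
```

Because the predicate also covers the tail, the same loop body handles every n from 0 to 100 without a separate epilogue, which matches the single loop GCC emits.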

@pinskia

pinskia commented May 19, 2024

For the first testcase, that will require what is called first-fault support (which I know is aimed to get into GCC 15; I don't know about LLVM though), and as far as I know it can only be vectorized using SVE (not the normal Advanced SIMD support in Armv8-A).

@vfdff
Contributor Author

vfdff commented May 20, 2024

Thanks @pinskia for the reminder. I find that LLVM doesn't really support the multi-exit case.
For the first RFC scenario (the i > M case), it uses a csel to update the loop trip count, so it looks like a form of loop versioning:

csel w9, w2, w9, lo // w9 = min(M, n-1)

int foo(float *a, int n, int M){
  int i;
  if (M < n-1) {
    for (i=0; i<=M; i++){   // the exit is i > M, so indices 0..M are stored
      a[i] = i;
    }
    return i;
  }
  for (i=0; i<n; i++){
    a[i] = i;
  }
  return i;
}
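
Read literally, the csel computes one bound, min(M, n-1), rather than selecting between two loops. The sketch below writes that single-bound form directly as a trip count of min(n, M + 1); the function names are hypothetical, and the bound is derived from the loop semantics, not taken from LLVM's output. Testing M < n first keeps M + 1 from overflowing and avoids computing n - 1 when n is INT_MIN.

```c
/* Original second case: the i > M check makes the loop multi-exit. */
static int store_scalar(float *a, int n, int M) {
  int i;
  for (i = 0; i < n; i++) {
    if (i > M) break;
    a[i] = i;
  }
  return i;
}

/* Single-bound form: trip count min(n, M + 1), so the last index stored
   is min(M, n - 1), matching the csel comment. */
static int store_single_bound(float *a, int n, int M) {
  int trip = (M < n) ? M + 1 : n;   /* M < n guarantees M + 1 <= n */
  int i;
  for (i = 0; i < trip; i++)        /* countable loop with a single exit */
    a[i] = i;
  return i;
}
```

With the bound hoisted, the loop is countable and has one exit, which is the form the vectorizer already handles; for example, store_single_bound(a, 10, 2) stores a[0..2] and returns 3, as the original does.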


3 participants