[clang][WebAssembly] Odd builtin behaviour #92698
@llvm/issue-subscribers-backend-webassembly Author: Michel Rouzic (Photosounder)
Using clang (version 18.1.4) to compile WebAssembly and going the `-nostdlib` route I'm noticing strange things when I try to use some builtins to implement libc functions. For instance doing `static double sqrt(double x) { return __builtin_sqrt(x); }` works fine but doing `static double exp2(double x) { return __builtin_exp2(x); }` gives me `wasm-ld: error: C:/msys/tmp/rl-89dcca.o: undefined symbol: exp2` and making it non-static with an `extern` prototype above itself just makes it call itself in a loop. That's even though `__has_builtin(__builtin_exp2)` is positive. Same thing with `static double cos(double x) { return __builtin_cos(x); }`.
It seems as though some builtins aren't really there for WebAssembly, as if they were defined by |
What are you using for standard library headers, emscripten?
I'm using |
These builtins lower to calls into compiler-rt functions. In wasi-sdk they would normally be provided by libc/musl code. e.g. https://github.com/WebAssembly/wasi-libc/blob/main/libc-top-half/musl/src/math/exp2.c If you want to build with |
I understand, that's already what I've been doing (see https://github.com/Photosounder/MinQND-libc/blob/main/minqnd_libc.h, I was actually wondering if I really needed all these homemade implementations instead of relying on builtins), the problem is not having an easy way of knowing what's actually available or not. There's |
I'm afraid I don't know where in the LLVM source code you can look to find a complete list of the libcalls that the Wasm backend depends on. You probably want to be looking for terms like
@dschuff do you know if there is an easy way to tell exactly which libcalls can be generated by llvm? The conservative thing to do would be to provide a complete set of libcalls which is what libc/compiler-rt would do, but I guess you are trying to make something more minimal. Is there some reason you can't link against the math functions from musl/compiler-rt?
A list of all libcalls LLVM knows about can be found in https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/RuntimeLibcalls.def
It doesn't seem like this highlights the distinction I'm looking for, for instance we have
The first is available without libc (I assume because it simply turns into
Tbh I'm kind of too far ahead into implementing all the functions I need for it to be a big deal, but it feels like a defect of the compiler to not make a distinction between what is or isn't actually available (because the vanilla x86-64 clang I'm using doesn't have a libc for wasm32 no matter what). As for why I'm doing this I have a whole manifesto about this which I don't think I should bore you with 😄.
That's still kind of a conservative estimate though; e.g. we have a signature for RTLIB::MUL_F32 but I can't imagine any case where that would actually be generated because wasm has a 32-bit multiply instruction. I don't know offhand of a way to easily tell which operations are supported. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp#L46 sets a bunch of cases of how to handle different operations and types but it's also not really a comprehensive list.
So all the builtins are there but there's no distinction between the ones that rely on libc and the ones that work without libc. If you see |
It seems like it might still not show the right distinction as I see in this file |
Yeah; to be clear I'm agreeing that this doesn't really seem to be exactly what you want. I unfortunately don't know of a single place that provides such a list. Regarding your libc philosophy, emscripten actually shares some of those values. The reasoning mostly boils down to the fact that we place a higher priority on code size than most C implementations, because on the web today you need to ship all of your libc code over the wire with your program, whereas in most C implementations, libc is already on every system from install time, so code size isn't as much of a constraint. As a result we try harder than most C implementations to get separability of library functions so that only things which are actually needed can be included (so there are a few dependencies we try to break inside libc, e.g. our use of printf without long double support by default) but we have to balance that against the maintenance cost of keeping local modifications of upstream libc (and of course we have decided not to roll our own libc but use musl instead). Probably one of the biggest reasons for that is that we can't compromise on standards compliance or accuracy by default as much as you can when you're creating a library mostly for your own use or for a special purpose.
That's good, I didn't try to compile with emscripten to see how big the result is (nor did I think about how what I do is advantageous for the web), but that makes sense. Yes by making my own libc just for me I can go to further extremes in terms of simplicity, minimalism and mathematical tradeoffs, in fact it makes a lot of sense for someone to have their own libc if you take such considerations into account. Having my own allocator is also quite important as I can use the external visualiser I made for it to see what's happening in memory and I even have the option of doing things that aren't normally possible, such as moving the start of an allocated buffer up without moving any data. As far as my original problem goes I suppose we could figure out which builtins are available without libc by making a C file with just one function that calls them all and see what the linker says about this, but it's odd that there would be no way of more directly determining this.
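The linker-probing idea in the previous comment could be sketched roughly as follows. This is only an illustration: the `EXPORT` and `PROBE` macros and the `probe_*` function names are hypothetical, and splitting the probes into one exported function per builtin (rather than one big function) is an assumption made so that each linker error points at a single builtin. Taking the argument as a parameter is also meant to keep the compiler from constant-folding the call away.

```c
/* Export each probe on WebAssembly so the linker keeps it;
   on other targets the macro expands to nothing. */
#ifdef __wasm__
#define EXPORT(n) __attribute__((export_name(n)))
#else
#define EXPORT(n)
#endif

/* One function per builtin (hypothetical naming scheme): a builtin that
   lowers to a libcall shows up as one undefined symbol at link time. */
#define PROBE(name, expr) \
    EXPORT(#name) double probe_##name(double d) { return (double)(expr); }

PROBE(fabs, __builtin_fabs(d))
PROBE(isnan, __builtin_isnan(d))
/* PROBE(exp2, __builtin_exp2(d))   reported in this issue as "undefined symbol: exp2" */
```

Building it with something like `clang --target=wasm32 -nostdlib -Wl,--no-entry probes.c -o probes.wasm` (the file names are placeholders) and reading the `undefined symbol` errors would then give the list of builtins that need a libc or compiler-rt behind them.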
Don't forget that most of the libcalls we are talking about here that differ across platforms in terms of which ones you need (e.g. MUL_F32) are from compiler-rt and are not actually part of libc. Unlike those that are actually part of libc, the compiler-rt functions tend to be well-separable and independent, and if you have a function in your library but don't actually emit calls to it, the linker will ensure it doesn't get included in any binary. So you get the size and simplicity of the linked program without any effort (and most people don't want to write these from scratch and don't have a problem with just taking the compiler-rt builtins library as a whole and just including it all, because there's no cost to the user).
I did a bit of research by making a C file that calls lots of builtins and commented out everything that couldn't be made to work:
// Compile with: \
clang -mexec-model=reactor builtins_test.c -o builtins_test.wasm --target=wasm32 -nostdlib -mbulk-memory
extern void __wasm_call_ctors(void);
__attribute__((export_name("_initialize"))) void _initialize(void) { __wasm_call_ctors(); }
typedef struct { int a, b; } struct_t;
int dummy_printf(const char *fmt, ...) { return 0; }
__attribute__((export_name("builtins_test"))) int builtins_test()
{
double d = 1.;
int r, i = 8;
char s[] = "";
__float128 v;
void *p = 0;
__builtin_va_list args;
struct_t st;
//undef r = __builtin_acosf128(d);
//undef r = __builtin_acoshf128(d);
//undef r = __builtin_asinf128(d);
//undef r = __builtin_asinhf128(d);
//undef r = __builtin_atanf128(d);
//undef r = __builtin_atanhf128(d);
//undef r = __builtin_cbrtf128(d);
r = __builtin_ceil(d);
//undef r = __builtin_cos(d);
//undef r = __builtin_coshf128(d);
//undef r = __builtin_erff128(d);
//undef r = __builtin_erfcf128(d);
//undef r = __builtin_exp(d);
//undef r = __builtin_exp2(d);
//undef r = __builtin_exp10(d);
//undef r = __builtin_expm1f128(d);
//undef r = __builtin_fdimf128(d, d);
r = __builtin_floor(d);
//undef r = __builtin_fma(d, d, d);
//undef r = __builtin_fmax(d, d);
//undef r = __builtin_fmin(d, d);
//undef r = __builtin_atan2f128(d, d);
//r = __builtin_copysignf16(d, d); crashes compiler
// r = __builtin_copysignf128(d, d); needs other symbols
//r = __builtin_fabsf16(d);
// r = __builtin_fabsf128(d); needs other symbols
//undef r = __builtin_fmod(d, d);
//undef r = __builtin_frexp(d, &i);
r = __builtin_huge_val();
//r = __builtin_huge_valf16();
r = __builtin_inf();
//r = __builtin_inff16();
//undef r = __builtin_ldexp(d, d);
//undef r = __builtin_modff128(d, &v);
//r = __builtin_nanf16(s);
//undef r = __builtin_nanf128(s);
//undef r = __builtin_nans(s);
//undef r = __builtin_powi(d, d);
//undef r = __builtin_pow(d, d);
//undef r = __builtin_hypotf128(d, d);
//undef r = __builtin_ilogbf128(d);
//undef r = __builtin_lgammaf128(d);
//undef r = __builtin_llrintf128(d);
//undef r = __builtin_llroundf128(d);
//undef r = __builtin_log10(d);
//undef r = __builtin_log1pf128(d);
//undef r = __builtin_log2(d);
//undef r = __builtin_logbf128(d);
//undef r = __builtin_log(d);
//undef r = __builtin_lrintf128(d);
//undef r = __builtin_lroundf128(d);
//undef r = __builtin_nearbyintf128(d);
//undef r = __builtin_nextafterf128(d, d);
//undef r = __builtin_nexttowardf128(d, d);
//undef r = __builtin_remainderf128(d, d);
//undef r = __builtin_remquof128(d, d, &i);
r = __builtin_rint(d);
//undef r = __builtin_round(d);
r = __builtin_roundeven(d);
//undef r = __builtin_scalblnf128(d, d);
//undef r = __builtin_scalbnf128(d, d);
//undef r = __builtin_sin(d);
//undef r = __builtin_sinhf128(d);
r = __builtin_sqrt(d);
//undef r = __builtin_tanf128(d);
//undef r = __builtin_tanhf128(d);
//undef r = __builtin_tgammaf128(d);
r = __builtin_trunc(d);
r = __builtin_flt_rounds();
r = __builtin_complex(d, d);
r = __builtin_isgreater(d, d);
r = __builtin_isgreaterequal(d, d);
r = __builtin_isless(d, d);
r = __builtin_islessequal(d, d);
r = __builtin_islessgreater(d, d);
r = __builtin_isunordered(d, d);
r = __builtin_fpclassify(d, d, d, d, d, d);
r = __builtin_isfinite(d);
r = __builtin_isinf(d);
r = __builtin_isinf_sign(d);
r = __builtin_isnan(d);
r = __builtin_isnormal(d);
r = __builtin_issubnormal(d);
r = __builtin_iszero(d);
r = __builtin_issignaling(d);
r = __builtin_isfpclass(d, 0);
r = __builtin_signbit(d);
r = __builtin_signbitf(d);
// r = __builtin_signbitl(d); needs extra symbol
//r = __builtin_canonicalize(d); crashes compiler
r = __builtin_clz(d);
r = __builtin_ctz(d);
r = __builtin_ffs(d);
r = __builtin_parity(d);
r = __builtin_popcount(d);
r = __builtin_clrsb(d);
//undef p = __builtin_calloc(i, i);
r = __builtin_constant_p(d);
r = __builtin_classify_type(d);
//r = __builtin_va_start(args, d);
//r = __builtin_stdarg_start(args, d);
p = __builtin_assume_aligned(p, 4);
//undef __builtin_free(p);
//undef p = __builtin_malloc(i);
__builtin_memcpy_inline(p, p, 0);
p = __builtin_mempcpy(p, p, d);
__builtin_memset_inline(p, d, 0);
//undef r = __builtin_strcspn(s, s);
//undef p = __builtin_realloc(p, i);
//p = __builtin_return_address(0); doesn't like it
p = __builtin_extract_return_addr(p);
p = __builtin_frame_address(0);
//__builtin___clear_cache(s, s); crashes compiler
__builtin_unwind_init();
r = __builtin_eh_return_data_regno(0);
//p = __builtin_thread_pointer(); crashes compiler
p = __builtin_launder(s);
//__builtin_eh_return(d, p); available but suppresses linker errors
p = __builtin_frob_return_addr(p);
p = __builtin_dwarf_cfa();
/*__builtin_init_dwarf_reg_size_table(p); cannot compile
//r = __builtin_dwarf_sp_column();*/
r = __builtin_extend_pointer(p);
r = __builtin_object_size(p, 2);
r = __builtin_dynamic_object_size(p, 2);
//undef p = __builtin___memcpy_chk(p, p, d, d);
//undef p = __builtin___memccpy_chk(p, p, d, d, d);
//undef p = __builtin___memmove_chk(p, p, d, d);
//undef p = __builtin___mempcpy_chk(p, p, d, d);
//undef p = __builtin___memset_chk(p, d, d, d);
//undef p = __builtin___stpcpy_chk(s, s, d);
//undef p = __builtin___strcat_chk(s, s, d);
//undef p = __builtin___strcpy_chk(s, s, d);
//undef r = __builtin___strlcat_chk(s, s, d, d);
//undef r = __builtin___strlcpy_chk(s, s, d, d);
//undef p = __builtin___strncat_chk(s, s, d, d);
//undef p = __builtin___strncpy_chk(s, s, d, d);
//undef p = __builtin___stpncpy_chk(s, s, d, d);
//undef r = __builtin___snprintf_chk(s, i, i, i, "");
//undef r = __builtin___sprintf_chk(s, i, i, "");
//undef r = __builtin___vsnprintf_chk(s, i, i, i, "%s", s);
//undef r = __builtin___vsprintf_chk(s, i, i, "%s", s);
//undef r = __builtin___printf_chk(i, s, d, d);
//undef r = __builtin___vprintf_chk(i, s, args);
r = __builtin_unpredictable(d);
r = __builtin_expect(d, d);
r = __builtin_expect_with_probability(d, d, 1.);
__builtin_prefetch(p);
r = __builtin_readcyclecounter();
__builtin_trap();
__builtin_debugtrap();
//__builtin_unreachable(); disabled for obvious reasons
//r = __builtin_shufflevector(d, d); idk how to use those
//r = __builtin_convertvector(d, d);
p = __builtin_alloca_uninitialized(d);
p = __builtin_alloca_with_align(d, 8);
p = __builtin_alloca_with_align_uninitialized(d, 8);
//r = __builtin_call_with_static_chain(d, d); idk how to use this either
r = __builtin_nondeterministic_value(d);
r = __builtin_elementwise_abs(d);
r = __builtin_elementwise_bitreverse(i);
//undef r = __builtin_elementwise_max(d, d);
//undef r = __builtin_elementwise_min(d, d);
r = __builtin_elementwise_ceil(d);
//undef r = __builtin_elementwise_cos(d);
//undef r = __builtin_elementwise_exp(d);
//undef r = __builtin_elementwise_exp2(d);
r = __builtin_elementwise_floor(d);
//undef r = __builtin_elementwise_log(d);
//undef r = __builtin_elementwise_log2(d);
//undef r = __builtin_elementwise_log10(d);
//undef r = __builtin_elementwise_pow(d, d);
r = __builtin_elementwise_roundeven(d);
//undef r = __builtin_elementwise_round(d);
r = __builtin_elementwise_rint(d);
r = __builtin_elementwise_nearbyint(d);
//undef r = __builtin_elementwise_sin(d);
r = __builtin_elementwise_sqrt(d);
//r = __builtin_elementwise_tan(d); unknown
r = __builtin_elementwise_trunc(d);
//r = __builtin_elementwise_canonicalize(d); crashes compiler
r = __builtin_elementwise_copysign(d, d);
//undef r = __builtin_elementwise_fma(d, d, d);
r = __builtin_elementwise_add_sat(i, i);
r = __builtin_elementwise_sub_sat(i, i);
//r = __builtin_reduce_max(d); idk
//r = __builtin_reduce_min(d); idk
/*r = __builtin_reduce_xor(i);
r = __builtin_reduce_or(i); idk what's a vector of integers
r = __builtin_reduce_and(i);
r = __builtin_reduce_add(i);
r = __builtin_reduce_mul(i);
r = __builtin_matrix_transpose(d); nor a matrix
r = __builtin_matrix_column_major_load(d);
r = __builtin_matrix_column_major_store(d);*/
//undef r = __builtin_memcmp(p, p, i);
//undef r = __builtin_printf("%%");
//undef r = __builtin_bcmp(p, p, d);
//p = __builtin_objc_memmove_collectable(p, p, d); crashes compiler
r = __builtin_annotation(i, "%%");
__builtin_assume(d);
__builtin_assume_separate_storage(p, p);
r = __builtin_addc(d, d, d, p);
r = __builtin_subc(d, d, d, p);
r = __builtin_add_overflow(i, i, &i);
r = __builtin_sub_overflow(i, i, &i);
r = __builtin_mul_overflow(i, i, &i);
/*r = __builtin_uadd(d); unknown
r = __builtin_usub(d);
r = __builtin_umul(d);
r = __builtin_sadd(d);
r = __builtin_ssub(d);
r = __builtin_smul(d);*/
p = __builtin_addressof(d);
p = __builtin_function_start(builtins_test);
//undef p = __builtin_char_memchr(s, i, i);
__builtin_dump_struct(&st, dummy_printf);
//r = __builtin_preserve_access_index(d); needs -g
r = __builtin_is_aligned(p, i);
p = __builtin_align_up(p, i);
p = __builtin_align_down(p, i);
//undef p = __builtin___get_unsafe_stack_start();
//undef p = __builtin___get_unsafe_stack_bottom();
//undef p = __builtin___get_unsafe_stack_top();
//undef p = __builtin___get_unsafe_stack_ptr();
__builtin_nontemporal_store(d, &i);
r = __builtin_nontemporal_load(&d);
/*r = __builtin_store_half(d); unknown
r = __builtin_load_half(d);*/
return r;
}
Turns out that FMA isn't actually available after all.
I think FMA might only be available if you use |
Edit: Here's my |
Long story short: autogeneration of FMA is not available with relaxed SIMD enabled, as it uses Wasm-specific builtins that are not mapped to generic FMA, and there is no scalar FMA at all. @Photosounder I like your implementation, I've started to play with a minimal way to add malloc at some point, but never really finished it :)
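Since there is no scalar FMA instruction to lower to, a libc-free build that needs `fma` would have to supply one itself. A minimal sketch, assuming the loss of the single-rounding guarantee is acceptable; the name `fma_unfused` is hypothetical, and this is not a correctly rounded fused multiply-add:

```c
/* Hypothetical fallback: multiplies and adds with two separate roundings,
   so the result can differ from a true fma in the last bit(s). */
double fma_unfused(double x, double y, double z)
{
    return x * y + z;
}
```

A correctly rounded software `fma` is considerably more involved (musl has one, for instance), so this only makes sense where that precision is not relied upon.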
Using clang (version 18.1.4) to compile WebAssembly and going the `-nostdlib` route I'm noticing strange things when I try to use some builtins to implement libc functions. For instance doing `static double sqrt(double x) { return __builtin_sqrt(x); }` works fine but doing `static double exp2(double x) { return __builtin_exp2(x); }` gives me `wasm-ld: error: C:/msys/tmp/rl-89dcca.o: undefined symbol: exp2` and making it non-static with an `extern` prototype above itself just makes it call itself in a loop. That's even though `__has_builtin(__builtin_exp2)` is positive. Same thing with `__builtin_cos` or `__builtin_lroundf` (yet not `__builtin_nearbyintf`).

It seems as though some builtins aren't really there for WebAssembly, as if they were defined by `#define __builtin_cos cos`, but there doesn't seem to be a clear way to determine which are actually usable, and knowing which I can actually use is what I would need.

Edit: I thought about it and I guess the answer is that only the builtins that map to a WebAssembly opcode are there in my case as the rest would come from libc which I'm not including so there's nothing there. It would be nice if this was documented somewhere though.
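Following the conclusion of the edit above, a function such as `exp2` has to be written by hand in a `-nostdlib` build rather than forwarded to its builtin. A minimal sketch, assuming IEEE-754 doubles and inputs roughly within [-1022, 1023]; the name `my_exp2` is hypothetical, and overflow, subnormal results and NaN are deliberately not handled:

```c
#include <stdint.h>

/* Hypothetical hand-written exp2 that uses no libm call and no __builtin_exp2. */
double my_exp2(double x)
{
    /* Split x into the nearest integer n and a remainder f in [-0.5, 0.5]. */
    int64_t n = (int64_t)(x < 0 ? x - 0.5 : x + 0.5);
    double f = x - (double)n;
    double t = f * 0.69314718055994530942; /* f * ln(2), so 2^f = exp(t) */

    /* exp(t) as a Taylor series in Horner form; |t| stays below about 0.35,
       so terms up to t^13 are enough for double precision. */
    double e = 1.0;
    for (int k = 13; k >= 1; k--)
        e = 1.0 + e * t / (double)k;

    /* Build 2^n directly from the IEEE-754 exponent field. */
    union { uint64_t u; double d; } pow2;
    pow2.u = (uint64_t)(n + 1023) << 52;

    return e * pow2.d;
}
```

In a wasm32 `-nostdlib` build, giving such a function the name `exp2` should presumably satisfy the `undefined symbol: exp2` reference, and because its body does not go through `__builtin_exp2` it avoids the self-call loop described above.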