Skip to content

[AutoBump] Merge with 776476c2 (Nov 22) (13) #477

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 542 commits into from
Mar 17, 2025

Conversation

jorickert
Copy link

No description provided.

arsenm and others added 30 commits November 22, 2024 20:12

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
llvm#117260)

This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
A few instructions changed rate.
…nge (llvm#117263)

These have an additional wait state compared to gfx940.
…or gfx950 (llvm#117283)

Read by sgemm/dgemm in srcc after v_mfma_f64_16x16x4_f64 increases from 9 to 17
wait states.
…d change for gfx950 (llvm#117284)

Increase in wait states from 11 to 19. The index for smfmac counts as like srcA/srcB.
…Data (llvm#117412)

This patch removes two functions to verify the consistency between:

- IndexedAllocationInfo::CallStack
- IndexedAllocationInfo::CSId

Now that MemProf format Version 1 has been removed,
IndexedAllocationInfo::CallStack doesn't participate in either
serialization or deserialization, so we don't care about the
consistency between the two fields in IndexAllocationInfo.

Subsequent patches will remove uses of the old field and eventually
remove the field.
…lvm#117076)

This PR allows mixing `-basic-block-sections` with
`-enable-machine-function-splitter`. The strategy is to let
`-basic-block-sections` take precedence over functions with profiles.
llvm#116955)

…ve offset

In Exact mode, the approximation of returning (0,0) is invalid. It only
holds in min/max mode.
… on the RHS

This helps ensure we lower to VPERMI2/T2 instructions that we can commute the index arg to VPERMT2/I2.

Prep work for llvm#79799
Use all_of instead of explicit loop to reduce indentation, also properly
check VPScalarCastRecipe operand.
…rsion (llvm#117413)

This commit adds support for 1:N result type conversions for `func.call`
ops. In that case, argument materializations to the original result type
should be inserted (via `replaceOpWithMultiple`).

This commit is in preparation of merging the 1:1 and 1:N conversion
drivers.
This PR updates the minimal required version of pybind11 from 2.9.0 to
2.10.0. New new version is almost 2.5 years old, which is half a year
less than the previous version. This change is necessary to support the
changes introduced in llvm#115307, which does not compile with pybind11
v.2.9.

Signed-off-by: Ingo Müller <ingomueller@google.com>
This location type represents a contiguous range inside a file. It is
effectively a pair of FileLineCols. Add new type and make FileLineCol a
view for case where it matches existing previous one.

The location includes filename and optional start line & col, and end
line & col. Considered common cases are file:line, file:line:col,
file:line:start_col to file:line:end_col and general range within same
file. In memory its encoded as trailing objects. This keeps the memory
requirement the same as FileLineColLoc today (makes the rather common
File:Line cheaper) at the expense of extra work at decoding time. Kept the unsigned
type.

There was the option to always have file range be castable to
FileLineColLoc. This cast would just drop other fields. That may result
in some simpler staging. TBD.

This is a rather minimal change, it does not yet add bindings (C or
Python), lowering to LLVM debug locations etc. that supports end line:cols.

---------

Co-authored-by: River Riddle <riddleriver@gmail.com>
…on the RHS

This helps ensure we lower to VPERMI2/T2 instructions that we can commute the index arg to VPERMT2/I2.

Similar to 1e31a45 to handle cases where the one use load appears after further folding (keep the lowerShuffleWithPERMV version as this can handle the non-VLX widening case as well).
…7325)

Two PRs were merged at the same time: one that modified `maybeApplyToV`
function, and shortly afterwards, this (the reverted) one that had the
old definition.

During the merge both definitions were retained leading to compilation
errors.

Reapply the reverted PR (1a08b15) with the duplicate removed.
This converts all ptr element shuffle vectors to s64, so that the
existing vector legalization handling can lower them as needed. This
prevents a lot of fallbacks that currently try to generate things like
`<2 x ptr> G_EXT`.

I'm not sure if bitcast/inttoptr/ptrtoint is intended to be necessary
for vectors of pointers, but it uses buildCast for the casts, which now
generates a ptrtoint/inttoptr.
It does not make sense to assemble for the default target.
Add one that shows the behavior. It is treated as a tahiti
alias without instructions which were later removed, and needs
to be treated as wave64. We should probably turn this into a hard
error though.
You cannot adjust the disassembler's subtarget. llvm-mc passes
the originally constructed MCSubtargetInfo around, rather than
querying the pointer in the disassembler instance.
This is a refinement for the existing hack. With this,
the default target will have neither wavefrontsize feature
present, unless it was explicitly specified. That is,
getWavefrontSize() == 64 no longer implies +wavefrontsize64.
getWavefrontSize() == 32 does imply +wavefrontsize32.

Continue to assume the value is 64 with no wavesize feature.
This maintains the codegenable property without any code
that directly cares about the wavesize needing to worry about it.

Introduce an isWaveSizeKnown helper to check if we know the
wavesize is accurate based on having one of the features explicitly
set, or a known target-cpu.

I'm not sure what's going on in wave_any.s. It's testing what
happens when both wavesizes are enabled, but this is treated
as an error in codegen. We now treat wave32 as the winning
case, so some cases that were previously printed as vcc are now
vcc_lo.
…rs (llvm#114148)

Introduces `__builtin_hlsl_buffer_update_counter` clang buildin that is
used to implement the `IncrementCounter` and `DecrementCounter` methods
on `RWStructuredBuffer` and `RasterizerOrderedStructuredBuffer` (see
Note).

The builtin is translated to LLVM intrisic `llvm.dx.bufferUpdateCounter`
or `llvm.spv.bufferUpdateCounter`.

Introduces `BuiltinTypeMethodBuilder` helper in `HLSLExternalSemaSource`
that enables adding methods to builtin types using builder pattern like
this:
```
   BuiltinTypeMethodBuilder(Sema, RecordBuilder, "MethodName", ReturnType)
       .addParam("param_name", Type, InOutModifier)
       .callBuiltin("buildin_name", { BuiltinParams })
       .finalizeMethod();
```

Fixes llvm#113513
Regenerate the failing test as well.
This patch fixes:

  clang/lib/Sema/SemaHLSL.cpp:2225:32: error: absolute value function
  'abs' given an argument of type 'int64_t' (aka 'long') but has
  parameter of type 'int' which may cause truncation of value
  [-Werror,-Wabsolute-value]
…Pass (llvm#117405)

This change removes the restriction on `SymbolUserOpInterface` operators
so they can be used with operators that implement `SymbolOpInterface`,
example:

`memref.global` implements `SymbolOpInterface` so it can be used with
`memref.get_global` which implements `SymbolUserOpInterface`

```
// Define a global constant array
memref.global "private" constant @global_array : memref<10xi32> = dense<[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]> : tensor<10xi32>

// Access this global constant within a function
func @use_global() {
  %0 = memref.get_global @global_array : memref<10xi32>
}
```

Reference: llvm#116519 and
https://discourse.llvm.org/t/question-on-criteria-for-acceptable-ir-in-removedeadvaluespass/83131

---------

Co-authored-by: Zeeshan Siddiqui <mzs@ntdev.microsoft.com>
arsenm and others added 24 commits November 25, 2024 19:23
…0. (llvm#117591)

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
…llvm#117593)

OPSEL[0] selects src_word to read.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
…m#117594)

These instructions have non-standard use of OPSEL bits to select
dest write byte. The src2_modifiers operand is used without having
its corresponding src2 operand by introducing dummy src2.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
…117595)

Scale packed 16-component single-precision float vectors from
two  source inputs using the exponent provided by the third
single-precision float input, then convert the values to a packed
32-component FP6 float value.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
…lvm#117596)

This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32
instructions.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
…117597)

v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16.
All three instructions were part of Dot9 instructions in the compiler.

This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16)
into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16).

All necessary changes to gfx11 and gfx12 are updated to reflect this change.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
…#117598)

The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a,
both from gfx9 series. This required a new decoderNameSpace GFX950_DOT.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
…vm#117631)

- Update the runtime entry points to accept a stream information
- Update the conversion of `cuf.allocate` to pass correctly the stream
information when present.

Note that the stream is not currently used in the runtime. This will be
done in a separate patch as a design/solution needs to be down together
with the allocators.
Co-authored-by: Quentin Chateau <quentin.chateau@gmail.com>
…etConstant. (llvm#117639)

Fix all the places I could find that did't do this. We were already
mostly correct for FP_ROUND after
9a976f3, but not STRICT_FP_ROUND.
In case of an error GetBlock would return a reference to a Block without
adding it to a parent. This doesn't seem like a good idea, and none of
the other plugins do that.

This patch fixes that by propagating errors (well, null pointers...) up
the stack.

I don't know of any specific problem that this solves, but given that
this occurs only when something goes very wrong (e.g. a corrupted PDB
file), it's quite possible noone has run into this situation, so we
can't say the code is correct either. It also gets in the way of a
refactor I'm contemplating.
When dealing with cpu_specific GlobalDecl,
GetOrCreateMultiVersionResolver should immediately return the already
created llvm function if it exists.

Fixes llvm#115299.
[AutoBump] Merge with fixes of beff2ba (Nov 22) (14)
@jorickert jorickert requested a review from mgehre-amd March 14, 2025 06:59
Base automatically changed from bump_to_cbc78022 to feature/fused-ops March 14, 2025 16:12
[AutoBump] Merge with 56eb559 (Nov 26) (15)
@jorickert jorickert merged commit bb0d9a5 into feature/fused-ops Mar 17, 2025
10 of 11 checks passed
@jorickert jorickert deleted the bump_to_776476c2 branch March 17, 2025 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet