Changes

Changes from Git (git http://labmaster3.local/git/llvm-project.git)

Summary

  1. [AlignmentFromAssumptions] Fix a SCEV assertion resulting from address space differences.
  2. [mlir] Add exp2 conversion to llvm.intr.exp2
  3. [X86] X86CallFrameOptimization - generalize slow push code path
  4. [PostOrderIterator] Use SmallVector to store stack; NFC
  5. [VPlan] Use one VPWidenRecipe per original IR instruction. (NFC).
  6. [mlir] NFC: fix trivial typo in documents
  7. [X86][AVX] Add X86ISD::VALIGN target shuffle decode support
  8. [X86] Rename matchShuffleAsByteRotate to matchShuffleAsElementRotate. NFC.
  9. [X86][AVX] Combine 128-bit lane shuffles with a zeroable upper half to EXTRACT_SUBVECTOR (PR40720)
  10. [InstCombine] Use replaceOperand() in a few more places
  11. [InstCombine] Erase original add when creating saddo
  12. [InstCombine] Fix worklist management in varargs transform
  13. [OpenMP] set_bits iterator yields unsigned elements, no reference (NFC).
  14. [InstCombine] Simplify select of cmpxchg transform
  15. Remove unnecessary empty comments from test check lines. NFC.
  16. [X86][AVX] Add tests for 512-bit shuffle patterns that could reduce to subvector extractions
  17. [InstCombine] make test independent of branch undef/UB; NFC
  18. [VectorCombine] skip debug intrinsics first for efficiency
  19. AMDGPU: Fix typo
  20. AMDGPU: Add some additional tests for v_cvt_ubyte* formation
  21. AMDGPU: Fix using wrong instruction for FP conversion
  22. AMDGPU/GlobalISel: Remove redundant virtual
  23. GlobalISel: Add matcher for G_SHL
  24. Introduce support for lib function aligned_alloc in TLI / memory builtins
Commit 4bf015c035e4e5b63c7222dfb15ff274a5ed905c by wichard
[AlignmentFromAssumptions] Fix a SCEV assertion resulting from address space differences.

Summary:
On targets with different pointer sizes, -alignment-from-assumptions could attempt to create SCEV expressions which use different effective SCEV types. The provided test illustrates the issue.

In `getNewAlignment`, AASCEV would be the (only) alloca, which would have an effective SCEV type of i32. But PtrSCEV, the GEP in this case, due to being in the flat/default address space, will have an effective SCEV of i64.

This patch resolves the issue by truncating PtrSCEV to AASCEV's effective type.

Reviewers: hfinkel, jdoerfert

Reviewed By: jdoerfert

Subscribers: jvesely, nhaehnle, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75471
The file was modified llvm/lib/Transforms/Scalar/AlignmentFromAssumptions.cpp
The file was added llvm/test/Transforms/AlignmentFromAssumptions/amdgpu-crash.ll
Commit 6dab8067123c208967e8e717496adb76a98f72d3 by aaron.smith
[mlir] Add exp2 conversion to llvm.intr.exp2
The file was modified mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
The file was modified mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
The file was modified mlir/test/Target/llvmir-intrinsics.mlir
The file was modified mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
The file was modified mlir/test/Conversion/StandardToLLVM/convert-to-llvmir.mlir
Commit a7115d51be09ebc8953a269d26bda3d0c50dbab2 by llvm-dev
[X86] X86CallFrameOptimization - generalize slow push code path

Replace the explicit isAtom() || isSLM() test with the more general (and more specific) slowTwoMemOps() check to avoid the use of the PUSHrmm push from memory case.

This is actually very tricky to test in anything but quite complex code, but the atomic-idempotent.ll tests seem to be the most straightforward to use.

Differential Revision: https://reviews.llvm.org/D76239
The file was modified llvm/lib/Target/X86/X86CallFrameOptimization.cpp
The file was modified llvm/test/CodeGen/X86/atomic-idempotent.ll
Commit 6ba63510720f5bd6033a3eeffd51f8d7e0e90432 by nikita.ppv
[PostOrderIterator] Use SmallVector to store stack; NFC

We use a SmallPtrSet to track visited nodes, use a SmallVector
of the same size for the stack.
The file was modified llvm/include/llvm/ADT/PostOrderIterator.h
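
The pairing of a visited set with an explicit stack described in the PostOrderIterator commit can be sketched as follows. This is a minimal illustration and not the code in PostOrderIterator.h: std::unordered_set and std::vector stand in for SmallPtrSet and SmallVector, and the Graph alias and postOrder function are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical adjacency-list graph: Succs[N] lists the successors of node N.
using Graph = std::vector<std::vector<int>>;

// Iterative post-order walk from Entry. Each stack frame holds a node and the
// index of the next successor still to be examined.
std::vector<int> postOrder(const Graph &Succs, int Entry) {
  assert(Entry >= 0 && static_cast<std::size_t>(Entry) < Succs.size());
  std::vector<int> Order;
  std::unordered_set<int> Visited;                 // stand-in for SmallPtrSet
  std::vector<std::pair<int, std::size_t>> Stack;  // stand-in for SmallVector

  Visited.insert(Entry);
  Stack.emplace_back(Entry, std::size_t{0});
  while (!Stack.empty()) {
    auto &Top = Stack.back();
    if (Top.second < Succs[Top.first].size()) {
      int Child = Succs[Top.first][Top.second++];
      // Push only nodes seen for the first time. Top is not used again after
      // the push, so reallocation of the stack is harmless here.
      if (Visited.insert(Child).second)
        Stack.emplace_back(Child, std::size_t{0});
    } else {
      // All successors handled: emit the node and drop its frame.
      Order.push_back(Top.first);
      Stack.pop_back();
    }
  }
  return Order;
}
```

Because every node enters the stack at most once, the stack never holds more entries than the visited set, which is presumably why the commit gives both containers the same inline size.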
Commit 49d00824bbbb8945b92c0f592c6951a881a6242f by flo
[VPlan] Use one VPWidenRecipe per original IR instruction. (NFC).

This patch changes VPWidenRecipe to only store a single original IR
instruction. This is the first required step towards modeling its
operands as VPValues and also towards breaking it up into a
VPInstruction.

Discussed as part of D74695.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D76988
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
The file was modified llvm/lib/Transforms/Vectorize/VPlan.h
The file was modified llvm/lib/Transforms/Vectorize/VPlan.cpp
The file was modified llvm/unittests/Transforms/Vectorize/VPlanHCFGTest.cpp
The file was modified llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
The file was modified llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
Commit b632bd88a633c84eb2ce8f999119bc4e6c1ee98c by ishizaki
[mlir] NFC: fix trivial typo in documents

Reviewers: mravishankar, antiagainst, nicolasvasilache, herhut, aartbik, mehdi_amini, bondhugula

Reviewed By: mehdi_amini, bondhugula

Subscribers: bondhugula, jdoerfert, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, bader, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76993
The file was modified mlir/include/mlir/Interfaces/InferTypeOpInterface.td
The file was modified mlir/include/mlir/Dialect/Vector/VectorOps.td
The file was modified mlir/include/mlir/Dialect/Quant/QuantOps.td
The file was modified mlir/include/mlir/Dialect/GPU/GPUOps.td
The file was modified mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
The file was modified mlir/test/mlir-tblgen/llvm-intrinsics.td
The file was modified mlir/docs/CreatingADialect.md
The file was modified mlir/test/IR/attribute.mlir
The file was modified mlir/docs/OpDefinitions.md
The file was modified mlir/test/Conversion/StandardToSPIRV/std-types-to-spirv.mlir
The file was modified mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOpsInterface.td
The file was modified mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td
The file was modified mlir/include/mlir/Dialect/Shape/IR/ShapeOps.td
The file was modified mlir/include/mlir/Dialect/SPIRV/SPIRVBase.td
The file was modified mlir/docs/Diagnostics.md
The file was modified mlir/docs/ConversionToLLVMDialect.md
The file was modified mlir/docs/RationaleLinalgDialect.md
Commit 10439f9e32edf0efd34e19f19c0d0e7555cd5492 by llvm-dev
[X86][AVX] Add X86ISD::VALIGN target shuffle decode support

Allows us to combine VALIGN instructions with other shuffles - the combiner doesn't create VALIGN yet though.
The file was modified llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-128-v4.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/vector-shuffle-v1.ll
Commit da4c7db793aa71a1e59c31b346e975593c090232 by llvm-dev
[X86] Rename matchShuffleAsByteRotate to matchShuffleAsElementRotate. NFC.

This was an inner helper function for the real matchShuffleAsByteRotate function, but it is more generic and is used directly for VALIGN lowering which doesn't work at the byte level.
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
Commit 7734e4b3a36f233df493e6101086a9c95d309a40 by llvm-dev
[X86][AVX] Combine 128-bit lane shuffles with a zeroable upper half to EXTRACT_SUBVECTOR (PR40720)

As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128, and we can use the implicit zeroing of the upper half.

I've added some extra tests to vector-shuffle-combining-avx2.ll to make sure we don't lose coverage.
The file was modified llvm/test/CodeGen/X86/avx-vperm2x128.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/vector-reduce-mul.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-combining-avx2.ll
Commit 1e363023b823d399a4eac311e846a078cb329ceb by nikita.ppv
[InstCombine] Use replaceOperand() in a few more places

To make sure the old operands get DCEd.

NFC apart from worklist order changes.
The file was modified llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
The file was modified llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
Commit 6f07a9e80ab6f3040ae7d8afeaed7f2a207467d2 by nikita.ppv
[InstCombine] Erase original add when creating saddo

Usually when we replaceInstUsesWith() we also return the original
instruction, and InstCombine will take care of erasing it. Here
we don't do that, so we need to manually erase it.

NFC apart from worklist order changes.
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
Commit 28f67bd5c56ba9c466b1fef600923483a967aa97 by nikita.ppv
[InstCombine] Fix worklist management in varargs transform

Add a replaceUse() helper to mirror replaceOperand() for the
rare cases where we're working directly on uses.

NFC apart from worklist order changes.
The file was modified llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
The file was modified llvm/lib/Transforms/InstCombine/InstCombineInternal.h
Commit 99913ef3d14fcbfc939d9547506b55ac76fd0c59 by flo
[OpenMP] set_bits iterator yields unsigned elements, no reference (NFC).

BitVector::set_bits() returns an iterator range yielding unsigned
elements, which will always be copied, while const & gives the impression
that there will be no copy. Newer versions of clang complain:

    warning: loop variable 'SetBitsIt' is always a copy because the range of type 'iterator_range<llvm::BitVector::const_set_bits_iterator>' (aka 'iterator_range<const_set_bits_iterator_impl<llvm::BitVector> >') does not return a reference [-Wrange-loop-analysis]

Reviewers: jdoerfert, rnk

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D77010
The file was modified llvm/lib/Frontend/OpenMP/OMPContext.cpp
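
The warning in the set_bits commit can be reproduced without LLVM. The sketch below is an illustration only: SetBitsRange is a hypothetical stand-in for the range returned by BitVector::set_bits(), built on std::vector<bool>, and sumSetBitIndices is an invented helper. The point is that operator* returns unsigned by value, so a `const unsigned &` loop variable would bind to a temporary copy anyway.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a set-bits range: iteration yields the index of
// each set bit by value, not by reference.
class SetBitsRange {
public:
  explicit SetBitsRange(const std::vector<bool> &B) : Bits(B) {}

  class iterator {
  public:
    iterator(const std::vector<bool> &B, std::size_t P) : Bits(&B), Pos(P) {
      skipUnset();
    }
    // Returns a value, so a loop variable can never alias storage in the range.
    unsigned operator*() const { return static_cast<unsigned>(Pos); }
    iterator &operator++() {
      ++Pos;
      skipUnset();
      return *this;
    }
    bool operator!=(const iterator &Other) const { return Pos != Other.Pos; }

  private:
    // Advance Pos to the next set bit, or to the end if there is none.
    void skipUnset() {
      while (Pos < Bits->size() && !(*Bits)[Pos])
        ++Pos;
    }
    const std::vector<bool> *Bits;
    std::size_t Pos;
  };

  iterator begin() const { return iterator(Bits, 0); }
  iterator end() const { return iterator(Bits, Bits.size()); }

private:
  const std::vector<bool> &Bits;
};

unsigned sumSetBitIndices(const std::vector<bool> &Bits) {
  unsigned Sum = 0;
  // `const unsigned &Idx` would also compile here, but it would bind to a
  // temporary copy, which is what -Wrange-loop-analysis reports.
  for (unsigned Idx : SetBitsRange(Bits))
    Sum += Idx;
  return Sum;
}
```

Declaring the loop variable as a plain unsigned states the copy explicitly and costs nothing for a small integer type.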
Commit 26fa33755f112194850a4bad62442f1043614213 by nikita.ppv
[InstCombine] Simplify select of cmpxchg transform

Rather than converting to a dummy select with equal true and false
ops, just directly return the resulting value.

As a side-effect, this fixes missing DCE of the previously replaced
operand.
The file was modified llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
Commit b44f07045c53870f6b1889bd385f6d2c8d0de396 by llvm-dev
Remove unnecessary empty comments from test check lines. NFC.
The file was modified llvm/test/CodeGen/X86/vector-shuffle-512-v8.ll
Commit 443dcc0e008bfac4ea3c9ef740074488e660122c by llvm-dev
[X86][AVX] Add tests for 512-bit shuffle patterns that could reduce to subvector extractions
The file was modified llvm/test/CodeGen/X86/vector-shuffle-512-v8.ll
Commit febcb24f14901ed4b666533a02d099dccd511201 by spatel
[InstCombine] make test independent of branch undef/UB; NFC
The file was modified llvm/test/Transforms/InstCombine/pr33689_same_bitwidth.ll
Commit fc3cc8a4b074d42f5352824ccd53de2e592a7af7 by spatel
[VectorCombine] skip debug intrinsics first for efficiency
The file was modified llvm/lib/Transforms/Vectorize/VectorCombine.cpp
Commit 97bbe7ad2a961c37d50df24e913cb8766cf2e792 by arsenm2
AMDGPU: Fix typo
The file was modified llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Commit 0b68ca516239829e6a6d5a79100151eb70a53c9e by arsenm2
AMDGPU: Add some additional tests for v_cvt_ubyte* formation

Use functions now that we have them for less boilerplate in the
output.
The file was modified llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll
Commit ab7a41069ebf2f628913face89e6a0ecc0348f5d by arsenm2
AMDGPU: Fix using wrong instruction for FP conversion

This was never actually hit, but FTRUNC was clearly not the intent
here.
The file was modified llvm/lib/Target/AMDGPU/SIISelLowering.h
The file was modified llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Commit d15723ef06508b6b257279fda31fad0fcec8c686 by arsenm2
AMDGPU/GlobalISel: Remove redundant virtual
The file was modified llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
Commit cce3d96bcc6585e5f134fc093ec50813659c5c5e by arsenm2
GlobalISel: Add matcher for G_SHL
The file was modified llvm/unittests/CodeGen/GlobalISel/PatternMatchTest.cpp
The file was modified llvm/include/llvm/CodeGen/GlobalISel/MIPatternMatch.h
Commit c0955edfd6ec51e9a3720f9bfc90bac2e511c06d by uday
Introduce support for lib function aligned_alloc in TLI / memory builtins

aligned_alloc is a standard lib function and has been in glibc since
2.16 and in the C11 standard. It has semantics similar to malloc/calloc
for several analyses/transforms. This patch introduces aligned_alloc
in target library info and memory builtins. Subsequent ones will
make other passes aware and fix https://bugs.llvm.org/show_bug.cgi?id=44062

This change will also be useful to LLVM generators that need to allocate
buffers of vector elements larger than 16 bytes (e.g. 256-bit ones),
element boundary alignment for which is not typically provided by glibc malloc.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Differential Revision: https://reviews.llvm.org/D76970
The file was modified llvm/include/llvm/Analysis/MemoryBuiltins.h
The file was modified llvm/unittests/Analysis/TargetLibraryInfoTest.cpp
The file was modified llvm/include/llvm/Analysis/TargetLibraryInfo.def
The file was modified llvm/lib/Analysis/BasicAliasAnalysis.cpp
The file was modified llvm/lib/Analysis/MemoryBuiltins.cpp
The file was modified llvm/lib/Analysis/TargetLibraryInfo.cpp
The file was modified llvm/lib/Transforms/Utils/BuildLibCalls.cpp
The file was modified llvm/test/Transforms/DeadStoreElimination/simple.ll
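
The use case named in the aligned_alloc commit, buffers of 256-bit vector elements, might look like the sketch below on the caller's side. It assumes a C library that declares the C11 aligned_alloc in <stdlib.h> (glibc 2.16 or later, per the commit message). allocAlignedFloats and isAligned are hypothetical helpers introduced for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>

// Allocate room for Count floats on a 32-byte boundary, suitable for 256-bit
// vector loads and stores. Returns nullptr on failure; release with free().
float *allocAlignedFloats(std::size_t Count) {
  constexpr std::size_t Alignment = 32;
  std::size_t Bytes = Count * sizeof(float);
  // C11 specifies the size as an integral multiple of the alignment, so round
  // the request up to the next multiple of 32.
  Bytes = (Bytes + Alignment - 1) / Alignment * Alignment;
  if (Bytes == 0)
    Bytes = Alignment; // avoid a zero-sized request
  return static_cast<float *>(aligned_alloc(Alignment, Bytes));
}

// True if Ptr is a multiple of Alignment, which must be nonzero.
bool isAligned(const void *Ptr, std::size_t Alignment) {
  assert(Alignment != 0);
  return reinterpret_cast<std::uintptr_t>(Ptr) % Alignment == 0;
}
```

Memory obtained this way is released with free(), the same as for malloc, which fits the commit's remark that aligned_alloc has malloc-like semantics for several analyses.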