Success

Changes

Summary

  1. [X86] isRepeatedTargetShuffleMask - don't require specific MVT type. NFC.
  2. Create strict aligned code for OpenBSD/arm64.
  3. [X86][SSE] Replace combineShuffleWithHorizOp with canonicalizeShuffleMaskWithHorizOp
  4. [StackSafety] Change how callee searched in index
  5. [PhaseOrdering] add test for memcpy removal (PR47114); NFC
  6. [InstCombine] add tests for copysign; NFC
  7. [InstCombine] reduce code duplication; NFC
  8. [InstCombine] fold copysign with fabs/fneg operand
  9. Revert "[PhaseOrdering] add test for memcpy removal (PR47114); NFC"
  10. [X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float types
  11. [Sema] Validate calls to GetExprRange.
  12. [Sema] Use the proper cast for a fixed bool enum.
  13. [ARM] Tests for tail predicated loads. NFC
  14. [OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156
  15. [OpenMP][CUDA] Cache the maximal number of threads per block (per kernel)
  16. [OpenMP][CUDA] Keep one kernel list per device, not globally.
Commit c27baa54b78478ace01cd81abbdbbf47e3f8c54a by llvm-dev
[X86] isRepeatedTargetShuffleMask - don't require specific MVT type. NFC.

Split isRepeatedTargetShuffleMask into a wrapper variant that takes an MVT describing the mask width, and an internal version that only needs the raw mask element bit size.

This will be necessary for an upcoming change where the horizontal ops element width might not match the shuffle mask element width.
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
Commit 44613bbec88be9e86b8c52c4f40bb1b1ab48d84c by brad
Create strict aligned code for OpenBSD/arm64.
The file was modified clang/test/Driver/arm-alignment.c
The file was modified clang/lib/Driver/ToolChains/Arch/AArch64.cpp
Commit dca7eb7d602e7eac667c6d9de5e8f2b0845b9557 by llvm-dev
[X86][SSE] Replace combineShuffleWithHorizOp with canonicalizeShuffleMaskWithHorizOp

Instead of only attempting to fold shuffle(HOP,HOP) for a specific target shuffle, make this part of combineX86ShufflesRecursively so that we can perform the fold on the combined shuffle chain. This is particularly useful for recognising more cases where multiple HOPs can be merged, and for pre-AVX targets, where we don't have good blend/unary target shuffle support.
The file was modified llvm/test/CodeGen/X86/haddsub-undef.ll
The file was modified llvm/test/CodeGen/X86/phaddsub.ll
The file was modified llvm/test/CodeGen/X86/haddsub-shuf.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
Commit 47552a614a8c95e1817d83755a4a6a2508da7f8a by Vitaly Buka
[StackSafety] Change how callee searched in index

Handle linkage types other than local.
The file was modified llvm/lib/Analysis/StackSafetyAnalysis.cpp
The file was modified llvm/test/Analysis/StackSafetyAnalysis/ipa.ll
Commit babb59496b540583c6951813d1e0b3abdea97e7d by spatel
[PhaseOrdering] add test for memcpy removal (PR47114); NFC
The file was added llvm/test/Transforms/PhaseOrdering/memcpyopt.ll
Commit 4d5fdff43488445746dfea4cf0a5621cfd838c01 by spatel
[InstCombine] add tests for copysign; NFC
The file was modified llvm/test/Transforms/InstCombine/copysign.ll
Commit 3fed67b7e6d67b0208c6c8bfc2ed8211d383bc39 by spatel
[InstCombine] reduce code duplication; NFC
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
Commit 3ffb751f3dbf059b2ec061fe2f4302c9eba26b43 by spatel
[InstCombine] fold copysign with fabs/fneg operand

We already get this in the backend, but we need to do
it in IR too to consistently get yet more copysign
transforms.
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
The file was modified llvm/test/Transforms/InstCombine/copysign.ll
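
The copysign fold in commit 3ffb751f relies on the fact that copysign ignores the sign of its first operand, so wrapping that operand in fabs or fneg does not change the result. The following is a minimal source-level sketch of that identity using `std::copysign`; the helper names are hypothetical and this is not the InstCombine code itself.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers: each computes copysign with a differently
// wrapped magnitude operand.
double copysignPlain(double X, double Y) { return std::copysign(X, Y); }
double copysignOfFabs(double X, double Y) { return std::copysign(std::fabs(X), Y); }
double copysignOfFneg(double X, double Y) { return std::copysign(-X, Y); }

// True when all three forms agree, including on the sign bit
// (which matters for zero results).
bool foldHolds(double X, double Y) {
  double P = copysignPlain(X, Y);
  double A = copysignOfFabs(X, Y);
  double N = copysignOfFneg(X, Y);
  return P == A && P == N &&
         std::signbit(P) == std::signbit(A) &&
         std::signbit(P) == std::signbit(N);
}
```

Because only the magnitude of the first operand is used, the fabs or fneg can be dropped and the call reduced to the plain form.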
Commit 29e1d16a3eeba98ef8fb2c250301c7e7eb2554f4 by spatel
Revert "[PhaseOrdering] add test for memcpy removal (PR47114); NFC"

This reverts commit babb59496b540583c6951813d1e0b3abdea97e7d.

This test addition was queued up with some unrelated changes,
but it seems more likely that we need to fix something internal
to -memcpyopt. Also, I'm not sure if including target-specific
attributes in a generic regression test dir will cause bot
problems.
The file was removed llvm/test/Transforms/PhaseOrdering/memcpyopt.ll
Commit f25d47b7ed3e2e9ddb121471c5d4af76642cd48c by llvm-dev
[X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float types

We can now enable this for AVX1 targets, now that canonicalizeShuffleMaskWithHorizOp can assist with the cleanup.

There are still a few missed opportunities for merging subvector insert/extracts into shuffles, but they shouldn't cause any regressions now.
The file was modified llvm/test/CodeGen/X86/haddsub-2.ll
The file was modified llvm/test/CodeGen/X86/haddsub-shuf.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/haddsub-undef.ll
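
The CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) fold in commit f25d47b7 works because a 256-bit horizontal op acts independently on each 128-bit lane. The following is a small scalar model of that identity for horizontal add on floats. The types and function names are hypothetical, and the lane behaviour is modelled on the documented semantics of HADDPS/VHADDPS rather than taken from the commit.

```cpp
#include <array>
#include <cassert>

using V4 = std::array<float, 4>; // models a 128-bit vector of floats
using V8 = std::array<float, 8>; // models a 256-bit vector of floats

// 128-bit horizontal add: adjacent pairs of A, then adjacent pairs of B.
V4 hadd128(const V4 &A, const V4 &B) {
  return {A[0] + A[1], A[2] + A[3], B[0] + B[1], B[2] + B[3]};
}

// 256-bit horizontal add: the 128-bit operation applied per lane.
V8 hadd256(const V8 &A, const V8 &B) {
  return {A[0] + A[1], A[2] + A[3], B[0] + B[1], B[2] + B[3],
          A[4] + A[5], A[6] + A[7], B[4] + B[5], B[6] + B[7]};
}

// Concatenate two 128-bit halves into one 256-bit vector.
V8 concat(const V4 &Lo, const V4 &Hi) {
  return {Lo[0], Lo[1], Lo[2], Lo[3], Hi[0], Hi[1], Hi[2], Hi[3]};
}

// Before the fold: two narrow horizontal ops, then a concat.
V8 concatOfHops(const V4 &X, const V4 &Y, const V4 &Z, const V4 &W) {
  return concat(hadd128(X, Y), hadd128(Z, W));
}

// After the fold: two concats feeding one wide horizontal op.
V8 hopOfConcats(const V4 &X, const V4 &Y, const V4 &Z, const V4 &W) {
  return hadd256(concat(X, Z), concat(Y, W));
}
```

In this model both forms produce the same eight elements, and the second needs only one horizontal operation.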
Commit 827ba67e383313b05e9b10c8215e501530d6c9e3 by koraq
[Sema] Validate calls to GetExprRange.

When a conditional expression had a throw expression as an operand, Sema
called GetExprRange with a void expression, which caused an assertion failure.

This approach was suggested by Richard Smith.

Fixes PR46484: Clang crash in clang/lib/Sema/SemaChecking.cpp:10028

Differential Revision: https://reviews.llvm.org/D85601
The file was modified clang/test/SemaCXX/conditional-expr.cpp
The file was modified clang/lib/Sema/SemaChecking.cpp
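
For context on commit 827ba67e, the following sketch shows the general shape of a conditional expression with a throw operand. The throw operand has type void while the whole expression takes the type of the other operand. The function name is hypothetical, and this is not the reproducer from PR46484, which is not shown here.

```cpp
#include <cassert>
#include <stdexcept>

// Returns Value when Ok is true; otherwise the throw operand is evaluated.
// The throw-expression has type void, while the conditional has type int.
int valueOrThrow(bool Ok, int Value) {
  return Ok ? Value : throw std::runtime_error("no value");
}
```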
Commit fef26071240711e8f7305715b5f22cfc7ad04bfe by koraq
[Sema] Use the proper cast for a fixed bool enum.

When casting to an enumeration with a fixed bool underlying type, the cast
should use IntegralToBoolean instead of IntegralCast, as required by Core
Issue 2338.

Fixes PR47055: Incorrect codegen for enum with bool underlying type

Differential Revision: https://reviews.llvm.org/D85612
The file was added clang/test/CodeGen/enum-bool.cpp
The file was modified clang/lib/Sema/SemaCast.cpp
The file was modified clang/test/CXX/drs/dr23xx.cpp
Commit 5f45f91de41949fbc0124b5615c0c3ec45a3b243 by david.green
[ARM] Tests for tail predicated loads. NFC
The file was added llvm/test/CodeGen/Thumb2/LowOverheadLoops/unpredload.ll
Commit 95a25e4c3203f35e9f57f9fac620b4a21bffd6e1 by johannes
[OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156

When we implement OpenMP GPU reductions we use type punning a lot during
the shuffle and reduce operations. This is not always compatible with
language rules on aliasing. So far we generated TBAA, which later allowed
some of the reduce code to be removed because accesses and initialization
were "known to not alias". With this patch we avoid TBAA in this step,
hopefully for all the accesses where we need to.

Verified on the reproducer of PR46156 and QMCPack.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D86037
The file was added clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp
The file was modified clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
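
As background for commit 95a25e4c, the following sketch shows what type punning around a shuffle looks like at the source level: a float is reinterpreted as a 32-bit integer, moved, and reinterpreted back. It uses `std::memcpy`, which is well defined under the aliasing rules. All names are hypothetical, `shuffle32` is only a stand-in for a GPU shuffle, and the commit itself changes the metadata on clang-generated code rather than user code like this.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as a 32-bit integer without
// violating aliasing rules.
std::int32_t floatToBits(float F) {
  static_assert(sizeof(std::int32_t) == sizeof(float), "size mismatch");
  std::int32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// Inverse of floatToBits.
float bitsToFloat(std::int32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

// Hypothetical stand-in for a shuffle that only moves 32-bit integers.
// Here it simply returns its input.
std::int32_t shuffle32(std::int32_t V) { return V; }

// Shuffle a float by punning it through the integer-only primitive.
float shuffleFloat(float F) { return bitsToFloat(shuffle32(floatToBits(F))); }
```

If a compiler were told that the integer and float accesses cannot alias, it could treat the stores and loads as unrelated, which is the kind of problem the commit message describes.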
Commit aa27cfc1e7d7456325e951a4ba3ced405027f7d0 by johannes
[OpenMP][CUDA] Cache the maximal number of threads per block (per kernel)

Instead of calling `cuFuncGetAttribute` with
`CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation,
we can do it for the first one and cache the result as part of the
`KernelInfo` struct. The only functional change is that we now expect
`cuFuncGetAttribute` to succeed and otherwise propagate the error.
Ignoring any error seems like a slippery slope...

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D86038
The file was modified openmp/libomptarget/plugins/cuda/src/rtl.cpp
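
The caching described in commit aa27cfc1 can be sketched as follows: query the attribute on first use, store it in the per-kernel struct, and propagate a failed query instead of ignoring it. This is a simplified model, not the plugin code. The `KernelInfo` layout, the function name, and the `Query` callback (standing in for the `cuFuncGetAttribute` call) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>

// Hypothetical, simplified per-kernel record. A negative value means
// the attribute has not been queried yet.
struct KernelInfo {
  int MaxThreadsPerBlock = -1;
};

// Query stands in for the driver call; it writes the value and returns
// true on success, false on failure.
using AttributeQuery = std::function<bool(int &)>;

// Returns true and sets Out on success. The query runs only on the first
// successful call for a given kernel; a failed query is reported to the caller.
bool getMaxThreadsPerBlock(KernelInfo &Kernel, const AttributeQuery &Query,
                           int &Out) {
  if (Kernel.MaxThreadsPerBlock < 0) {
    int Value = 0;
    if (!Query(Value))
      return false; // propagate the error instead of ignoring it
    Kernel.MaxThreadsPerBlock = Value;
  }
  Out = Kernel.MaxThreadsPerBlock;
  return true;
}
```

Storing the value next to the kernel keeps the cache per kernel, matching the "(per kernel)" wording of the commit title.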
Commit 5272d29e2cb7c967c3016fa285f14edc7515d9bf by johannes
[OpenMP][CUDA] Keep one kernel list per device, not globally.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D86039
The file was modified openmp/libomptarget/plugins/cuda/src/rtl.cpp