Success

Changes

Summary

  1. [AArch64] [COFF] Properly produce cross-section relative relocations
  2. [ARM] [COFF] Properly produce cross-section relative relocations
  3. [lit] Always quote arguments containing '[' on windows
  4. [PowerPC] Fix incorrect subreg typo from 0148bf53f0a0
  5. [X86][SSE] combineSetCCMOVMSK - allow comparison with upper (known zero) bits in MOVMSK(SHUFFLE(X,u)) -> MOVMSK(X) fold
  6. [X86] Regenerate PR32284.ll test case prefixes. NFC.
  7. [X86] Fold cmpeq/ne(trunc(x),0) --> cmpeq/ne(x,0)
  8. [InstCombine] tmp alloca bypass: ensure that the replacement dominates all alloca uses
  9. [Passes] Enable the relative lookup table converter pass on aarch64
  10. SDAG: constant fold bf16 -> i16 casts
  11. [lldb][AArch64] Simplify MTE memory region test
  12. [clang] [AArch64] Fix Windows va_arg handling for larger structs
  13. [ValueTracking] add unit test for isKnownNonZero(); NFC
  14. [lit] Remove unnecessary testcases from lit-quoting.txt that fail on macOS
  15. [AIX] Allow safe for 32bit P8 VSX pattern matching
  16. [Test] Account for possibility to free memory in loop load PRE test
  17. [ValueTracking] reduce code duplication; NFC
  18. [AMDGPU] Mark scavenged SGPR as used
  19. [OpenMP] Fix printing routine for OMP_TOOL_VERBOSE_INIT
  20. [OpenCL][Docs] Update OpenCL 3.0 implementation status
  21. [ValueTracking] match negative-stepping non-zero recurrence
  22. [InstSimplify] improve efficiency for detecting non-zero value
  23. CPUDispatch- allow out of line member definitions
  24. [llvm-symbolizer] remove unused variable
  25. [SCCP] Create SCCP Solver
  26. [gn build] Port bbab9f986c6d
  27. [mlir][StandardToSPIRV] Add support for lowering memref<?xi1> to SPIR-V
  28. [AArch64][v8.5A] Add BTI to all function starts
  29. [SLP] createOp - fix null dereference warning. NFCI.
  30. [X86][SSE] canonicalizeShuffleWithBinOps - check for more combos of merge-able binary shuffles.
  31. [AMDGPU] Rename "LDS lowering" pass name.
  32. [Instcombine] Disable memcpy of alloca bypass for instruction sources
  33. [X86] Add PR49028 test case
  34. Add flag for showing skipped headers in -H / --show-includes output
Commit d5c5cf5ce8d921fc8c5e1b608c298a1ffa688d37 by martin
[AArch64] [COFF] Properly produce cross-section relative relocations

This fixes breakage on Windows/ARM64 after D94355.

Modelled after the corresponding code for X86; not entirely familiar
with those aspects of that layer otherwise.

Differential Revision: https://reviews.llvm.org/D99572
The file was modified llvm/test/MC/AArch64/coff-relocations-diags.s
The file was modified llvm/test/MC/AArch64/coff-relocations.s
The file was modified llvm/lib/Target/AArch64/MCTargetDesc/AArch64WinCOFFObjectWriter.cpp
Commit 3b32dc4b84c8eaa0de337d6847c2c4cdbfcb4333 by martin
[ARM] [COFF] Properly produce cross-section relative relocations

Differential Revision: https://reviews.llvm.org/D99574
The file was modified llvm/lib/Target/ARM/MCTargetDesc/ARMWinCOFFObjectWriter.cpp
The file was modified llvm/test/MC/ARM/coff-relocations.s
Commit 37935405efbebc4bd9f1ffac9152571c6a8469dc by martin
[lit] Always quote arguments containing '[' on windows

This avoids breaking clang-tidy/infrastructure/validate-check-names.cpp
if 'not' is evaluated as a lit internal tool (making TestRunner
invoke 'grep' directly in that test, instead of invoking 'not', which
then invokes 'grep').

The quoting of arguments is still brittle if the executable is an
MSYS based tool though, as MSYS based tools incorrectly unescape
backslashes in quoted arguments (contrary to regular win32 argument
parsing rules), see D99406 and
https://github.com/msys2/msys2-runtime/issues/36 for more examples
of the issues.

Differential Revision: https://reviews.llvm.org/D99938
The file was modified llvm/test/Other/lit-quoting.txt
The file was modified llvm/utils/lit/lit/TestRunner.py
Commit 8be3181df6f13544d97c1e263a91aa376a760c99 by nemanja.i.ibm
[PowerPC] Fix incorrect subreg typo from 0148bf53f0a0
The file was modified llvm/lib/Target/PowerPC/PPCInstrVSX.td
Commit 016ceb838231a717e889f7ceb38c56575e82aead by llvm-dev
[X86][SSE] combineSetCCMOVMSK - allow comparison with upper (known zero) bits in MOVMSK(SHUFFLE(X,u)) -> MOVMSK(X) fold

Extension to rG74f98391a7a4: we can also include any of the upper (known zero) bits in the comparison in the shuffle removal fold, as long as we demand all the elements of the movmsk source vector.
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
Commit 62af2af85daf79471c15a23f1b4f81a83a8bdd19 by llvm-dev
[X86] Regenerate PR32284.ll test case prefixes. NFC.

Use X64 for 64-bit targets and X86 for 32-bit targets
The file was modified llvm/test/CodeGen/X86/pr32284.ll
Commit 73737fe9900dae6a7e766043477d646b43d7f284 by llvm-dev
[X86] Fold cmpeq/ne(trunc(x),0) --> cmpeq/ne(x,0)

Relax the fold from rGbaadbe04bf75 to compare any op, not just logic ops, now that the movmsk regressions have been handled.
The file was modified llvm/test/CodeGen/X86/vector-compare-any_of.ll
The file was modified llvm/test/CodeGen/X86/setcc-lowering.ll
The file was modified llvm/test/CodeGen/X86/div-rem-pair-recomposition-unsigned.ll
The file was modified llvm/test/CodeGen/X86/pr32284.ll
The file was modified llvm/test/CodeGen/X86/movmsk-cmp.ll
The file was modified llvm/test/CodeGen/X86/hoist-and-by-const-from-lshr-in-eqcmp-zero.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/div-rem-pair-recomposition-signed.ll
The file was modified llvm/test/CodeGen/X86/vector-reduce-or-bool.ll
Commit 2fea5d5d4accf3490854b064a51d1db049b1de64 by lebedev.ri
[InstCombine] tmp alloca bypass: ensure that the replacement dominates all alloca uses

After 077bff39d46364035a5dcfa32fc69910ad0975d0,
isDereferenceableForAllocaSize() can recurse into selects,
which is causing a problem for the new test case,
reduced from https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20210412/904154.html
because the replacement (the select) is defined after the first use
of an alloca, so we'd end up with a verifier error.

Now, this new check is too restrictive.
We can likely handle *some* cases by trying to sink all uses of an alloca
to after the def.
The file was modified llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
The file was added llvm/test/Transforms/InstCombine/tmp-alloca-bypass.ll
The file was modified llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
Commit 57b259a852a6383880f5d0875d848420bb3c2945 by martin
[Passes] Enable the relative lookup table converter pass on aarch64

After d5c5cf5ce8d921fc8c5e1b608c298a1ffa688d37, it should work fine
for aarch64 on COFF too. (It was disabled when the patch was
(re)applied in e96df3e531f506eea75da0f13d0f8aa9a267f975, pending
that fix.)
The file was modified llvm/include/llvm/CodeGen/BasicTTIImpl.h
Commit 6401b78ab3cf18cb5f0821f9bd52063af0d7ce35 by Tim Northover
SDAG: constant fold bf16 -> i16 casts

This direction is particularly useful because i16 constants are much more
likely to be legal than bf16.
The file was modified llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
The file was modified llvm/test/CodeGen/AArch64/bf16.ll
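
For reference, a bf16 value occupies the upper 16 bits of the corresponding IEEE-754 binary32 encoding, so the i16 that a bf16 -> i16 bitcast of a constant folds to can be worked out by hand. The sketch below is only an illustration of that bit layout, not the SelectionDAG code; the helper name bf16_bits is hypothetical, and it truncates instead of rounding, which is exact for the sample values used here.

```python
import struct


def bf16_bits(value: float) -> int:
    """Return the 16-bit pattern of `value` viewed as bf16 (truncating)."""
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    # bf16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits.
    return bits32 >> 16


print(hex(bf16_bits(1.0)))   # 0x3f80
print(hex(bf16_bits(-2.0)))  # 0xc000
```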
Commit 6cdc2239dbabeb6fb8a9f933693f744a60d50a8c by david.spickett
[lldb][AArch64] Simplify MTE memory region test

By checking for cpu and toolchain features ahead
of time we don't need the custom return codes.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D97684
The file was modified lldb/packages/Python/lldbsuite/test/decorators.py
The file was modified lldb/test/API/linux/aarch64/mte_memory_region/main.c
The file was modified lldb/test/API/linux/aarch64/mte_memory_region/TestAArch64LinuxMTEMemoryRegion.py
Commit 3637c5c8ec3d4dc0b87eb4e3ee9c9ae8816cade2 by martin
[clang] [AArch64] Fix Windows va_arg handling for larger structs

Aggregate types over 16 bytes are passed by reference.

Contrary to the x86_64 ABI, smaller structs with an odd (non power
of two) size are padded and passed in registers.

Differential Revision: https://reviews.llvm.org/D100374
The file was modified clang/lib/CodeGen/TargetInfo.cpp
The file was modified clang/test/CodeGen/ms_abi_aarch64.c
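
The size rule described in this commit can be contrasted with the Windows x86_64 rule in a few lines. This is a simplified sketch, not the clang implementation: it looks only at the aggregate's size in bytes, ignores every other ABI detail, and both function names are hypothetical.

```python
def win_aarch64_by_reference(size_bytes: int) -> bool:
    """Per the commit message: aggregates over 16 bytes go by reference."""
    return size_bytes > 16


def win_x64_by_reference(size_bytes: int) -> bool:
    """Windows x86_64: only sizes of 1, 2, 4 or 8 bytes are passed by value."""
    return size_bytes not in (1, 2, 4, 8)


for size in (3, 8, 12, 16, 24):
    print(size, win_aarch64_by_reference(size), win_x64_by_reference(size))
```

A 12-byte struct is the interesting case: under these rules it is passed by reference on x86_64 but stays in registers (padded) on AArch64.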
Commit 989445f4386cdc1fce20eb4e418ed4b502819cc7 by spatel
[ValueTracking] add unit test for isKnownNonZero(); NFC

We call various value tracking APIs from within -instsimplify,
so I don't think this is visible in a larger test.
The file was modified llvm/unittests/Analysis/ValueTrackingTest.cpp
Commit 413d84fb5c6d18efc0c3f478071a11c7c3542fd0 by martin
[lit] Remove unnecessary testcases from lit-quoting.txt that fail on macOS

These were added in 37935405efbebc4bd9f1ffac9152571c6a8469dc,
but they fail on macOS (and on Windows with MSYS based tools, before
relanding D98859). Remove the tests that exercise "not not echo", as
the primary thing to test is the plain echo patterns above.
The file was modified llvm/test/Other/lit-quoting.txt
Commit 6b7838b68cc49621f3c92d8603f95e801d10f759 by zarko
[AIX] Allow safe for 32bit P8 VSX pattern matching

Pull some of the safe for 32bit pattern matching for Pwr8 and above.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D97909
The file was modified llvm/test/CodeGen/PowerPC/cannonicalize-vector-shifts.ll
The file was added llvm/test/CodeGen/PowerPC/aix32-p8-scalar_vector_conversions.ll
The file was modified llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
The file was modified llvm/lib/Target/PowerPC/PPCInstrVSX.td
Commit d0920b201f7cb7494dff9334725e123283128c95 by mkazantsev
[Test] Account for possibility to free memory in loop load PRE test
The file was modified llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
Commit 49193653974ae96b756b8ff13668d07d6252aa77 by spatel
[ValueTracking] reduce code duplication; NFC

The start value can't be null for something to be a non-zero
recurrence, so hoist that common check out of the switch.

Subsequent checks may be incomplete or over-specified as noted in:
D100408
The file was modified llvm/lib/Analysis/ValueTracking.cpp
Commit 929edd4375a40fcf264426ac4f2b3d8fa9c72970 by sebastian.neubauer
[AMDGPU] Mark scavenged SGPR as used

Otherwise it reuses the same register for storing the stack slot
offset if the stack slot offset is big.

Differential Revision: https://reviews.llvm.org/D100461
The file was modified llvm/test/CodeGen/AMDGPU/sgpr-spill.mir
The file was modified llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
The file was modified llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll
Commit 77dc7b465313345ab0a5929f6f0386dbcab6594c by hansang.bae
[OpenMP] Fix printing routine for OMP_TOOL_VERBOSE_INIT

Also fixed typo in the verbose message.

Differential Revision: https://reviews.llvm.org/D100414
The file was modified openmp/runtime/src/kmp_settings.cpp
The file was modified openmp/runtime/test/ompt/loadtool/tool_available/tool_available.c
The file was modified openmp/runtime/src/ompt-general.cpp
Commit 856c49d79c0d717fb3e9ff6deebfe740a4f752e2 by sven.vanhaastregt
[OpenCL][Docs] Update OpenCL 3.0 implementation status

Reviewed-By: Anastasia Stulova
The file was modified clang/docs/OpenCLSupport.rst
Commit 5ae5d25e38efad1d59ed97d969a5e930b58a5e16 by spatel
[ValueTracking] match negative-stepping non-zero recurrence

This is pulled out of D100408.

This avoids a regression that would be exposed by making the
calling code from InstSimplify more efficient.
The file was modified llvm/unittests/Analysis/ValueTrackingTest.cpp
The file was modified llvm/lib/Analysis/ValueTracking.cpp
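
As a rough illustration of why a negative-stepping recurrence can be known non-zero: the commit message does not spell out the exact conditions, so the sketch below assumes a negative start value, a non-positive step, and an addition that never wraps. Under those assumptions every value stays below zero. The function name recurrence_values is hypothetical; Python integers do not overflow, which stands in for the no-wrap assumption.

```python
def recurrence_values(start: int, step: int, count: int) -> list:
    """Return the first `count` values of x[0] = start, x[n+1] = x[n] + step."""
    values = []
    value = start
    for _ in range(count):
        values.append(value)
        value += step
    return values


values = recurrence_values(-1, -3, 5)
print(values)                       # [-1, -4, -7, -10, -13]
print(all(v != 0 for v in values))  # True
```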
Commit 7ef2c68a3d24af0b0d540e748e8b564180f4e18a by spatel
[InstSimplify] improve efficiency for detecting non-zero value

Stepping through callstacks in the example from D99759 reveals
this potential compile-time improvement.

The savings come from avoiding ValueTracking's computing known
bits if we have already dealt with special-case patterns.

Further improvements in this direction seem possible.

This makes a degenerate test based on PR49785 about 40x faster
(25 sec -> 0.6 sec), but it does not address the larger question
of how to limit computeKnownBitsFromAssume(). I.e., the original
test there is still infinite-time for all practical purposes.

Differential Revision: https://reviews.llvm.org/D100408
The file was modified llvm/lib/Analysis/InstructionSimplify.cpp
Commit 92aba5ae49a6970c43bead0afd1e52c83fe44e6e by erich.keane
CPUDispatch- allow out of line member definitions

ICC permits this, and after some extensive testing it looks like we can
support this with very little trouble.  We intentionally choose not to
do this with attribute-target (despite it likely working as well!)
because GCC does not support that, and introducing said
incompatibility doesn't seem worth it.
The file was added clang/test/CodeGenCXX/attr-cpuspecific-outoflinedefs.cpp
The file was modified clang/lib/Sema/SemaDecl.cpp
The file was modified clang/test/SemaCXX/attr-cpuspecific.cpp
Commit 7a9cb801f3e71e8acca1598910c6dd19526942d8 by thakis
[llvm-symbolizer] remove unused variable

This should've been removed in D83530.

Differential Revision: https://reviews.llvm.org/D100434
The file was modified llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
Commit bbab9f986c6df8508eb64697923eb70ee17cb0f8 by sjoerd.meijer
[SCCP] Create SCCP Solver

This refactors SCCP and creates a SCCPSolver interface and class so that it can
be used by other passes and transformations. We will use this in D93838, which
adds a function specialisation pass.

This is based on an early version by Vinay Madhusudan.

Differential Revision: https://reviews.llvm.org/D93762
The file was modified llvm/lib/Transforms/Utils/CMakeLists.txt
The file was modified llvm/lib/Transforms/Scalar/SCCP.cpp
The file was modified llvm/include/llvm/Transforms/Scalar/SCCP.h
The file was added llvm/include/llvm/Transforms/Utils/SCCPSolver.h
The file was added llvm/lib/Transforms/Utils/SCCPSolver.cpp
Commit 34367dd2535c576d0fecbb803b38ada9918dc5e7 by llvmgnsyncbot
[gn build] Port bbab9f986c6d
The file was modified llvm/utils/gn/secondary/llvm/lib/Transforms/Utils/BUILD.gn
Commit 7c4de2e9b9b469b073e6f5f044977b23ac1b26c6 by hanchung
[mlir][StandardToSPIRV] Add support for lowering memref<?xi1> to SPIR-V

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D100452
The file was modified mlir/lib/Dialect/SPIRV/Transforms/SPIRVConversion.cpp
The file was modified mlir/test/Conversion/StandardToSPIRV/std-types-to-spirv.mlir
Commit cca40aa8d8aa732a226c8978e53cd47e7b7c76ec by pablo.barrio
[AArch64][v8.5A] Add BTI to all function starts

The existing BTI placement pass avoids inserting "BTI c" when the
function has local linkage and is only directly called. However,
even in this case, there is a (small) chance that the linker later
adds a hunk with an indirect call to the function, e.g. if the
function is placed in a separate section and moved far away from
its callers. Make sure to add BTI for these functions too.

Differential Revision: https://reviews.llvm.org/D99417
The file was modified llvm/test/CodeGen/AArch64/branch-target-enforcement.mir
The file was modified llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll
The file was modified llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
Commit b49c41afbaa212cc15343af68c3293ab929a2d34 by llvm-dev
[SLP] createOp - fix null dereference warning. NFCI.

Only attempt to propagateIRFlags if we have both SelectInst - afaict we shouldn't have matched a min/max reduction without both SelectInst, but static analyzer doesn't know that.
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Commit 4fbe7615721863c57b4fd4334f361a5d4157e235 by llvm-dev
[X86][SSE] canonicalizeShuffleWithBinOps - check for more combos of merge-able binary shuffles.

In the fold SHUFFLE(BINOP(X,Y),BINOP(Z,W)) -> BINOP(SHUFFLE(X,Z),SHUFFLE(Y,W)), check if both X/Z AND Y/W have at least one merge-able shuffle, in which case the total number of shuffles should still fall.

Helps with instruction count regressions we saw while fixing PR48823
The file was modified llvm/test/CodeGen/X86/haddsub-3.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
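
The identity behind the fold SHUFFLE(BINOP(X,Y),BINOP(Z,W)) -> BINOP(SHUFFLE(X,Z),SHUFFLE(Y,W)) can be checked on small lists. This is a toy model, assuming an element-wise binary op and a two-input shuffle whose mask indexes into the concatenation of its operands; the helper names and sample values are hypothetical and have nothing to do with the X86 lowering code itself.

```python
import operator


def shuffle(a, b, mask):
    """Two-input shuffle: mask indexes into the concatenation of a and b."""
    both = list(a) + list(b)
    return [both[i] for i in mask]


def binop(op, a, b):
    """Apply `op` lane by lane."""
    return [op(p, q) for p, q in zip(a, b)]


x, y = [1, 2, 3, 4], [10, 20, 30, 40]
z, w = [5, 6, 7, 8], [50, 60, 70, 80]
mask = [0, 5, 2, 7]

before = shuffle(binop(operator.add, x, y), binop(operator.add, z, w), mask)
after = binop(operator.add, shuffle(x, z, mask), shuffle(y, w, mask))
print(before, after)  # [11, 66, 33, 88] [11, 66, 33, 88]
```

The rewritten form trades one shuffle for two, which is why the fold only pays off when the new shuffles can merge with shuffles already feeding X/Z and Y/W.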
Commit e3070db0f7049fdbd75955b3e68a3d2bc4936e48 by mahesha.comp
[AMDGPU] Rename "LDS lowering" pass name.

Rename the "LDS lowering" pass from `amdgpu-disable-lower-module-lds` to
`amdgpu-enable-lower-module-lds`, as the latter is consistent and reads better.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D100441
The file was modified llvm/test/CodeGen/AMDGPU/promote-alloca-to-lds-constantexpr-use.ll
The file was modified llvm/test/CodeGen/AMDGPU/addrspacecast-initializer-unsupported.ll
The file was modified llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/lds-global-non-entry-func.ll
The file was modified llvm/test/CodeGen/AMDGPU/lds-global-non-entry-func.ll
Commit cf4161673c7e7c7c57d8115468bfcc9988f43d36 by benny.kra
[Instcombine] Disable memcpy of alloca bypass for instruction sources

This transformation is fundamentally broken when it comes to dominance;
it just happened to work when the source of the memcpy can be moved into
the place of the alloca. The bug shows up a lot more often since
077bff39d46364035a5dcfa32fc69910ad0975d0 allows the source to be a
switch.

It would be possible to check dominance of the source and all its
operands, but that seems very heavy for instcombine.
The file was modified llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
The file was modified llvm/test/Transforms/InstCombine/tmp-alloca-bypass.ll
Commit c4c9e4d6df3c492cf86728288b14a9bc718f6e2d by llvm-dev
[X86] Add PR49028 test case
The file was added llvm/test/CodeGen/X86/pr49028.ll
Commit f29dcbdde10c86cfd89196fc2aa0e7f6ca3c9c4e by hans
Add flag for showing skipped headers in -H / --show-includes output

Consider the following set of files:

  a.cc:
  #include "a.h"

  a.h:
  #ifndef A_H
  #define A_H

  #include "b.h"
  #include "c.h"  // This gets "skipped".

  #endif

  b.h:
  #ifndef B_H
  #define B_H

  #include "c.h"

  #endif

  c.h:
  #ifndef C_H
  #define C_H

  void c();

  #endif

And the output of the -H option:

  $ clang -c -H a.cc
  . ./a.h
  .. ./b.h
  ... ./c.h

Note that the include of c.h in a.h is not shown in the output (GCC does the
same). This is because of the include guard optimization: clang knows c.h is
covered by an include guard which is already defined, so when it sees the
include in a.h, it skips it. The same would have happened if #pragma once were
used instead of include guards.

However, a.h *does* include c.h, and it may be useful to show that in the -H
output. This patch adds a flag for doing that.

Differential revision: https://reviews.llvm.org/D100480
The file was modified clang/include/clang/Driver/Options.td
The file was modified clang/lib/Frontend/HeaderIncludeGen.cpp
The file was modified clang/test/Frontend/Inputs/test.h
The file was modified clang/test/Frontend/print-header-includes.c
The file was modified clang/test/Frontend/Inputs/test2.h
The file was modified clang/include/clang/Frontend/DependencyOutputOptions.h
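
The include-guard behaviour described in this commit can be modelled with a small traversal. This is a sketch, not clang's HeaderIncludeGen logic: it assumes every header is guarded, so a header is entered only the first time it is seen, and the show_skipped parameter is a hypothetical stand-in for the new flag, whose name and exact output format are not given in the commit message.

```python
# Direct includes of each file from the commit message's example.
FILES = {
    "a.cc": ["a.h"],
    "a.h": ["b.h", "c.h"],
    "b.h": ["c.h"],
    "c.h": [],
}


def include_tree(root, show_skipped=False):
    """Return -H style lines for the includes reachable from `root`."""
    seen = set()
    lines = []

    def visit(name, depth):
        for inc in FILES[name]:
            if inc in seen:
                # Guard already defined: the header is not entered again.
                if show_skipped:
                    lines.append("." * depth + " ./" + inc)
                continue
            seen.add(inc)
            lines.append("." * depth + " ./" + inc)
            visit(inc, depth + 1)

    visit(root, 1)
    return lines


print("\n".join(include_tree("a.cc")))
print("\n".join(include_tree("a.cc", show_skipped=True)))
```

With show_skipped left off, the result matches the three-line -H output quoted in the commit message; turning it on adds a fourth line for the include of c.h in a.h.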