SuccessChanges

Summary

  1. [SelectionDAG] Check any use of negation result before removal
  2. [Lint] Add check for intrinsic get.active.lane.mask
  3. [AMDGPU] Generate test checks for splitkit-copy-bundle.mir
  4. [SplitKit] Only copy live lanes
  5. [NFC] EliminateDuplicatePHINodes(): small-size optimization: if there are <= 32 PHI's, O(n^2) algo is faster (geomean -0.08%)
  6. Revert "Re-land: Add new hidden option -print-changed which only reports changes to IR"
  7. [X86] Fix stack alignment on 32-bit Solaris/x86

Commit a2fb5446be960ad164060b3c05fc268f7f72d67a by qiucofan
[SelectionDAG] Check any use of negation result before removal

2508ef01 fixed a bug in constant removal during negation, but a later
sanitizer check uncovered a remaining issue, so it was reverted.

Temporary nodes created during negation are removed if they turn out to
be useless. Before removal, each node is checked for remaining uses,
which is why the removal was moved after the getNode call. However, in
rare cases the node to be removed is the same node that getNode
returns; that case was missed and is fixed by this patch.
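
A minimal sketch of the fixed removal rule, using illustrative names
(`Candidate` for the speculatively built negation node, `Result` for
what getNode returned); the exact code lives in TargetLowering.cpp:

  // Delete a speculative node only if it is unused AND is not the very
  // node that getNode() just returned (getNode may CSE to Candidate).
  static void removeIfDeadAndDistinct(llvm::SelectionDAG &DAG,
                                      llvm::SDValue Candidate,
                                      llvm::SDValue Result) {
    if (Candidate.getNode() != Result.getNode() && Candidate->use_empty())
      DAG.RemoveDeadNode(Candidate.getNode());
  }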

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D87614
The file was added llvm/test/CodeGen/X86/pr47517.ll
The file was modified llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Commit 6637d72ddd3cf4cf3a7e6dfc227a86999137badb by sjoerd.meijer
[Lint] Add check for intrinsic get.active.lane.mask

As @efriedma pointed out in D86301, this "not equal to 0" check of
get.active.lane.mask's second operand needs to live here in Lint rather
than in the Verifier.
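
A sketch of the kind of check this adds, assuming Lint's usual
Assert(Cond, Msg, ...) reporting idiom (the exact message and code in
Lint.cpp may differ): flag a provably-zero trip count passed as the
intrinsic's second operand.

  case Intrinsic::get_active_lane_mask:
    // Operand #2 is the trip count; a constant zero can never produce
    // a valid active lane mask.
    if (auto *TripCount = dyn_cast<ConstantInt>(I.getArgOperand(1)))
      Assert(!TripCount->isZero(),
             "get_active_lane_mask: operand #2 must be greater than 0", &I);
    break;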

Differential Revision: https://reviews.llvm.org/D87228
The file was modified llvm/lib/Analysis/Lint.cpp
The file was added llvm/test/Analysis/Lint/get-active-lane-mask.ll

Commit d49707cf4b288e8d3cad00a78cfa45ec4c376496 by jay.foad
[AMDGPU] Generate test checks for splitkit-copy-bundle.mir

This is a pre-commit for D87757 "[SplitKit] Only copy live lanes".
The file was modified llvm/test/CodeGen/AMDGPU/splitkit-copy-bundle.mir

Commit 6f6d389da5c37e5e9a900902f03dc649d57919b7 by jay.foad
[SplitKit] Only copy live lanes

When splitting a live interval with subranges, only insert copies for
the lanes that are live at the point of the split. This avoids some
unnecessary copies and fixes a problem where copying dead lanes was
generating MIR that failed verification. The test case for this is
test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir.

Without this fix, some earlier live range splitting would create %430:

%430 [256r,848r:0)[848r,2584r:1)  0@256r 1@848r L0000000000000003 [848r,2584r:0)  0@848r L0000000000000030 [256r,2584r:0)  0@256r weight:1.480938e-03
...
256B     undef %430.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
848B     %430.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
...
2584B    %431:vreg_128 = COPY %430:vreg_128

Then RAGreedy::tryLocalSplit would split %430 into %432 and %433 just
before 848B giving:

%432 [256r,844r:0)  0@256r L0000000000000030 [256r,844r:0)  0@256r weight:3.066802e-03
%433 [844r,848r:0)[848r,2584r:1)  0@844r 1@848r L0000000000000030 [844r,2584r:0)  0@844r L0000000000000003 [844r,844d:0)[848r,2584r:1)  0@844r 1@848r weight:2.831776e-03
...
256B     undef %432.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
844B     undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128 {
           internal %433.sub2:vreg_128 = COPY %432.sub2:vreg_128
848B     }
  %433.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
...
2584B    %431:vreg_128 = COPY %433:vreg_128

Note that the copy from %432 to %433 at 844B is a curious
bundle-without-a-BUNDLE-instruction that SplitKit creates deliberately.
It includes a copy of .sub0, which is not live at this point, and that
causes verification to fail:

*** Bad machine code: No live subrange at use ***
- function:    zextload_global_v64i16_to_v64i64
- basic block: %bb.0  (0x7faed48) [0B;2848B)
- instruction: 844B    undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128
- operand 1:   %432.sub0:vreg_128
- interval:    %432 [256r,844r:0)  0@256r L0000000000000030 [256r,844r:0)  0@256r weight:3.066802e-03
- at:          844B

Using real bundles with a BUNDLE instruction might also fix this
problem, but the current fix is less invasive and also avoids some
unnecessary copies.
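
An illustrative sketch of the idea, with hypothetical variable names
(the actual change is in SplitKit.cpp): restrict the copied lanes to
the subranges of the parent interval that are live at the split index.

  // Collect only the lanes that are live at the split point; dead
  // lanes get no COPY, avoiding the unverifiable bundle shown above.
  llvm::LaneBitmask LiveLanes = llvm::LaneBitmask::getNone();
  for (const llvm::LiveInterval::SubRange &S : Parent.subranges())
    if (S.liveAt(SplitIdx))
      LiveLanes |= S.LaneMask;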

https://bugs.llvm.org/show_bug.cgi?id=47492

Differential Revision: https://reviews.llvm.org/D87757
The file was added llvm/test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir
The file was modified llvm/lib/CodeGen/SplitKit.cpp
The file was modified llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll
The file was modified llvm/test/CodeGen/AMDGPU/subreg-split-live-in-error.mir
The file was modified llvm/test/CodeGen/AMDGPU/splitkit-copy-bundle.mir

Commit aadf55d1cea24a4e5384ab8546c3d794cb1ec724 by lebedev.ri
[NFC] EliminateDuplicatePHINodes(): small-size optimization: if there are <= 32 PHI's, O(n^2) algo is faster (geomean -0.08%)

This is functionally equivalent to the old implementation.

As per https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=4739e6e4eb54d3736e6457249c0919b30f6c855a&stat=instructions
this is a clear compile-time win, free of geomean regressions, with an overall geomean of `-0.08%`.

32 PHIs appears to be the sweet spot; both 16 and 64 performed worse:
https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=c4efe1fbbfdf0305ac26cd19eacb0c7774cdf60e&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=e4989d1c67010d3339d1a40ff5286a31f10cfe82&stat=instructions

If we have more PHIs than that, we fall back to the original DenseSet-based
implementation, so the not-so-fast cases are still handled.
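
A sketch of the size-based dispatch, with hypothetical helper names for
the two strategies (the real code is in llvm/lib/Transforms/Utils/Local.cpp):

  // With at most 32 PHIs, a quadratic pairwise scan beats hashing into
  // a DenseSet; larger blocks take the original set-based path.
  static constexpr unsigned PHICSESmallSize = 32;

  bool eliminateDuplicatePHINodes(llvm::BasicBlock *BB) {
    if (llvm::hasNItemsOrLess(BB->phis(), PHICSESmallSize))
      return eliminateDuplicatePHINodesNaive(BB);    // O(n^2) scan
    return eliminateDuplicatePHINodesSetBased(BB);   // DenseSet-based
  }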

However, compile time isn't the main motivation here.
I can name at least 3 limitations of this CSE:
1. It assumes that all PHI nodes have incoming basic blocks in the same order (can be fixed while keeping the DenseMap)
2. It does not special-handle `undef` incoming values (I don't see how we can do this with hashing)
3. It does not special-handle backedge incoming values (maybe can be fixed by hashing the backedge as some magical value)

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87408
The file was modified llvm/lib/Transforms/Utils/Local.cpp

Commit b03c2b8395ba94fb53f1e73a6473faedf628bbd9 by douglas.yung
Revert "Re-land: Add new hidden option -print-changed which only reports changes to IR"

The test added in this commit is failing on Windows bots:

http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/1269

This reverts commit f9e6d1edc0dad9afb26e773aa125ed62c58f7080 and follow-up commit 6859d95ea2d0f3fe0de2923a3f642170e66a1a14.
The file was modified llvm/lib/Passes/StandardInstrumentations.cpp
The file was modified llvm/include/llvm/Passes/StandardInstrumentations.h
The file was modified llvm/lib/IR/LegacyPassManager.cpp
The file was removed llvm/test/Other/change-printer.ll

Commit a9cbe5cf30e386a4f44981f5bf9e1862ad36574d by ro
[X86] Fix stack alignment on 32-bit Solaris/x86

On Solaris/x86, several hundred 32-bit tests `FAIL`, all in the same way:

  env ASAN_OPTIONS=halt_on_error=false ./halt_on_error_suppress_equal_pcs.cpp.tmp
  Segmentation Fault (core dumped)

They segfault during startup:

  Thread 2 received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 1 (LWP 1)]
  0x080f21f0 in __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) () at /vol/llvm/src/llvm-project/dist/compiler-rt/lib/sanitizer_common/sanitizer_solaris.cpp:65
  65                              int prot, int flags, int fd, OFF_T offset) {
  1: x/i $pc
  => 0x80f21f0 <_ZN11__sanitizer13internal_mmapEPvmiiiy+16>: movaps 0x30(%esp),%xmm0
  (gdb) p/x $esp
  $3 = 0xfeffd488

The problem is that `movaps` expects 16-byte alignment, while 32-bit Solaris/x86
only guarantees 4-byte alignment following the i386 psABI.

This patch updates `X86Subtarget::initSubtargetFeatures` accordingly,
handles Solaris/x86 in the corresponding testcase, and allows for some
variation in address alignment in
`compiler-rt/test/ubsan/TestCases/TypeCheck/vptr.cpp`.
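
An illustrative sketch of the alignment selection, with assumed
predicate and field names (the exact condition is in
X86Subtarget::initSubtargetFeatures): only assume a 16-byte-aligned
stack where the ABI actually guarantees it.

  // 32-bit Solaris/x86 guarantees only 4-byte stack alignment per the
  // i386 psABI, so it must stay off the 16-byte list; the backend then
  // realigns the stack where movaps and friends need it.
  if (StackAlignOverride)
    stackAlignment = *StackAlignOverride;
  else if (In64BitMode || isTargetDarwin() || isTargetLinux())
    stackAlignment = Align(16);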

Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D87615
The file was modified llvm/lib/Target/X86/X86Subtarget.cpp
The file was modified compiler-rt/test/ubsan/TestCases/TypeCheck/vptr.cpp
The file was modified llvm/test/CodeGen/X86/stack-align2.ll