SuccessChanges

Summary

  1. [Lint] Add check for intrinsic get.active.lane.mask (details)
  2. [AMDGPU] Generate test checks for splitkit-copy-bundle.mir (details)
  3. [SplitKit] Only copy live lanes (details)
  4. [NFC] EliminateDuplicatePHINodes(): small-size optimization: if there are <= 32 PHI's, O(n^2) algo is faster (geomean -0.08%) (details)
  5. Revert "Re-land: Add new hidden option -print-changed which only reports changes to IR" (details)
Commit 6637d72ddd3cf4cf3a7e6dfc227a86999137badb by sjoerd.meijer
[Lint] Add check for intrinsic get.active.lane.mask

As @efriedma pointed out in D86301, this "not equal to 0 check" of
get.active.lane.mask's second operand needs to live here in Lint and not the
Verifier.

Differential Revision: https://reviews.llvm.org/D87228
The file was modifiedllvm/lib/Analysis/Lint.cpp (diff)
The file was addedllvm/test/Analysis/Lint/get-active-lane-mask.ll
Commit d49707cf4b288e8d3cad00a78cfa45ec4c376496 by jay.foad
[AMDGPU] Generate test checks for splitkit-copy-bundle.mir

This is a pre-commit for D87757 "[SplitKit] Only copy live lanes".
The file was modifiedllvm/test/CodeGen/AMDGPU/splitkit-copy-bundle.mir (diff)
Commit 6f6d389da5c37e5e9a900902f03dc649d57919b7 by jay.foad
[SplitKit] Only copy live lanes

When splitting a live interval with subranges, only insert copies for
the lanes that are live at the point of the split. This avoids some
unnecessary copies and fixes a problem where copying dead lanes was
generating MIR that failed verification. The test case for this is
test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir.

Without this fix, some earlier live range splitting would create %430:

%430 [256r,848r:0)[848r,2584r:1)  0@256r 1@848r L0000000000000003 [848r,2584r:0)  0@848r L0000000000000030 [256r,2584r:0)  0@256r weight:1.480938e-03
...
256B     undef %430.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
848B     %430.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
...
2584B    %431:vreg_128 = COPY %430:vreg_128

Then RAGreedy::tryLocalSplit would split %430 into %432 and %433 just
before 848B giving:

%432 [256r,844r:0)  0@256r L0000000000000030 [256r,844r:0)  0@256r weight:3.066802e-03
%433 [844r,848r:0)[848r,2584r:1)  0@844r 1@848r L0000000000000030 [844r,2584r:0)  0@844r L0000000000000003 [844r,844d:0)[848r,2584r:1)  0@844r 1@848r weight:2.831776e-03
...
256B     undef %432.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
844B     undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128 {
           internal %433.sub2:vreg_128 = COPY %432.sub2:vreg_128
848B     }
  %433.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
...
2584B    %431:vreg_128 = COPY %433:vreg_128

Note that the copy from %432 to %433 at 844B is a curious
bundle-without-a-BUNDLE-instruction that SplitKit creates deliberately,
and it includes a copy of .sub0 which is not live at this point, and
that causes it to fail verification:

*** Bad machine code: No live subrange at use ***
- function:    zextload_global_v64i16_to_v64i64
- basic block: %bb.0  (0x7faed48) [0B;2848B)
- instruction: 844B    undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128
- operand 1:   %432.sub0:vreg_128
- interval:    %432 [256r,844r:0)  0@256r L0000000000000030 [256r,844r:0)  0@256r weight:3.066802e-03
- at:          844B

Using real bundles with a BUNDLE instruction might also fix this
problem, but the current fix is less invasive and also avoids some
unnecessary copies.

https://bugs.llvm.org/show_bug.cgi?id=47492

Differential Revision: https://reviews.llvm.org/D87757
The file was modifiedllvm/test/CodeGen/AMDGPU/subreg-split-live-in-error.mir (diff)
The file was modifiedllvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll (diff)
The file was modifiedllvm/lib/CodeGen/SplitKit.cpp (diff)
The file was modifiedllvm/test/CodeGen/AMDGPU/splitkit-copy-bundle.mir (diff)
The file was addedllvm/test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir
Commit aadf55d1cea24a4e5384ab8546c3d794cb1ec724 by lebedev.ri
[NFC] EliminateDuplicatePHINodes(): small-size optimization: if there are <= 32 PHI's, O(n^2) algo is faster (geomean -0.08%)

This is functionally equivalent to the old implementation.

As per https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=4739e6e4eb54d3736e6457249c0919b30f6c855a&stat=instructions
this is a clear geomean compile-time regression-free win with overall geomean of `-0.08%`

32 PHI's appears to be the sweet spot; both the 16 and 64 performed worse:
https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=c4efe1fbbfdf0305ac26cd19eacb0c7774cdf60e&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=e4989d1c67010d3339d1a40ff5286a31f10cfe82&stat=instructions

If we have more PHI's than that, we fall-back to the original DenseSet-based implementation,
so the not-so-fast cases will still be handled.

However compile-time isn't the main motivation here.
I can name at least 3 limitations of this CSE:
1. Assumes that all PHI nodes have incoming basic blocks in the same order (can be fixed while keeping the DenseMap)
2. Does not special-handle `undef` incoming values (i don't see how we can do this with hashing)
3. Does not special-handle backedge incoming values (maybe can be fixed by hashing backedge as some magical value)

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87408
The file was modifiedllvm/lib/Transforms/Utils/Local.cpp (diff)
Commit b03c2b8395ba94fb53f1e73a6473faedf628bbd9 by douglas.yung
Revert "Re-land: Add new hidden option -print-changed which only reports changes to IR"

The test added in this commit is failing on Windows bots:

http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/1269

This reverts commit f9e6d1edc0dad9afb26e773aa125ed62c58f7080 and follow-up commit 6859d95ea2d0f3fe0de2923a3f642170e66a1a14.
The file was modifiedllvm/include/llvm/Passes/StandardInstrumentations.h (diff)
The file was modifiedllvm/lib/Passes/StandardInstrumentations.cpp (diff)
The file was modifiedllvm/lib/IR/LegacyPassManager.cpp (diff)
The file was removedllvm/test/Other/change-printer.ll