Success

Changes

Summary

  1. [llvm-profdata] Emit Error when Invalid MemOpSize Section is Created by llvm-profdata
  2. [flang][fir][NFC] remove dead code
  3. [mlir][sparse] incorporate vector index into address computation
  4. Defer the decision whether to use the CU or TU index until after reading the unit header.
  5. [Driver][Windows] Support per-target runtimes dir layout for profile instr generate
  6. [SEMA] Added warn_decl_shadow support for structured bindings
  7. AMDGPU: Use aligned vgprs/agprs in gfx90a mir tests
  8. [ARM] Mir test for pre/postinc ldstopt combines. NFC
  9. [mlir] Refactor InterfaceMap to use a sorted vector of interfaces, as opposed to a DenseMap
  10. [mlir][Inliner] Use llvm::parallelForEach instead of llvm::parallelTransformReduce
  11. [WebAssembly] Disable wasm.lsda() optimization in WasmEHPrepare
  12. Fix a range-loop-analysis warning.
  13. [scan-build-py] Add sarif-html support in scan-build-py
  14. [WebAssembly] Fix incorrect grouping and sorting of exceptions
  15. [LTO] Fix test failures caused by 6da7d3141651
Commit 6da7d3141651ed3ef2b5f369e8ca0eb2e5c66778 by matthew.voss
[llvm-profdata] Emit Error when Invalid MemOpSize Section is Created by llvm-profdata

Under certain (currently unknown) conditions, llvm-profdata is outputting
profiles that have two consecutive entries in the MemOPSize section for the
value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch
instruction with two cases for 0. As mentioned, we’re not quite sure what’s
causing this to happen, but this patch prevents llvm-profdata from outputting a
profile that has this problem and gives an error with a request for a
reproducer.
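
The kind of check described above can be sketched as follows. This is a
minimal illustration, not the code in InstrProfWriter; `ValueEntry` and
`hasConsecutiveZeroValues` are hypothetical names introduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one value-profile record: the profiled value
// (here, a memop size) and how many times it was observed.
struct ValueEntry {
  std::uint64_t Value;
  std::uint64_t Count;
};

// Returns true if two adjacent entries both record the value 0, which is
// the pattern that would yield a switch with two cases for 0.
bool hasConsecutiveZeroValues(const std::vector<ValueEntry> &Entries) {
  for (std::size_t I = 1; I < Entries.size(); ++I)
    if (Entries[I - 1].Value == 0 && Entries[I].Value == 0)
      return true;
  return false;
}
```

A writer could run such a check before emitting a record and report an
error instead of producing the profile.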

Differential Revision: https://reviews.llvm.org/D92074
The file was modified llvm/include/llvm/ProfileData/InstrProf.h
The file was modified llvm/include/llvm/ProfileData/InstrProfWriter.h
The file was modified llvm/lib/ProfileData/InstrProfWriter.cpp
The file was modified llvm/lib/Transforms/Instrumentation/PGOMemOPSizeOpt.cpp
The file was added llvm/test/Transforms/PGOProfile/consecutive-zeros.ll
The file was modified llvm/tools/llvm-profdata/llvm-profdata.cpp
The file was modified llvm/lib/ProfileData/InstrProf.cpp
The file was modified llvm/test/Transforms/PGOProfile/memop_size_opt.ll
The file was added llvm/test/tools/llvm-profdata/invalid-profile-gen-zeros.proftext
The file was added llvm/test/Transforms/PGOProfile/Inputs/consecutive-zeros.proftext
Commit 6740694742165c03e074f19141f58a8df5a887ec by eschweitz
[flang][fir][NFC] remove dead code

Removes unused function from FatalError.h.

Differential revision: https://reviews.llvm.org/D97328
The file was modified flang/include/flang/Optimizer/Support/FatalError.h
Commit 17fa9198471eb559aa772df92484516aee1dbf87 by ajcbik
[mlir][sparse] incorporate vector index into address computation

When computing a dense address, a vectorized index must be accounted
for properly. This bug was formerly undetected because we get 0 * prev + i
in most cases, which folds away the scalar part. Now it works for all cases.
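
As a rough illustration of why the bug stayed hidden, consider a
linearized address of the form `prev * size + i`, extended by a lane
offset for each vector element. This is only a sketch under assumed
semantics; `denseAddresses` and its parameters are hypothetical and do
not correspond to the code in Sparsification.cpp.

```cpp
#include <cstddef>
#include <vector>

// Addresses touched by one vector access of `lanes` consecutive elements.
// `prev` is the linearized address of the outer dimensions, `size` the
// extent of the innermost dimension, and `i` the scalar start index.
std::vector<std::size_t> denseAddresses(std::size_t prev, std::size_t size,
                                        std::size_t i, std::size_t lanes) {
  std::vector<std::size_t> out;
  out.reserve(lanes);
  for (std::size_t lane = 0; lane < lanes; ++lane)
    out.push_back(prev * size + i + lane); // scalar part plus vector index
  return out;
}
```

When the scalar part is zero, the result is just the vector index, so an
implementation that dropped or mishandled the scalar part would still
appear to work.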

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D97317
The file was modified mlir/lib/Dialect/Linalg/Transforms/Sparsification.cpp
The file was modified mlir/test/Dialect/Linalg/sparse_vector.mlir
Commit 979ca1c05f83114483caec3e6d1b75daae86da79 by jgorbe
Defer the decision whether to use the CU or TU index until after reading the unit header.

In DWARF v4 compile units go in .debug_info and type units go in
.debug_types. However, in v5 both kinds of units are in .debug_info.
Therefore we can't decide whether to use the CU or TU index just by
looking at which section we're reading from. We have to wait until we
have read the unit type from the header.
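
The decision can be sketched like this. It is a simplified model, not
LLDB's implementation: `Section`, `IndexKind`, and `selectIndex` are
hypothetical names, while the two unit type codes are the values
defined by DWARF v5.

```cpp
#include <cstdint>

enum class Section { DebugInfo, DebugTypes };
enum class IndexKind { CU, TU };

// DWARF v5 unit type codes for type units, as read from the unit header.
constexpr std::uint8_t DW_UT_type = 0x02;
constexpr std::uint8_t DW_UT_split_type = 0x06;

// Before v5 the section alone identifies the unit kind. From v5 on, both
// kinds live in .debug_info, so the unit type from the header decides.
IndexKind selectIndex(std::uint16_t version, Section section,
                      std::uint8_t unitType) {
  if (version < 5)
    return section == Section::DebugTypes ? IndexKind::TU : IndexKind::CU;
  return (unitType == DW_UT_type || unitType == DW_UT_split_type)
             ? IndexKind::TU
             : IndexKind::CU;
}
```

Because the version and unit type are only known after the header has
been parsed, the index lookup has to happen after that point.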

Differential Revision: https://reviews.llvm.org/D96194
The file was modified lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
The file was modified lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
The file was modified lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.h
The file was added lldb/test/Shell/SymbolFile/DWARF/dwarf5_tu_index_abbrev_offset.s
Commit 7f9d5d6e444c91ce6f2e377b312ac573dfc6779a by markus.boeck02
[Driver][Windows] Support per-target runtimes dir layout for profile instr generate

When targeting an MSVC triple, --dependant-libs with the name of the clang runtime library for profiling is added to the command line args. In its current implementation, clang_rt.profile-<ARCH> is chosen as the name. When building a distribution using LLVM_ENABLE_PER_TARGET_RUNTIME_DIR this fails, because the runtime file names do not have an architecture suffix.

This patch refactors getCompilerRT and getCompilerRTBasename to always consider per-target runtime directories. getCompilerRTBasename now simply returns the filename component of the path found by getCompilerRT.
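
The idea of deriving the basename from the located path, rather than
constructing it from the architecture, can be sketched as follows.
`compilerRTBasename` is a hypothetical helper written with
`std::filesystem`, not the driver's actual code, and the paths in the
usage note are illustrative.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: whatever file the path lookup found, its filename
// component is the basename to pass on, with or without an arch suffix.
std::string compilerRTBasename(const std::filesystem::path &foundPath) {
  return foundPath.filename().string();
}
```

With a per-target layout such as
`lib/x86_64-pc-windows-msvc/clang_rt.profile.lib` this yields
`clang_rt.profile.lib`, while a layout such as
`lib/windows/clang_rt.profile-x86_64.lib` still yields the suffixed name.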

Differential Revision: https://reviews.llvm.org/D96638
The file was modified clang/lib/Driver/ToolChains/BareMetal.h
The file was modified clang/test/Driver/instrprof-ld.c
The file was modified clang/lib/Driver/ToolChain.cpp
The file was modified clang/test/Driver/cl-options.c
The file was modified clang/lib/Driver/ToolChains/BareMetal.cpp
The file was modified clang/test/Driver/sanitizer-ld.c
The file was modified clang/include/clang/Driver/ToolChain.h
The file was modified clang/test/Driver/fsanitize.c
Commit 039f79c78cfa2c0d0d61de117ff46aa43cb5e831 by richard
[SEMA] Added warn_decl_shadow support for structured bindings

https://bugs.llvm.org/show_bug.cgi?id=40858

CheckShadow is now called for each binding in the structured binding to make sure it does not shadow any other variable in scope. This uses a custom implementation of getShadowedDeclaration, though, because a BindingDecl is not a VarDecl.

Added a few unit tests for this. In theory all the other shadow unit tests should be duplicated for the structured binding variables too, but that is probably not worth it as they use common code. The MyTuple and std interface code has been copied from live-bindings-test.cpp.
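
For illustration, this is the kind of code the change targets. The
function below is a made-up example, not one of the tests added in
warn-shadow.cpp; with this change, compiling it with -Wshadow is
expected to flag the binding `x`.

```cpp
#include <utility>

int shadowedSum() {
  int x = 1; // outer declaration
  std::pair<int, int> p{10, 20};
  int inner = 0;
  {
    auto [x, y] = p; // the binding `x` shadows the outer `x`
    inner = x + y;   // refers to the binding, i.e. 10 + 20
  }
  return x + inner;  // refers to the outer `x` again
}
```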

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D96147
The file was modified clang/lib/Sema/SemaDeclCXX.cpp
The file was modified clang/test/SemaCXX/warn-shadow.cpp
The file was modified clang/include/clang/Basic/DiagnosticSemaKinds.td
The file was modified clang/lib/Sema/SemaDecl.cpp
The file was modified clang/docs/ReleaseNotes.rst
The file was modified clang/include/clang/Sema/Sema.h
Commit e844f24a278bf4b3622fb969d06e7b771a6a2aca by Matthew.Arsenault
AMDGPU: Use aligned vgprs/agprs in gfx90a mir tests

These would fail a verifier check in a future change.
The file was modified llvm/test/CodeGen/AMDGPU/waitcnt-agpr.mir
The file was modified llvm/test/CodeGen/AMDGPU/mai-hazards-gfx90a.mir
Commit 8fa2bbaed9252b217105ea332be8a0a85492099b by david.green
[ARM] Mir test for pre/postinc ldstopt combines. NFC
The file was added llvm/test/CodeGen/Thumb2/store-prepostinc.mir
The file was added llvm/test/CodeGen/ARM/store-prepostinc.mir
Commit 65a3197a8fa2e5d1deb8707bda13ebd21e1dedb3 by riddleriver
[mlir] Refactor InterfaceMap to use a sorted vector of interfaces, as opposed to a DenseMap

A majority of operations have a very small number of interfaces, which means that the cost of using a hash map is generally larger for interface lookups than just a binary search. In the future, when there are a number of operations with large numbers of interfaces, we can switch to a hybrid approach that optimizes lookups based on the number of interfaces. For now, however, a binary search is the best approach.

This dropped compile time on a largish TF MLIR module by 20% (half a second).
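
The data structure can be sketched as a vector kept sorted by interface
ID and searched with `std::lower_bound`. This is a minimal standalone
sketch, not MLIR's `InterfaceMap`; `InterfaceID` and
`SortedInterfaceMap` are hypothetical names standing in for the real
types.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using InterfaceID = std::uintptr_t; // hypothetical stand-in for a type ID
using Entry = std::pair<InterfaceID, const void *>;

class SortedInterfaceMap {
public:
  // Sort once at construction so every lookup can use binary search.
  explicit SortedInterfaceMap(std::vector<Entry> entries)
      : entries_(std::move(entries)) {
    std::sort(entries_.begin(), entries_.end(),
              [](const Entry &a, const Entry &b) { return a.first < b.first; });
  }

  // Returns the registered concept for `id`, or nullptr if absent.
  const void *lookup(InterfaceID id) const {
    auto it = std::lower_bound(
        entries_.begin(), entries_.end(), id,
        [](const Entry &e, InterfaceID key) { return e.first < key; });
    return (it != entries_.end() && it->first == id) ? it->second : nullptr;
  }

private:
  std::vector<Entry> entries_;
};
```

For a handful of entries, the contiguous storage and few comparisons
tend to be cheaper than hashing, which matches the reasoning above.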

Differential Revision: https://reviews.llvm.org/D96085
The file was modified mlir/lib/IR/Operation.cpp
The file was modified mlir/include/mlir/Support/InterfaceSupport.h
The file was modified mlir/include/mlir/IR/OperationSupport.h
Commit abd3c6f24c823be6fb316b501482d8637c4a0724 by riddleriver
[mlir][Inliner] Use llvm::parallelForEach instead of llvm::parallelTransformReduce

llvm::parallelTransformReduce does not schedule work on the caller thread, which becomes very costly for
the inliner where a majority of SCCs are small, often ~1 element. The switch to llvm::parallelForEach solves this,
and also aligns the implementation with the PassManager (which realistically should share the same implementation).

This change dropped compile time on an internal benchmark by ~1 second (25%).

Differential Revision: https://reviews.llvm.org/D96086
The file was modified mlir/lib/Transforms/Inliner.cpp
Commit 445f4e74841e87da06743a4c126b09c9b9b05124 by aheejin
[WebAssembly] Disable wasm.lsda() optimization in WasmEHPrepare

In every catchpad except `catch (...)`, we add a call to
`_Unwind_CallPersonality`, which is a wrapper to call the personality
function. (In most other Itanium-based architectures the call is done
from libunwind, but in wasm we don't have control over the VM.)
Because the personality function is called to figure out whether the
current exception is a type we should catch, such as `int` or
`SomeClass&`, `catch (...)` does not need the personality function call.
For the same reason, all cleanuppads don't need it.

When we call `_Unwind_CallPersonality`, we store some necessary info in
a data structure called `__wasm_lpad_context` of type
`_Unwind_LandingPadContext`, which is defined in the wasm port of
libunwind in Emscripten. The personality wrapper function also returns
some info (selector and the caught pointer) in that data structure, so
it is used as a medium for communication.

One piece of info we need to store is the address of the LSDA info for
the current function. The `wasm.lsda()` intrinsic returns that address.
(This intrinsic will be lowered to a symbol that points to the LSDA
address.) The simplest approach is to call `wasm.lsda()` every time we
need to call `_Unwind_CallPersonality` and store that info in the
`__wasm_lpad_context` data structure. But we tried to do better than
that (D77423 and some earlier CLs): if catchpad A dominates catchpad B
and catchpad A is not `catch (...)`, we didn't insert a `wasm.lsda()`
call in catchpad B, on the assumption that the LSDA address is the same
throughout a single function, so after visiting catchpad A the
`__wasm_lpad_context.lsda` field would already hold that value.

But this can be incorrect if there is a call to another function, which
also can have the personality function and LSDA, between catchpad A and
catchpad B, because `__wasm_lpad_context` is a globally defined
structure and the callee function will overwrite its `lsda` field.
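
The failure mode can be modeled in plain C++ as below. This is a toy
model, not generated code or libunwind: `LandingPadContext`,
`lpadContext`, the two address constants, and the function names are
all hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of the single, globally shared landing-pad context.
struct LandingPadContext {
  std::uintptr_t lsda = 0;
};
LandingPadContext lpadContext;

constexpr std::uintptr_t kCallerLSDA = 0x1000; // made-up addresses
constexpr std::uintptr_t kCalleeLSDA = 0x2000;

// A callee with its own catchpad stores its own LSDA address.
void calleeCatchpad() { lpadContext.lsda = kCalleeLSDA; }

// Old scheme: catchpad B skips the store because dominating catchpad A
// already performed it, so B observes the callee's value.
std::uintptr_t callerWithSkippedStore() {
  lpadContext.lsda = kCallerLSDA; // catchpad A
  calleeCatchpad();               // call between A and B
  return lpadContext.lsda;        // catchpad B reads a stale field
}

// New scheme: store the LSDA address before every personality call.
std::uintptr_t callerWithStoreEveryTime() {
  lpadContext.lsda = kCallerLSDA; // catchpad A
  calleeCatchpad();
  lpadContext.lsda = kCallerLSDA; // catchpad B stores again
  return lpadContext.lsda;
}
```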

So in this CL we don't try to do any optimizations on adding the
`wasm.lsda()` call; we store the result of `wasm.lsda()` every time we
call `_Unwind_CallPersonality`. We could do some more complicated
analysis, like checking if there is a function call between the
dominating catchpad and the current catchpad, but at this time it seems
overkill.

This deletes three tests because they all tested the `wasm.lsda()` call
optimization.

Fixes https://github.com/emscripten-core/emscripten/issues/13548.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D97309
The file was modified llvm/lib/CodeGen/WasmEHPrepare.cpp
The file was modified llvm/test/CodeGen/WebAssembly/wasmehprepare.ll
Commit 4691405ba983d7b1efc3e72d17d09c1a497afe90 by Amara Emerson
Fix a range-loop-analysis warning.
The file was modified llvm/lib/CodeGen/RDFLiveness.cpp
Commit 97a304cc8f949e40693d63b855b4b24bc81fa729 by mvanotti
[scan-build-py] Add sarif-html support in scan-build-py

Update scan-build-py to be able to trigger sarif-html output format in clang static analyzer.

NOTE: testcase `test_sarif_and_html_creates_sarif_and_html_reports` will fail if the default clang does not have change https://reviews.llvm.org/D96389 . This can be remediated by pointing the default clang in arguments.py to a locally built clang. I was unable to figure out where these particular tests for scan-build-py are being invoked (aside from manually), so any help there would be greatly appreciated.

Reviewed By: aabbaabb, xazax.hun

Differential Revision: https://reviews.llvm.org/D96570
The file was modified clang/tools/scan-build-py/libscanbuild/report.py
The file was modified clang/tools/scan-build-py/libscanbuild/analyze.py
The file was modified clang/tools/scan-build-py/libscanbuild/arguments.py
The file was modified clang/tools/scan-build-py/tests/functional/cases/test_from_cdb.py
Commit ea8c6375e3330f181105106b3adb84ff9fa76a7c by aheejin
[WebAssembly] Fix incorrect grouping and sorting of exceptions

This CL is not big but contains changes that span multiple analyses and
passes. This description is very long because it tries to explain the
basics of what each pass/analysis does and why we need this change on
top of that. Please feel free to skip parts that are not necessary for
your understanding.

---

`WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next
unwind destination>. The value (unwind dest) here is where an exception
should end up when it is not caught by the key (EH pad). We record this
info in WasmEHPrepare to fix catch mismatches, because the CFG itself
does not have this info. A CFG only contains BBs and
predecessor-successor relationships between them, but in
`WasmEHFuncInfo` the unwind destination BB is not necessarily a
successor of the key EH pad BB. Their relationship can be intuitively
explained by this C++ code snippet:
```
try {
  try {
    foo();
  } catch (int) { // EH pad
    ...
  }
} catch (...) {   // unwind destination
}
```
So when `foo()` throws, it goes to `catch (int)` first. But if it is not
caught by it, it ends up in the next unwind destination `catch (...)`.
This unwind destination is what you see in `catchswitch`'s
`unwind label %bb` part.

---

`WebAssemblyExceptionInfo` groups exceptions so that they can be sorted
contiguously in CFGSort, as we do for loops. What this analysis
does is very simple: it creates a single `WebAssemblyException` per EH
pad, and all BBs that are dominated by that EH pad are included in this
exception. We also identify subexception relationships in this way: if
EHPad A dominates EHPad B, EHPad B's exception is a subexception of
EHPad A's exception.

This simple rule turns out to be incorrect in some cases. In
`WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means
semantically EHPad B should not be included in EHPad A's exception,
because it does not make sense to rethrow/delegate to an inner scope.
This is what happened in CFGStackify as a result of this:
```
try
  try
  catch
    ...   <- %dest_bb is among here!
  end
delegate %dest_bb
```

So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to
make sure exceptions' unwind destinations are not subexceptions of
their unwind sources in `WasmEHFuncInfo`.

But this alone does not prevent `dest_bb` in the example above from
being sorted within the inner `catch`'s exception, even if its exception
is not a subexception of that `catch`'s exception anymore, because of
how CFGSort works, which will be explained below.

---

CFGSort places BBs within the same `SortRegion` (loop or exception)
contiguously so they can be demarcated with `loop`-`end_loop`
or `catch`-`end_try` in CFGStackify.

`SortRegion` is a wrapper for one of `MachineLoop` or
`WebAssemblyException`. `SortRegionInfo` already does some complicated
things because there are discrepancies between those two data
structures. `WebAssemblyException` is what we control, and it is defined
as an EH pad as its header and BBs dominated by the header as its BBs
(with the newly added exception for unwind destinations explained in the
previous section). But `MachineLoop` is an LLVM data structure and uses
the standard loop detection algorithm. By that algorithm, a loop
consists of BBs that 1. are dominated by the loop header and 2. have a
path back to the header. Because of the second condition, many BBs that
are dominated by the loop header are not included in the loop. So BBs
that contain `return` or branches to outside of the loop are not
technically included in `MachineLoop`, but they can be sorted together
with the loop with no problem.

Maybe to relax the condition, in CFGSort, when we are in a `SortRegion`
we allow sorting of not only BBs that belong to the current innermost
region but also BBs that are dominated by the current region header.
(It was written this way from the first version written by Dan, when
only loops existed.) But now we have cases in exceptions where EHPad B
is the unwind destination for EHPad A: even if EHPad B is dominated by
EHPad A, it should not be included in EHPad A's exception, and should
not be sorted within EHPad A.

One way to make things work, at least correctly, is to change the
`dominates` condition to a `contains` condition for `SortRegion` when
sorting BBs, but this will change compilation results for existing
non-EH code and I can't be sure it will not degrade performance or code
size. I think it will degrade performance because it will force many BBs
dominated by a loop, which don't have a path back to the header, to be
placed after the loop, and it will likely create more branches and
blocks.

So this does a little hacky check when adding BBs to the `Preferred`
list:
(The `Preferred` list is a ready list. CFGSort maintains the ready list
in two priority queues: `Preferred` and `Ready`. I'm not very sure why,
but it was written that way from the beginning. BBs are first added to
the `Preferred` list and then some of them are pushed to the `Ready`
list, so here we only need to guard the condition for the `Preferred`
list.)

When adding a BB to `Preferred` list, we check if that BB is an unwind
destination of another BB. To do this, this adds the reverse mapping,
`UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB
is an unwind destination, it checks if the current stack of regions
(`Entries`) contains its source BB by traversing the stack backwards. If
we find its unwind source in there, we add the BB to its `Deferred`
list, to make sure that unwind destination BB is added to `Preferred`
list only after that region with the unwind source BB is sorted and
popped from the stack.
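
The deferral check described above can be sketched roughly as follows.
This is a simplified model with integer block IDs, not the code in
WebAssemblyCFGSort.cpp: `EHInfo`, `Region`, and `findDeferringEntry`
are hypothetical, and whether the real reverse mapping is one-to-one or
one-to-many is an assumption made here.

```cpp
#include <map>
#include <set>
#include <vector>

using BlockID = int;

// Hypothetical model of the forward mapping and the new reverse mapping.
struct EHInfo {
  std::map<BlockID, BlockID> SrcToUnwindDest;
  std::map<BlockID, std::vector<BlockID>> UnwindDestToSrc;

  void setUnwindDest(BlockID Src, BlockID Dest) {
    SrcToUnwindDest[Src] = Dest;
    UnwindDestToSrc[Dest].push_back(Src);
  }
};

// One entry of the region stack, reduced to the blocks it contains.
struct Region {
  std::set<BlockID> Blocks;
};

// Returns the index of the innermost stack entry that contains an unwind
// source of BB (BB must be deferred until that entry is popped), or -1 if
// BB can go on the preferred list right away.
int findDeferringEntry(const EHInfo &Info, const std::vector<Region> &Entries,
                       BlockID BB) {
  auto It = Info.UnwindDestToSrc.find(BB);
  if (It == Info.UnwindDestToSrc.end())
    return -1; // BB is not an unwind destination
  for (int I = static_cast<int>(Entries.size()) - 1; I >= 0; --I)
    for (BlockID Src : It->second)
      if (Entries[I].Blocks.count(Src))
        return I;
  return -1;
}
```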

---

This does not contain a new test that crashes because of this bug, but
this fix changes the result for one of the existing test cases. This
test case didn't crash because it fortunately didn't contain a
`delegate` to the incorrectly placed unwind destination BB.

Fixes https://github.com/emscripten-core/emscripten/issues/13514.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97247
The file was modified llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll
The file was modified llvm/lib/Target/WebAssembly/WebAssemblyCFGSort.cpp
The file was modified llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
The file was modified llvm/include/llvm/CodeGen/WasmEHFuncInfo.h
The file was modified llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
The file was modified llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.h
The file was modified llvm/unittests/Target/WebAssembly/WebAssemblyExceptionInfoTest.cpp
Commit 1d7f1d15c517db2bde30228dfdcb17f6471f7916 by matthew.voss
[LTO] Fix test failures caused by 6da7d3141651

Adds "REQUIRES: asserts", since the test uses debug messages.
The file was modified llvm/test/Transforms/PGOProfile/consecutive-zeros.ll