1. [ARM] Remove LLC tests from transform/hardware loop tests.
2. [ARM] Add low overhead loops terminators to AnalyzeBranch
3. [InstSimplify] Handle commutativity for 'and' and 'outer or' for (~A & B) | ~(A | B) --> ~A
4. [SLP] remove unnecessary use of 'OperationData'
5. [SLP] fix typos; NFC
6. [SLP] remove opcode field from reduction data class
7. [OpenMP] Added the support for hidden helper task in RTL
8. [mlir][sparse] improved sparse runtime support library
9. [NFC] Removed extra text in comments
Commit c1ab698dce8dd4e751e63142ebb333d5b90bb8dc by
[ARM] Remove LLC tests from transform/hardware loop tests.

We now have a lot of llc tests for hardware loops in CodeGen, which test
a larger variety of loops and are easier to maintain. This removes the
llc runs from the mixed llc/opt tests.
The file was modified llvm/test/Transforms/HardwareLoops/ARM/structure.ll
Commit 372eb2bbb6fb903ce76266e659dfefbaee67722b by
[ARM] Add low overhead loops terminators to AnalyzeBranch

This treats low overhead loop branches the same as jump tables and
indirect branches in analyzeBranch - they cannot be analyzed but the
direct branches on the end of the block may be removed. This helps
remove the unnecessary branches earlier, which can help produce better
codegen (and change block layout in a number of cases).

Differential Revision:
The file was modified llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-float-loops.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-gather-increment.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-vldshuffle.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-gather-scatter-optimisation.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/while-loops.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/sibling-loops.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-scatter-increment.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-gather-tailpred.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-vecreduce-loops.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-float16regloops.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
The file was modified llvm/lib/Target/ARM/ARMBaseInstrInfo.h
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/varying-outer-2d-reduction.ll
Commit 63bedc80da36cf5eb71b06b453c186e057607bf4 by Dávid Bolvanský
[InstSimplify] Handle commutativity for 'and' and 'outer or' for (~A & B) | ~(A | B) --> ~A

Reviewed By: lebedev.ri

Differential Revision:
The file was modified llvm/test/Transforms/InstSimplify/or.ll
The file was modified llvm/lib/Analysis/InstructionSimplify.cpp
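
The fold named in this commit title holds because ~(A | B) equals ~A & ~B, so the whole expression is ~A & (B | ~B), which is ~A. As a quick sanity check, the sketch below tests the identity exhaustively over 8-bit values, including the commuted 'and' and commuted outer 'or' forms. It is only an illustration: the function names are hypothetical and this is not the matcher code in InstructionSimplify.cpp.

```cpp
#include <cassert>
#include <cstdint>

// The pattern as written in the commit title: (~A & B) | ~(A | B).
constexpr uint8_t pattern(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((~a & b) | ~(a | b));
}

// Same pattern with the operands of the 'and' swapped.
constexpr uint8_t commutedAnd(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((b & ~a) | ~(a | b));
}

// Same pattern with the operands of the outer 'or' swapped.
constexpr uint8_t commutedOuterOr(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(~(a | b) | (~a & b));
}

// Returns true if every variant equals ~A for all 8-bit values of A and B.
bool foldHoldsForAllBytes() {
  for (unsigned i = 0; i < 256; ++i) {
    for (unsigned j = 0; j < 256; ++j) {
      const uint8_t a = static_cast<uint8_t>(i);
      const uint8_t b = static_cast<uint8_t>(j);
      const uint8_t expected = static_cast<uint8_t>(~a);
      if (pattern(a, b) != expected || commutedAnd(a, b) != expected ||
          commutedOuterOr(a, b) != expected)
        return false;
    }
  }
  return true;
}

// Small usage example: the identity holds, so the check passes.
void demoFold() { assert(foldHoldsForAllBytes()); }
```

Because the identity is bitwise, checking one byte width covers every bit position independently; wider integer types behave the same way.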
Commit 48dbac5b6b0bc7a03e9af42cb99176abba8d0467 by spatel
[SLP] remove unnecessary use of 'OperationData'

This is another NFC-intended patch to allow matching
intrinsics (example: maxnum) as candidates for reductions.

It's possible that the loop/if logic can be reduced now,
but it's still difficult to understand how this all works.
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Commit fcfcc3cc6b16e4fd7d7d2d07937634cca360b46e by spatel
[SLP] fix typos; NFC
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Commit 49b96cd9ef2f81d193641796b8a85781292faf7a by spatel
[SLP] remove opcode field from reduction data class

This is NFC-intended and another step towards supporting
intrinsics as reduction candidates.

The remaining bits of the OperationData class do not make
much sense as-is, so I will try to improve that, but I'm
trying to take minimal steps because it's still not clear
how this was intended to work.
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Commit ed939f853da1f2266f00ea087f778fda88848f73 by tianshilei1992
[OpenMP] Added the support for hidden helper task in RTL

The basic design is to create an outermost parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and it is only responsible for executing hidden helper tasks. We first use `pthread_create` to create a new thread, which serves as both the initial thread and the main thread of the hidden helper team. This initial thread then initializes a new root, just as the RTL does during initialization. After that, it directly calls `__kmpc_fork_call`, as if the initial thread had encountered a parallel region. In the wrapped function for this team, the main thread (the initial thread created via `pthread_create` on Linux) waits on a condition variable, which can only be signaled when the RTL is being destroyed. The other worker threads do nothing. The main thread needs to wait there because, in the current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team, which is not what we want.
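
The lifecycle described above (lazy creation on the first hidden helper task, then a main thread parked on a condition variable until shutdown) can be sketched roughly as follows. This is a simplified model under stated assumptions: `HiddenHelperTeam` and its members are hypothetical names, `std::thread` stands in for `pthread_create`, and the root initialization and `__kmpc_fork_call` steps are omitted entirely.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the hidden helper team's main thread; these are not
// the actual libomp data structures.
class HiddenHelperTeam {
public:
  ~HiddenHelperTeam() { shutdown(); }

  // Called whenever a hidden helper task is encountered. Only the first call
  // creates the main thread.
  void onHiddenHelperTask() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!started_) {
      started_ = true;
      mainThread_ = std::thread([this] { mainThreadBody(); });
    }
  }

  // Models runtime destruction: the only place the condition is signaled.
  void shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shuttingDown_ = true;
    }
    cv_.notify_all();
    if (mainThread_.joinable())
      mainThread_.join();
  }

  bool started() {
    std::lock_guard<std::mutex> lock(mutex_);
    return started_;
  }

  bool mainThreadFinished() {
    std::lock_guard<std::mutex> lock(mutex_);
    return finished_;
  }

private:
  // The "wrapped function" for the main thread: park until shutdown so the
  // team is not torn down early.
  void mainThreadBody() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return shuttingDown_; });
    finished_ = true;
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::thread mainThread_;
  bool started_ = false;
  bool shuttingDown_ = false;
  bool finished_ = false;
};

// Small usage example: the main thread stays parked until shutdown.
void demoHiddenHelperTeam() {
  HiddenHelperTeam team;
  assert(!team.started());
  team.onHiddenHelperTask();
  team.onHiddenHelperTask(); // second call does not create another thread
  assert(team.started() && !team.mainThreadFinished());
  team.shutdown();
  assert(team.mainThreadFinished());
}
```

The predicate form of `wait` guards against spurious wakeups, so the main thread only leaves its wrapped function once shutdown has actually been requested.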

Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also provided to configure the number of threads and to enable or disable this feature. By default, the number of hidden helper threads is 8.
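
As a rough illustration of how such settings might be read, the sketch below parses the two variables with a fallback to the stated default of 8 threads. The helper names, the accepted boolean spellings, and the handling of invalid values are all assumptions made for this example; the commit message does not describe how kmp_settings.cpp actually parses them.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <cstring>

// Default taken from the commit message above.
constexpr int kDefaultHiddenHelperThreads = 8;

// Hypothetical parser: returns the default when the value is unset, not a
// positive integer, or out of range. The real runtime may behave differently.
int parseHiddenHelperThreadCount(const char *raw) {
  if (raw == nullptr || *raw == '\0')
    return kDefaultHiddenHelperThreads;
  char *end = nullptr;
  const long value = std::strtol(raw, &end, 10);
  if (*end != '\0' || value <= 0 || value > INT_MAX)
    return kDefaultHiddenHelperThreads;
  return static_cast<int>(value);
}

// Hypothetical parser for the on/off switch. The accepted spellings are an
// assumption; unrecognized values keep the caller's fallback.
bool parseUseHiddenHelperTask(const char *raw, bool fallback) {
  if (raw == nullptr)
    return fallback;
  if (std::strcmp(raw, "1") == 0 || std::strcmp(raw, "true") == 0)
    return true;
  if (std::strcmp(raw, "0") == 0 || std::strcmp(raw, "false") == 0)
    return false;
  return fallback;
}

// Reads the thread count from the process environment.
int hiddenHelperThreadCountFromEnv() {
  return parseHiddenHelperThreadCount(
      std::getenv("LIBOMP_NUM_HIDDEN_HELPER_THREADS"));
}

// Small usage example with explicit strings instead of the environment.
void demoSettings() {
  assert(parseHiddenHelperThreadCount(nullptr) == 8);
  assert(parseHiddenHelperThreadCount("4") == 4);
  assert(!parseUseHiddenHelperTask("0", true));
}
```

Keeping the parsers separate from `std::getenv` makes them easy to exercise without touching the process environment.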

Here are some open issues to be discussed:
1. The main thread goes to sleep when the initialization is finished. As Andrey mentioned, we might need to wake it from time to time to do some work. What kind of update/check should be put here?

Reviewed By: jdoerfert

Differential Revision:
The file was modified openmp/runtime/src/kmp_runtime.cpp
The file was modified openmp/runtime/src/z_Linux_util.cpp
The file was modified openmp/runtime/src/kmp_settings.cpp
The file was modified openmp/runtime/src/kmp_taskdeps.h
The file was added openmp/runtime/test/tasking/hidden_helper_task/taskgroup.cpp
The file was added openmp/runtime/test/tasking/hidden_helper_task/common.h
The file was modified openmp/runtime/src/kmp.h
The file was modified openmp/runtime/src/kmp_tasking.cpp
The file was added openmp/runtime/test/tasking/hidden_helper_task/depend.cpp
The file was added openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
The file was modified openmp/runtime/test/worksharing/for/kmp_sch_simd_guided.c
The file was modified openmp/runtime/src/kmp_wait_release.h
The file was modified openmp/runtime/src/kmp_global.cpp
Commit d8fc27301d18f0935ba99ead7ac61aa6a53f16e4 by ajcbik
[mlir][sparse] improved sparse runtime support library

Added the ability to read (an extended version of) the FROSTT
file format, so that we can now read in sparse tensors of arbitrary
rank. Generalized the API to deal with more than two dimensions.

Also added the ability to sort the indices of sparse tensors
lexicographically. This is an important step towards supporting
automatic generation of initialization code, since sparse storage formats
are easier to initialize if the indices are sorted. Since most
external formats don't enforce such properties, it is convenient
to have this ability in our runtime support library.

Lastly, the re-entrancy problem of the original implementation
is fixed by passing an opaque object around (rather than having
a single static variable, ugh!).
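
To make the last two points concrete, here is a minimal sketch of a coordinate-form sparse tensor of arbitrary rank that is passed around as an explicit object (rather than kept in a static variable) and sorted lexicographically by index. The type and function names are hypothetical and do not reflect the actual API in SparseUtils.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// One nonzero entry: one index per dimension plus the stored value.
struct Element {
  std::vector<uint64_t> indices;
  double value;
};

// Hypothetical tensor object. Callers hold and pass this explicitly, so
// several tensors can be read and used at the same time.
struct SparseTensor {
  std::vector<uint64_t> sizes;   // dimension sizes; sizes.size() is the rank
  std::vector<Element> elements; // nonzeros in arbitrary input order
};

// Appends one nonzero; the index count must match the tensor rank.
void addElement(SparseTensor &tensor, std::vector<uint64_t> indices,
                double value) {
  assert(indices.size() == tensor.sizes.size());
  tensor.elements.push_back({std::move(indices), value});
}

// Sorts nonzeros lexicographically by index tuple. std::vector's operator<
// already compares element by element, which is the ordering wanted here.
void sortLexicographically(SparseTensor &tensor) {
  std::sort(tensor.elements.begin(), tensor.elements.end(),
            [](const Element &a, const Element &b) {
              return a.indices < b.indices;
            });
}

// Small usage example on a rank-3 tensor.
void demoSparseSort() {
  SparseTensor tensor{{2, 2, 4}, {}};
  addElement(tensor, {1, 0, 2}, 3.0);
  addElement(tensor, {0, 1, 1}, 2.0);
  addElement(tensor, {0, 0, 3}, 1.0);
  sortLexicographically(tensor);
  assert(tensor.elements.front().value == 1.0);
  assert(tensor.elements.back().value == 3.0);
}
```

With sorted indices, entries that share leading coordinates end up adjacent, which is what makes filling a compressed storage format in a single pass straightforward.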

Reviewed By: nicolasvasilache

Differential Revision:
The file was added mlir/integration_test/Sparse/CPU/frostt-example.mlir
The file was modified mlir/integration_test/CMakeLists.txt
The file was modified mlir/lib/ExecutionEngine/SparseUtils.cpp
The file was added mlir/integration_test/data/test.tns
The file was modified mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
The file was modified mlir/integration_test/Sparse/CPU/matrix-market-example.mlir
Commit bfd75bdf3fd62d4f5e7028d4122f9ffa517f2a09 by Dávid Bolvanský
[NFC] Removed extra text in comments
The file was modified llvm/lib/Analysis/InstructionSimplify.cpp