Success

Changes

Summary

  1. Updated synopsis of <atomic> to match what is implemented
  2. AMDGPU: Change internal tracking of wave size
  3. [LiveDebugValues] Remove early-exit when testing regmasks, NFC
  4. [AArch64] Fix CollectLOH creating an AdrpAdd LOH when there's a live used reg
  5. [AArch64][GlobalISel] Split G_GLOBAL_VALUE into ADRP + G_ADD_LOW and optimize.
  6. [docs] Sketch outline for HowToUpdateDebugInfo.rst
  7. [os_log][test] Remove -O1 from a test, NFC
  8. Fix UB in EmulateInstructionARM64.cpp
  9. [COFF] Free some memory used for chunks
  10. Fix how cc1 command line options are mapped into FP options.
Commit 06aaf0b3431f29b6debbb96fdd92ada896f336ff by ogiroux
Updated synopsis of <atomic> to match what is implemented
The file was modified libcxx/include/atomic
Commit a8f720925599f8e44366438f1ccb4b4e9d9375ae by Matthew.Arsenault
AMDGPU: Change internal tracking of wave size

Store the log2 wave size instead of forcing division and log2
operations when querying either.
The file was modified llvm/lib/Target/AMDGPU/AMDGPUFeatures.td
The file was modified llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
The file was modified llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
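
The idea behind storing the log2 wave size can be sketched as follows. This is a minimal illustration, not the code from the commit: the class `WaveConfig` and the helper `wavesFor` are hypothetical names, and the real subtarget classes are considerably larger.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a subtarget class; not the real AMDGPUSubtarget.
class WaveConfig {
  // Only log2(wave size) is stored: 5 for wave32, 6 for wave64.
  uint8_t WavefrontSizeLog2;

public:
  explicit WaveConfig(uint8_t Log2) : WavefrontSizeLog2(Log2) {
    assert(Log2 < 32 && "shift amount out of range");
  }

  // Querying the log2 is a plain load; querying the size is a single shift.
  unsigned getWavefrontSizeLog2() const { return WavefrontSizeLog2; }
  unsigned getWavefrontSize() const { return 1u << WavefrontSizeLog2; }

  // Number of waves needed to cover NumLanes lanes, rounded up.
  // The division by the wave size becomes a right shift.
  unsigned wavesFor(unsigned NumLanes) const {
    return (NumLanes + getWavefrontSize() - 1) >> WavefrontSizeLog2;
  }
};
```

With the log2 stored, neither query needs a division or a log2 computation, which is the point the commit message makes.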
Commit 776708b00bddb01f91b8d44f6853770966d335a5 by Vedant Kumar
[LiveDebugValues] Remove early-exit when testing regmasks, NFC

In transferRegisterDef, if the instruction has a regmask attached, we'll
check if any currently used register is clobbered by the regmask.

The early exit in this scan isn't necessary, costs a set lookup, and is
almost never taken [1]. Delete it.

[1]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp.html#L1136
The file was modified llvm/lib/CodeGen/LiveDebugValues.cpp
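
As a rough illustration of the scan described above, the sketch below checks each currently used register against a register mask. It assumes the usual LLVM regmask convention (a set bit means the register is preserved, a clear bit means it is clobbered). The function names and container types are hypothetical, and the removed early exit itself is not reproduced here because the commit message does not show it.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// A set bit in the mask means the register is preserved;
// a clear bit means the register is clobbered.
// Assumes RegMask has enough words to cover Reg.
inline bool clobbersPhysReg(const std::vector<uint32_t> &RegMask,
                            unsigned Reg) {
  return !(RegMask[Reg / 32] & (1u << (Reg % 32)));
}

// Hypothetical helper: return every used register that the mask clobbers.
// The loop walks the (usually small) set of used registers directly,
// with no extra lookup before testing the mask.
std::vector<unsigned> clobberedUsedRegs(const std::set<unsigned> &UsedRegs,
                                        const std::vector<uint32_t> &RegMask) {
  std::vector<unsigned> Dead;
  for (unsigned Reg : UsedRegs)
    if (clobbersPhysReg(RegMask, Reg))
      Dead.push_back(Reg);
  return Dead;
}
```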
Commit 19ff00dab875d6184618c756df01b57acb908e82 by Amara Emerson
[AArch64] Fix CollectLOH creating an AdrpAdd LOH when there's a live used reg
between the two instructions.

If there's a pattern like:
$Xa = ADRP foo @PAGE
[some killing use of reg $Xb]
$Xb = ADDXri $Xa, 0, @PAGEOFF

CollectLOH would create an AdrpAdd LOH that resulted in the linker optimizing
this sequence into:
$Xb = ADR foo
[some killing use of reg $Xb]
... and therefore clobbers the live $Xb register that was used by the
instruction in between.

This was discovered by a GlobalISel patch D78465 which broke up global variable
accesses into two pseudos, which in some cases could be moved apart.

Differential Revision: https://reviews.llvm.org/D80834
The file was added llvm/test/CodeGen/AArch64/loh-use-between-adrp-add.mir
The file was modified llvm/lib/Target/AArch64/AArch64CollectLOH.cpp
Commit f573d489b6fccca85e0f2b3765aa17a364a4b0a8 by Amara Emerson
[AArch64][GlobalISel] Split G_GLOBAL_VALUE into ADRP + G_ADD_LOW and optimize.

The concept of G_GLOBAL_VALUE is nice and simple, but always using it as the
representation for global var addressing until selection time creates some
problems in optimizing accesses in certain code/relocation models.

The problem comes from trying to optimize adrp -> add -> load/store sequences
in the most common "small" code model. These accesses can be optimized into an
adrp -> load with the add offset being folded into the load's immediate field.
If we try to keep all global var references as a single generic instruction
then by the time we get to the complex operand trying to match these, we end up
generating an adrp at the point of use. The real issue here is that we don't
have any form of CSE during selection, so the code size will bloat from many
redundant adrp's.

This patch custom legalizes small code model non-GOT G_GLOBAL_VALUEs into target
ADRP and a new "target specific generic opcode" G_ADD_LOW. We also teach the
localizer to localize these instructions via the custom hook that was added
recently. Finally, the complex pattern for indexed loads/stores is extended to
try to fold these G_ADD_LOW instructions into the load immediate.

On -O0 CTMark, we see a 0.8% geomean code size improvement. We should also see
some minor performance improvements.

Differential Revision: https://reviews.llvm.org/D78465
The file was modified llvm/lib/Target/AArch64/AArch64InstructionSelector.cpp
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/call-translator-variadic-musttail.ll
The file was modified llvm/test/CodeGen/AArch64/arm64-ldxr-stxr.ll
The file was added llvm/lib/Target/AArch64/AArch64InstrGISel.td
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/legalize-constant.mir
The file was modified llvm/lib/Target/AArch64/AArch64InstrInfo.td
The file was modified llvm/test/CodeGen/AArch64/arm64-custom-call-saved-reg.ll
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
The file was modified llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/legalize-blockaddress.mir
The file was added llvm/test/CodeGen/AArch64/GlobalISel/legalize-global.mir
The file was modified llvm/lib/Target/AArch64/AArch64LegalizerInfo.cpp
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/localizer.mir
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/combine-ext-debugloc.mir
The file was modified llvm/lib/Target/AArch64/AArch64LegalizerInfo.h
The file was modified llvm/test/CodeGen/AArch64/dllimport.ll
Commit b429a0fef047867255e9cb65379677b2af7bb61b by Vedant Kumar
[docs] Sketch outline for HowToUpdateDebugInfo.rst

Summary:
Sketch the outline for a new document that explains how to update debug
info in various kinds of code transformations.

Some of the guidelines that belong in HowToUpdateDebugInfo.rst were in
SourceLevelDebugging.rst already under the debugify section. It seems
like the distinction between the two docs ought to be that the former is
more prescriptive, while the latter is more descriptive.

To that end I've consolidated the "how to update debug info" guidelines
which were in SourceLevelDebugging.rst into the new doc, along with the
information about using "debugify" to test transformations. Since we've
added a mir-debugify pass, I've described that as well.

Reviewers: aprantl, jmorse, chrisjackson, dsanders

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80052
The file was modified llvm/docs/UserGuides.rst
The file was added llvm/docs/HowToUpdateDebugInfo.rst
The file was modified llvm/docs/SourceLevelDebugging.rst
Commit a66e1d2aa943959e158821be8956109cb5ef3b3b by Vedant Kumar
[os_log][test] Remove -O1 from a test, NFC
The file was modified clang/test/CodeGenObjCXX/os_log.mm
Commit a0b674fd7f06b86241cf19387313b508248a3868 by Adrian Prantl
Fix UB in EmulateInstructionARM64.cpp

This fixes an unhandled signed integer overflow in AddWithCarry() by
using the llvm::checkedAdd() function. Thanks to Vedant Kumar for the
suggestion!

<rdar://problem/60926115>

Differential Revision: https://reviews.llvm.org/D80955
The file was modified lldb/source/Plugins/Instruction/ARM64/EmulateInstructionARM64.cpp
The file was added lldb/unittests/Instruction/CMakeLists.txt
The file was modified lldb/source/Plugins/Instruction/ARM64/EmulateInstructionARM64.h
The file was modified lldb/unittests/CMakeLists.txt
The file was added lldb/unittests/Instruction/TestAArch64Emulator.cpp
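
The kind of undefined behaviour this commit addresses, and the general shape of a checked addition, can be sketched as below. This is a standalone illustration under stated assumptions: `checkedAddSketch`, `AddResult` and `addWithOverflowFlag` are hypothetical names, and the code neither uses nor reproduces llvm::checkedAdd() or the actual AddWithCarry() implementation.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: adds two signed values without ever evaluating an
// overflowing signed expression. Returns false on overflow and leaves Out
// untouched; returns true and writes the sum otherwise.
bool checkedAddSketch(int64_t LHS, int64_t RHS, int64_t &Out) {
  const int64_t Max = std::numeric_limits<int64_t>::max();
  const int64_t Min = std::numeric_limits<int64_t>::min();
  if (RHS > 0 && LHS > Max - RHS)
    return false; // would overflow upwards
  if (RHS < 0 && LHS < Min - RHS)
    return false; // would overflow downwards
  Out = LHS + RHS;
  return true;
}

struct AddResult {
  uint64_t Result; // wrapped two's-complement sum
  bool Overflow;   // true if the signed addition overflowed
};

// Hypothetical emulator-style add: the wrapped result is computed in
// unsigned arithmetic (well defined), and the overflow flag comes from
// the checked signed addition.
AddResult addWithOverflowFlag(int64_t X, int64_t Y) {
  uint64_t Wrapped = static_cast<uint64_t>(X) + static_cast<uint64_t>(Y);
  int64_t Ignored = 0;
  bool Overflow = !checkedAddSketch(X, Y, Ignored);
  return {Wrapped, Overflow};
}
```

Computing the wrapped value in unsigned arithmetic and deriving the flag separately keeps every intermediate expression well defined, which is what an instruction emulator needs when it models hardware flags.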
Commit 11d1aa0bcc1197f1b3010171b02c6e9662f34b75 by rnk
[COFF] Free some memory used for chunks

First, do not reserve numSections in the Chunks array. In cases where
there are many non-prevailing sections, this will overallocate memory
which will not be used.

Second, free the memory for sparseChunks after initializeSymbols. After
that, it is never used.

This saves 50MB of 627MB for my use case without affecting performance.
The file was modified lld/COFF/InputFiles.cpp
The file was modified lld/COFF/InputFiles.h
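
The two memory savings described above can be illustrated with plain std::vector. This is a sketch under assumptions, not the lld code: `ObjFileSketch`, `collect` and `releaseSparseChunks` are hypothetical names, and only the names Chunks and sparseChunks come from the commit message.

```cpp
#include <cstddef>
#include <vector>

struct Chunk {
  int SectionIndex;
};

// Hypothetical stand-in for an object file reader.
struct ObjFileSketch {
  std::vector<Chunk *> Chunks;       // prevailing chunks only
  std::vector<Chunk *> SparseChunks; // indexed by section number

  void collect(std::vector<Chunk> &Sections,
               const std::vector<bool> &Prevailing) {
    SparseChunks.assign(Sections.size(), nullptr);
    // Deliberately no Chunks.reserve(Sections.size()): when most sections
    // are non-prevailing, that would allocate memory that is never used.
    for (std::size_t I = 0; I < Sections.size(); ++I) {
      if (!Prevailing[I])
        continue;
      SparseChunks[I] = &Sections[I];
      Chunks.push_back(&Sections[I]);
    }
  }

  // Called once the sparse table is no longer needed. clear() alone would
  // keep the capacity; swapping with an empty vector releases the storage.
  void releaseSparseChunks() { std::vector<Chunk *>().swap(SparseChunks); }
};
```

Letting Chunks grow on demand costs a few reallocations, but its final capacity tracks the number of prevailing sections rather than the total section count.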
Commit 8a8d703be0986dd6785cba0b610c9c4708b83e89 by rjmccall
Fix how cc1 command line options are mapped into FP options.

Canonicalize on storing FP options in LangOptions instead of
redundantly in CodeGenOptions.  Incorporate -ffast-math directly
into the values of those LangOptions rather than considering it
separately when building FPOptions.  Build IR attributes from
those options rather than a mix of sources.

We should really simplify the driver/cc1 interaction here and have
the driver pass down options that cc1 directly honors.  That can
happen in a follow-up, though.

Patch by Michele Scandale!
https://reviews.llvm.org/D80315
The file was added clang/test/CodeGen/fp-options-to-fast-math-flags.c
The file was modified clang/test/CodeGenCUDA/library-builtin.cu
The file was modified clang/lib/CodeGen/CGExprScalar.cpp
The file was modified clang/include/clang/Basic/CodeGenOptions.def
The file was modified clang/lib/Frontend/CompilerInvocation.cpp
The file was modified clang/test/CodeGenOpenCL/relaxed-fpmath.cl
The file was modified clang/lib/CodeGen/CodeGenFunction.h
The file was modified clang/lib/CodeGen/BackendUtil.cpp
The file was modified clang/test/CodeGen/builtins-nvptx-ptx60.cu
The file was modified clang/test/CodeGen/libcalls.c
The file was modified clang/lib/CodeGen/CGCall.cpp
The file was modified clang/test/CodeGenCUDA/builtins-amdgcn.cu
The file was modified clang/test/CodeGen/complex-math.c
The file was modified clang/include/clang/Basic/LangOptions.h
The file was modified clang/lib/CodeGen/CodeGenFunction.cpp