Changes

Summary

  1. [llvm-objdump] Symbolize binary addresses for low-noisy asm diff.
  2. [MLInliner] In development mode, obtain the output specs from a file
Commit 819b2d9c7901d8e6a1434374701e22e15f9cd640 by hoy
[llvm-objdump] Symbolize binary addresses for low-noisy asm diff.

When diffing the disassembly dumps of two binaries, I see a lot of noise from mismatched jump target addresses and global data references. This produces spurious diffs in every function and makes the comparison impractical. I'm trying to symbolize the raw binary addresses to minimize the diff noise.
In this change, a local branch target is modeled as a label, and the branch target operand is simply printed as that label. Local labels are collected beforehand by a separate pre-decoding pass. A global data memory operand is printed as a global symbol instead of the raw data address. Unfortunately, because of the way the disassembler is set up, and to keep the change less intrusive, a global symbol is always printed as the last operand of a memory access instruction. This is less than ideal but is probably acceptable for checking code quality, since on most targets an instruction can have at most one memory operand.

So far only the X86 disassemblers are supported.
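
As a rough illustration of the two-pass idea described above (not the actual llvm-objdump implementation, which is C++), the Python sketch below first collects branch targets that fall inside the function and then prints each branch operand as a label. The `Insn` type, the helper names, and the first-seen label numbering are assumptions made for this sketch. The instruction addresses are illustrative values chosen to agree with the branch targets shown in the test plan, and global data operands are left out.

```python
from typing import Dict, List, NamedTuple, Optional


class Insn(NamedTuple):
    addr: int                      # address of the instruction
    text: str                      # mnemonic and non-branch operands
    target: Optional[int] = None   # branch/call target address, if any


def collect_labels(insns: List[Insn], start: int, end: int) -> Dict[int, str]:
    """Pre-decoding pass: name every branch target inside [start, end)."""
    labels: Dict[int, str] = {}
    for insn in insns:
        t = insn.target
        if t is not None and start <= t < end and t not in labels:
            labels[t] = f"L{len(labels)}"   # numbered in first-seen order
    return labels


def symbolize(insns: List[Insn], labels: Dict[int, str],
              symbols: Dict[int, str]) -> List[str]:
    """Printing pass: emit label definitions and symbolic branch operands."""
    lines = []
    for insn in insns:
        if insn.addr in labels:
            lines.append(f"<{labels[insn.addr]}>:")
        if insn.target is None:
            operand = ""
        elif insn.target in labels:
            operand = f" <{labels[insn.target]}>"     # local branch target
        elif insn.target in symbols:
            operand = f" <{symbols[insn.target]}>"    # known global symbol
        else:
            operand = f" {insn.target:#x}"            # fall back to raw address
        lines.append(f"    {insn.text}{operand}")
    return lines


# Illustrative input modeled on part of the <_start> listing.
insns = [
    Insn(0x201162, "mov dword ptr [rsp], 0"),
    Insn(0x201169, "mov eax, dword ptr [rsp]"),
    Insn(0x201172, "jge", 0x20117E),
    Insn(0x201174, "call", 0x201158),
    Insn(0x20117C, "jmp", 0x201169),
    Insn(0x20117E, "xor eax, eax"),
]
labels = collect_labels(insns, start=0x201159, end=0x201182)
listing = symbolize(insns, labels, symbols={0x201158: "foo"})
print("\n".join(listing))
```

The call target lies outside the function's address range in this sketch, so it is resolved through the symbol table rather than given a local label.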

Test Plan:

llvm-objdump -d  --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:

<_start>:
               push rax
               mov dword ptr [rsp + 4], 0
               mov dword ptr [rsp], 0
               mov eax, dword ptr [rsp]
               cmp eax, dword ptr [rip + 4112]  # 202182 <g>
               jge 0x20117e <_start+0x25>
               call 0x201158 <foo>
               inc dword ptr [rsp]
               jmp 0x201169 <_start+0x10>
               xor eax, eax
               pop rcx
               ret
```

llvm-objdump -d  **--symbolize-operands** --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:

<_start>:
               push rax
               mov dword ptr [rsp + 4], 0
               mov dword ptr [rsp], 0
<L1>:
               mov eax, dword ptr [rsp]
               cmp eax, dword ptr  <g>
               jge <L0>
               call <foo>
               inc dword ptr [rsp]
               jmp <L1>
<L0>:
               xor eax, eax
               pop rcx
               ret
```

Note that without this work, jump instructions like `jge 0x20117e <_start+0x25>` are printed with a real target address and an offset from the leading symbol. When an optimizer change adds or deletes an instruction, the address and offset may shift for targets placed after that instruction. This is a problem when diffing the disassembly produced by two optimizers, because such branch target address changes show up as unnecessary false positives. With `--symbolize-operands`, a label is printed for a branch target instead, which reduces the false positives. Similarly, the disassembly of PC-relative global variable references is also sensitive to instruction insertion and deletion.
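
To make the false-positive argument concrete, here is a small Python sketch that renders a toy program in both styles, inserts a single `nop`, and counts the diff lines with `difflib`. The program, the instruction sizes, the base address, and all helper names are made up for this sketch. It imitates the two output styles and does not invoke llvm-objdump.

```python
import difflib

# Toy program: (size in bytes, text, index of the branch target or None).
PROGRAM = [
    (1, "push rax", None),
    (3, "mov eax, dword ptr [rsp]", None),
    (2, "jge", 4),
    (2, "jmp", 1),
    (1, "ret", None),
]


def insert_insn(program, index, insn):
    """Insert insn at index, retargeting branches that point at or past it."""
    shifted = [
        (size, text, t + 1 if t is not None and t >= index else t)
        for size, text, t in program
    ]
    return shifted[:index] + [insn] + shifted[index:]


def render(program, base, symbolized):
    """Print the program with raw target addresses or with local labels."""
    addrs, addr = [], base
    for size, _, _ in program:
        addrs.append(addr)
        addr += size
    targets = []                      # target indices in first-seen order
    for _, _, t in program:
        if t is not None and t not in targets:
            targets.append(t)
    lines = []
    for i, (_, text, t) in enumerate(program):
        if symbolized and i in targets:
            lines.append(f"<L{targets.index(i)}>:")
        if t is None:
            lines.append(text)
        elif symbolized:
            lines.append(f"{text} <L{targets.index(t)}>")
        else:
            lines.append(f"{text} {addrs[t]:#x} <_start+{addrs[t] - base:#x}>")
    return lines


def changed_lines(a, b):
    """Count the added and removed lines reported by difflib.ndiff."""
    return sum(1 for d in difflib.ndiff(a, b) if d[:2] in ("+ ", "- "))


BASE = 0x201159                       # illustrative function start address
patched = insert_insn(PROGRAM, 1, (1, "nop", None))
raw_changes = changed_lines(render(PROGRAM, BASE, False),
                            render(patched, BASE, False))
sym_changes = changed_lines(render(PROGRAM, BASE, True),
                            render(patched, BASE, True))
print(raw_changes, sym_changes)
```

In this toy case the symbolized listings differ only by the inserted `nop`, while the raw listings also differ on both branch lines because their target addresses and offsets shift.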

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D84191
The file was modified llvm/tools/llvm-objdump/llvm-objdump.cpp
The file was modified llvm/lib/Target/X86/MCTargetDesc/X86IntelInstPrinter.cpp
The file was modified llvm/include/llvm/MC/MCInstPrinter.h
The file was added llvm/test/tools/llvm-objdump/X86/elf-disassemble-symbololize-operands.yaml
The file was modified llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.cpp
The file was modified llvm/lib/Target/X86/MCTargetDesc/X86InstPrinterCommon.cpp
The file was modified llvm/docs/CommandGuide/llvm-objdump.rst
Commit 62fc44ca3cf66442b30e22b1be34afc492a2a388 by mtrofin
[MLInliner] In development mode, obtain the output specs from a file

Different training algorithms may produce models that, besides the main
policy output (i.e. inline/don't inline), produce additional outputs
that are necessary for the next training stage. To facilitate this, in
development mode, we require the training policy infrastructure produce
a description of the outputs that are interesting to it, in the form of
a JSON file. We special-case the first entry in the JSON file as the
inlining decision - we care about its value, so we can guide inlining
during training - but treat the rest as opaque data that we just copy
over to the training log.
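
A minimal Python sketch of the special-casing described above, assuming a made-up spec layout: the field names `logging_name` and `tensor_name`, the tensor names, and the helper functions are hypothetical and are not taken from the actual `output_spec.json` or from `DevelopmentModeInlineAdvisor.cpp`. It only shows the first entry being read as the inlining decision while the remaining entries are copied to the log untouched.

```python
import json

# Hypothetical spec file contents; the field names are assumptions.
SPEC_TEXT = """
[
  {"logging_name": "inlining_decision", "tensor_name": "policy_output"},
  {"logging_name": "extra_training_data", "tensor_name": "aux_output"}
]
"""


def load_output_specs(text):
    """Split the spec list into the decision entry and the opaque remainder."""
    specs = json.loads(text)
    if not isinstance(specs, list) or not specs:
        raise ValueError("output spec must be a non-empty JSON array")
    return specs[0], specs[1:]


def record_step(decision_spec, extra_specs, model_outputs, log):
    """Read the decision from the first output and copy the others to the log."""
    decision = bool(model_outputs[decision_spec["tensor_name"]])
    entry = {decision_spec["logging_name"]: int(decision)}
    for spec in extra_specs:
        # Opaque data: copied verbatim, never interpreted.
        entry[spec["logging_name"]] = model_outputs[spec["tensor_name"]]
    log.append(entry)
    return decision


decision_spec, extra_specs = load_output_specs(SPEC_TEXT)
training_log = []
should_inline = record_step(
    decision_spec,
    extra_specs,
    {"policy_output": 1, "aux_output": [0.25, 0.75]},
    training_log,
)
print(should_inline, training_log)
```

Only the first entry influences control flow here; everything after it passes through to the log, which mirrors the split between the inlining decision and the opaque outputs.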

Differential Revision: https://reviews.llvm.org/D85674
The file was modified llvm/lib/Analysis/DevelopmentModeInlineAdvisor.cpp
The file was added llvm/test/Transforms/Inline/ML/Inputs/test_output_spec.json
The file was modified llvm/test/Transforms/Inline/ML/development-training-log.ll
The file was added llvm/lib/Analysis/models/inliner/output_spec.json