Revision
359793
by spatel:
[DAGCombiner] try repeated fdiv divisor transform before building estimate (2nd try)

The original patch was committed at rL359398 and reverted at rL359695 because of infinite looping. This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop.

Original commit message:

This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence, then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5 vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the full-precision division is only 3 cycle throughput, so that's probably the better perf default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using the function attributes for reciprocal estimates, but those patterns are potentially made faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149

Change Type | Path in Repository | Path in Workspace |
---|---|---|
 | /llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp |
 | /llvm/trunk/test/CodeGen/X86/fdiv-combine-vec.ll | trunk/test/CodeGen/X86/fdiv-combine-vec.ll |
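
The repeated-divisor transform in r359793 can be illustrated with a small model. This is only a sketch in Python, not the DAGCombiner code: the function name `combine_repeated_divisor`, the list-of-numerators representation, and the "at least two users" threshold are assumptions made for illustration. It shows why a numerator of 1.0 has to be excluded: that division already is the reciprocal, so rewriting it would recreate the same `1.0 / divisor` node and the combine could fire on its own output again.

```python
def combine_repeated_divisor(numerators, divisor):
    """Rewrite [n / divisor for n in numerators] as multiplies by one reciprocal.

    Returns None when the (hypothetical) combine should not fire.
    """
    # Guard modeled on the fix described in the commit message: a numerator
    # of 1.0 is already the reciprocal, so transforming it would only
    # recreate 1.0 / divisor and the combine would never reach a fixed point.
    if any(n == 1.0 for n in numerators):
        return None
    # Assumption: the rewrite only pays off when the divisor has several users.
    if len(numerators) < 2:
        return None
    reciprocal = 1.0 / divisor  # the single division that remains
    return [n * reciprocal for n in numerators]


if __name__ == "__main__":
    print(combine_repeated_divisor([2.0, 6.0], 4.0))
    print(combine_repeated_divisor([1.0, 3.0], 4.0))
```

In this model the two divisions by 4.0 become one division and two multiplies, which mirrors the trade-off the commit message describes between one scalar divide and a longer estimate sequence.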
Revision
359791
by spatel:
[SelectionDAG] remove constant folding limitations based on FP exceptions

We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops), so it does not make sense to have them here in the DAG either. Nothing else in the backend tries to preserve exceptions (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions.

There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least partially) to preserve exceptions without even asking if the target supports FP exceptions. Those should be corrected in subsequent patches.

Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.

Differential Revision: https://reviews.llvm.org/D61331

Change Type | Path in Repository | Path in Workspace |
---|---|---|
 | /llvm/trunk/include/llvm/CodeGen/TargetLowering.h | trunk/include/llvm/CodeGen/TargetLowering.h |
 | /llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp | trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp |
 | /llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp | trunk/lib/CodeGen/TargetLoweringBase.cpp |
 | /llvm/trunk/lib/Target/AMDGPU/SIISelLowering.cpp | trunk/lib/Target/AMDGPU/SIISelLowering.cpp |
 | /llvm/trunk/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp | trunk/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp |
 | /llvm/trunk/test/CodeGen/AArch64/fp-const-fold.ll | trunk/test/CodeGen/AArch64/fp-const-fold.ll |
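
To make the effect of r359791 concrete, here is a rough Python model of a constant folder for fdiv with and without an exception-based restriction. It is a sketch under stated assumptions, not the SelectionDAG implementation: `fold_fdiv` and its `preserve_exceptions` parameter are hypothetical names, and only the IEEE-754 divide-by-zero and invalid-operation cases are modeled. With the restriction enabled, a division that would raise one of those exceptions at run time is left unfolded; without it, the folder always produces the constant result.

```python
import math


def fold_fdiv(lhs, rhs, preserve_exceptions=False):
    """Fold lhs / rhs at compile time; return None to leave it unfolded."""
    if math.isnan(lhs) or math.isnan(rhs):
        return math.nan  # a quiet NaN operand simply propagates
    # IEEE-754 invalid operation: 0/0 or inf/inf.
    invalid = (lhs == 0.0 and rhs == 0.0) or (math.isinf(lhs) and math.isinf(rhs))
    # IEEE-754 divide-by-zero: finite non-zero value divided by zero.
    div_by_zero = rhs == 0.0 and lhs != 0.0 and not math.isinf(lhs)
    # Hypothetical restriction: refuse to fold when an exception would be raised.
    if preserve_exceptions and (invalid or div_by_zero):
        return None
    if invalid:
        return math.nan
    if rhs == 0.0:
        # Python raises on float division by zero, so build the infinity by hand.
        sign = math.copysign(1.0, lhs) * math.copysign(1.0, rhs)
        return math.copysign(math.inf, sign)
    return lhs / rhs


if __name__ == "__main__":
    print(fold_fdiv(1.0, 0.0))
    print(fold_fdiv(1.0, 0.0, preserve_exceptions=True))
```

The unrestricted path corresponds to what the commit message says the IR constant folder already does for binops outside of strict ops.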
Revision
359786
by rksimon:
[X86][SSE] lowerAddSubToHorizontalOp - enable ymm extraction+fold

Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm regardless, and we can remove a shuffle by using the hop, which is in line with what shouldUseHorizontalOp expects to happen anyway.

Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result: https://godbolt.org/z/0R-U-K

Differential Revision: https://reviews.llvm.org/D61426

Change Type | Path in Repository | Path in Workspace |
---|---|---|
 | /llvm/trunk/lib/Target/X86/X86ISelLowering.cpp | trunk/lib/Target/X86/X86ISelLowering.cpp |
 | /llvm/trunk/test/CodeGen/X86/haddsub.ll | trunk/test/CodeGen/X86/haddsub.ll |
 | /llvm/trunk/test/CodeGen/X86/phaddsub-extract.ll | trunk/test/CodeGen/X86/phaddsub-extract.ll |
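
The idea behind r359786 can be modeled with plain lists standing in for vector registers. This is a sketch, not the X86ISelLowering code: `hadd` models a 4-lane horizontal add in the style of haddps, and `add_adjacent_lanes` is a hypothetical helper showing that the sum of two adjacent lanes in the upper half of an 8-lane (ymm-sized) value can be read from a horizontal add of the extracted half, with no separate shuffle to line the lanes up.

```python
def hadd(a, b):
    """Model of a 4-lane horizontal add: adjacent pairs of a, then of b."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


def add_adjacent_lanes(ymm, lo):
    """Return ymm[lo] + ymm[lo + 1] (lo must be even) via extract + hadd."""
    assert len(ymm) == 8 and lo % 2 == 0
    # Extract the 4-lane half that holds both operands (lower or upper xmm).
    half = ymm[:4] if lo < 4 else ymm[4:]
    # One horizontal add produces the pair sum directly in a low lane.
    return hadd(half, half)[(lo % 4) // 2]


if __name__ == "__main__":
    v = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(add_adjacent_lanes(v, 4))
```

The upper-half case still needs the extract, which matches the commit's observation that one upper xmm is extracted either way.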
Revision
359782
by rksimon:
[X86][SSE] Move shouldUseHorizontalOp inside isHorizontalBinOp. NFCI.

Matches what we do for lowerAddSubToHorizontalOp and will make it easier to peek through subvectors to help fix PR39921

Change Type | Path in Repository | Path in Workspace |
---|---|---|
 | /llvm/trunk/lib/Target/X86/X86ISelLowering.cpp | trunk/lib/Target/X86/X86ISelLowering.cpp |
Revision
359781
by jhenderson:
[llvm-strip] Add --no-strip-all to disable --strip-all behaviour (including default stripping)

If certain switches are not specified, llvm-strip behaves as if --strip-all were specified. This means that for testing, when we don't want the stripping behaviour, we have to specify one of these switches, which can be confusing. This change adds --no-strip-all to allow an alternative way of suppressing the default stripping, in a less confusing manner.

Reviewed by: jakehehrlich, MaskRay

Differential Revision: https://reviews.llvm.org/D61377

Change Type | Path in Repository | Path in Workspace |
---|---|---|
 | /llvm/trunk/test/tools/llvm-objcopy/ELF/basic-only-keep-debug.test | trunk/test/tools/llvm-objcopy/ELF/basic-only-keep-debug.test |
 | /llvm/trunk/test/tools/llvm-objcopy/ELF/dynsym-error-remove-strtab.test | trunk/test/tools/llvm-objcopy/ELF/dynsym-error-remove-strtab.test |
 | /llvm/trunk/test/tools/llvm-objcopy/ELF/no-strip-all.test | trunk/test/tools/llvm-objcopy/ELF/no-strip-all.test |
 | /llvm/trunk/test/tools/llvm-objcopy/ELF/reloc-error-remove-symtab.test | trunk/test/tools/llvm-objcopy/ELF/reloc-error-remove-symtab.test |
 | /llvm/trunk/test/tools/llvm-objcopy/ELF/remove-linked-section.test | trunk/test/tools/llvm-objcopy/ELF/remove-linked-section.test |
 | /llvm/trunk/test/tools/llvm-objcopy/ELF/symtab-error-on-remove-strtab.test | trunk/test/tools/llvm-objcopy/ELF/symtab-error-on-remove-strtab.test |
 | /llvm/trunk/test/tools/llvm-objcopy/ELF/symtab-link.test | trunk/test/tools/llvm-objcopy/ELF/symtab-link.test |
 | /llvm/trunk/tools/llvm-objcopy/CopyConfig.cpp | trunk/tools/llvm-objcopy/CopyConfig.cpp |
 | /llvm/trunk/tools/llvm-objcopy/StripOpts.td | trunk/tools/llvm-objcopy/StripOpts.td |
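
The default-stripping rule described in r359781 can be sketched as a small decision function. This is an illustrative Python model, not the CopyConfig.cpp logic: `effective_strip_all` is a hypothetical name, the set of switches that suppress the default is an assumed subset (the commit message only says "certain switches"), and the last-one-wins handling of `--strip-all` versus `--no-strip-all` is an assumption as well.

```python
# Assumption: an illustrative subset of the switches that select a narrower
# kind of stripping and therefore turn the implicit --strip-all off.
NARROWER_SWITCHES = {"--strip-debug", "--strip-unneeded"}


def effective_strip_all(args):
    """Decide whether strip-all behaviour applies for a list of switches."""
    explicit = [a for a in args if a in ("--strip-all", "--no-strip-all")]
    if explicit:
        # Assumption: if both forms are given, the last one wins.
        return explicit[-1] == "--strip-all"
    # No explicit choice: strip everything unless a narrower switch was given.
    return not any(a in NARROWER_SWITCHES for a in args)


if __name__ == "__main__":
    print(effective_strip_all([]))
    print(effective_strip_all(["--no-strip-all"]))
```

In this model a test that wants no stripping at all can pass `--no-strip-all` directly instead of picking an unrelated switch just to suppress the default.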