mirror of
https://github.com/JuliaLang/julia.git
synced 2026-05-28 03:10:33 +08:00
* implement faster floating-point `isless`
Previously `isless` relied on the C intrinsic `fpislt` in
`src/runtime_intrinsics.c`, while the new implementation in Julia
arguably generates better code, namely:
1. The NaN-check compiles to a single instruction + branch amenable
to branch prediction in arguably most use cases (i.e. comparing
non-NaN floats), thus speeding up execution.
2. The compiler now often manages to remove the NaN check entirely if the
surrounding code has already proven the arguments to be non-NaN.
3. The actual operation compares both arguments as sign-magnitude
integers instead of case analysis based on the sign of one
argument. This symmetric treatment may generate vectorized
instructions for the sign-magnitude conversion depending on how the
arguments are laid out.
The actual behaviour of `isless` did not change, and apart from the
Julia-specific NaN-handling (which may be up for debate), the resulting
total order corresponds to the IEEE-754 specified `totalOrder`.
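The three points above can be illustrated with a minimal sketch for `Float64`. This is not the code from this commit: the names `signmag_key` and `sketch_isless` are hypothetical, and the bit trick shown is one common way to compare floats as sign-magnitude integers, assumed here for illustration.

```julia
# Hypothetical helper (not from the commit): map a Float64 bit pattern to an
# Int64 whose signed integer order matches the float order, with -0.0 < 0.0.
function signmag_key(x::Float64)
    ix = reinterpret(Int64, x)
    # Negative floats have the sign bit set. Flipping the remaining 63 bits
    # reverses their order, so larger magnitudes map to smaller integers.
    return ifelse(ix < 0, ix ⊻ typemax(Int64), ix)
end

# Hypothetical sketch of an `isless`-like comparison.
function sketch_isless(a::Float64, b::Float64)
    # NaN check first: NaN is never less than anything, and every
    # non-NaN value is less than NaN.
    if isnan(a) || isnan(b)
        return !isnan(a)
    end
    # Both arguments are treated symmetrically from here on.
    return signmag_key(a) < signmag_key(b)
end
```

For non-NaN inputs only the integer comparison runs after one predictable branch, which is the case the first point argues is the most common.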
While the new implementation no longer generates fully branchless code, I
did not manage to construct a use case where this was detrimental: the
saved work seems to outweigh the potential cost of a branch
misprediction in all of my tests with various NaN-polluted data. Also,
auto-vectorization was not effective on the previous `fpislt` either.
Quick benchmarks (AMD A10-7860K) on `sort`, avoiding the specialized
algorithm:
```julia
using BenchmarkTools  # provides @btime
a = rand(1000);
@btime sort($a, lt=(a,b)->isless(a,b));
# before: 56.030 μs (1 allocation: 7.94 KiB)
# after: 40.853 μs (1 allocation: 7.94 KiB)
a = rand(1000000);
@btime sort($a, lt=(a,b)->isless(a,b));
# before: 159.499 ms (2 allocations: 7.63 MiB)
# after: 120.536 ms (2 allocations: 7.63 MiB)
a = [rand((rand(), NaN)) for _ in 1:1000000];
@btime sort($a, lt=(a,b)->isless(a,b));
# before: 111.925 ms (2 allocations: 7.63 MiB)
# after: 77.669 ms (2 allocations: 7.63 MiB)
```
* Remove old intrinsic `fpislt` code
Co-authored-by: Mustafa Mohamad <mus-m@outlook.com>
118 lines
3.3 KiB
YAML
Language: Cpp
Standard: Cpp11
AccessModifierOffset: -4
AlignAfterOpenBracket: Align
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: false
AlignConsecutiveMacros: false
AlignEscapedNewlines: Left
AlignOperands: true
AlignTrailingComments: false
AllowAllArgumentsOnNextLine: true
AllowAllConstructorInitializersOnNextLine: true
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortBlocksOnASingleLine: Never
AllowShortCaseLabelsOnASingleLine: true
AllowShortFunctionsOnASingleLine: Inline
AllowShortIfStatementsOnASingleLine: Never
AllowShortLambdasOnASingleLine: All
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: false
AlwaysBreakTemplateDeclarations: true
BinPackArguments: true
BinPackParameters: true
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Stroustrup
BreakBeforeInheritanceComma: false
BreakBeforeTernaryOperators: false
BreakConstructorInitializers: BeforeColon
BreakConstructorInitializersBeforeComma: false
BreakInheritanceList: BeforeColon
BreakStringLiterals: false
ColumnLimit: 92
CommentPragmas: '^ IWYU pragma:'
CompactNamespaces: false
ConstructorInitializerAllOnOneLineOrOnePerLine: true
ConstructorInitializerIndentWidth: 2
ContinuationIndentWidth: 4
Cpp11BracedListStyle: true
DeriveLineEnding: true
DerivePointerAlignment: false
DisableFormat: false
ExperimentalAutoDetectBinPacking: false
FixNamespaceComments: false
IncludeBlocks: Preserve
IncludeCategories:
  - Regex: '^(<|"(llvm|llvm-c|clang|clang-c)/)'
    Priority: 2
  - Regex: '^<.*'
    Priority: 3
  - Regex: '.*'
    Priority: 1
IncludeIsMainSourceRegex: ''
IndentCaseLabels: false
IndentGotoLabels: false
IndentPPDirectives: None
IndentWidth: 4
IndentWrappedFunctionNames: false
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 2
NamespaceIndentation: None
PenaltyBreakAssignment: 2
PenaltyBreakBeforeFirstCallParameter: 30
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
PenaltyBreakTemplateDeclaration: 10
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 60
PointerAlignment: Right
ReflowComments: true
SortIncludes: true
SortUsingDeclarations: true
SpaceAfterCStyleCast: false
SpaceAfterLogicalNot: false
SpaceAfterTemplateKeyword: false
SpaceBeforeAssignmentOperators: true
SpaceBeforeCpp11BracedList: false
SpaceBeforeCtorInitializerColon: true
SpaceBeforeInheritanceColon: true
SpaceBeforeParens: ControlStatements
SpaceBeforeRangeBasedForLoopColon: true
SpaceBeforeSquareBrackets: false
SpaceInEmptyBlock: false
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: false
SpacesInCStyleCastParentheses: false
SpacesInConditionalStatement: false
SpacesInContainerLiterals: true
SpacesInParentheses: false
SpacesInSquareBrackets: false
TabWidth: 8
UseCRLF: true
UseTab: Never
ForEachMacros:
  - JL_TRY
  - JL_CATCH
StatementMacros:
  - bi_fintrinsic
  - bi_iintrinsic_fast
  - bi_intrinsic_ctype
  - bool_fintrinsic
  - bool_iintrinsic_fast
  - bool_intrinsic_ctype
  - checked_intrinsic_ctype
  - cvt_iintrinsic
  - fpiseq_n
  - ter_fintrinsic
  - ter_intrinsic_ctype
  - un_fintrinsic
  - un_fintrinsic_withtype
  - un_iintrinsic_ctype
  - uu_iintrinsic_ctype