Who am I?
- Senior Software Engineer at Google
- Data center (DC) efficiency
Agenda for today
- History of sorting
- Why have we decided to change anything?
- Bugs, bugs, bugs
- Implementation
- What can you do?
Reminders
- Sorting is the ordering of elements
- std::sort, std::stable_sort, ranges::sort, etc
Quicksort
- Take any element as the pivot
- Smaller elements go to the left
- Larger elements go to the right
- Recurse on both parts
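A minimal sketch of the idea in C++ (an illustration only, not any library's implementation):

```cpp
#include <algorithm>
#include <vector>

// Quicksort sketch: pick a pivot, partition smaller elements to the left,
// larger ones to the right, then recurse on both sides.
void quicksort(std::vector<int>& v, int lo, int hi) {  // sorts v[lo, hi)
  if (hi - lo < 2) return;
  int pivot = v[lo + (hi - lo) / 2];  // take any element
  auto less = std::partition(v.begin() + lo, v.begin() + hi,
                             [&](int x) { return x < pivot; });
  auto greater = std::partition(less, v.begin() + hi,
                                [&](int x) { return !(pivot < x); });
  quicksort(v, lo, int(less - v.begin()));     // recurse on the left part
  quicksort(v, int(greater - v.begin()), hi);  // recurse on the right part
}
```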
Standard library evolved from STL
Quicksort's worst case is $O(n^2)$
Just keep picking the worst element as the pivot
Introsort
- Almost like quicksort
- If we suspect too many recursive calls, fall back to heap sort
- GCC libstdc++ uses $2 \log_2 n$ depth limit
- Microsoft standard library does the same
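A compact sketch of the depth-limit idea, reusing the partition scheme above (not the actual libstdc++ or MSVC code):

```cpp
#include <algorithm>
#include <bit>
#include <vector>

// Introsort sketch: quicksort with a recursion budget of 2*log2(n).
// When the budget runs out, fall back to heap sort, which is
// O(n log n) even in the worst case.
void introsort_impl(std::vector<int>& v, int lo, int hi, int budget) {
  if (hi - lo < 2) return;
  if (budget == 0) {  // too deep: suspicious input, switch to heap sort
    std::make_heap(v.begin() + lo, v.begin() + hi);
    std::sort_heap(v.begin() + lo, v.begin() + hi);
    return;
  }
  int pivot = v[lo + (hi - lo) / 2];
  auto less = std::partition(v.begin() + lo, v.begin() + hi,
                             [&](int x) { return x < pivot; });
  auto greater = std::partition(less, v.begin() + hi,
                                [&](int x) { return !(pivot < x); });
  introsort_impl(v, lo, int(less - v.begin()), budget - 1);
  introsort_impl(v, int(greater - v.begin()), hi, budget - 1);
}

void introsort(std::vector<int>& v) {
  int log2n = int(std::bit_width(v.size()));  // roughly log2(n) + 1
  introsort_impl(v, 0, int(v.size()), 2 * log2n);
}
```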
LLVM libc++ had no such fallback at all for a long time
7 years
Only 0.01% of all calls got into the heap sort fallback in production
What is a good sort?
- $O(n\log n)$ comparisons
- Recognizes almost sorted patterns
- Fast for modern hardware
- Fewer comparisons when comparisons are expensive
[Benchmark plots: libc++ vs. libstdc++ on ascending inputs]
How has libc++ achieved that?
Insertion sort is tried at every step, giving up after 8 insertions.
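A sketch of that trick (a hypothetical helper, not the libc++ source): attempt insertion sort and bail out once too many elements turn out to be out of place.

```cpp
#include <cstddef>
#include <vector>

// Attempt insertion sort; give up after 8 elements had to be moved.
// Returns true if the range ended up sorted. On bailout the range is
// still a permutation of the input, so the caller can keep quicksorting.
bool bounded_insertion_sort(std::vector<int>& v) {
  const int kLimit = 8;
  int insertions = 0;
  for (std::size_t i = 1; i < v.size(); ++i) {
    int x = v[i];
    std::size_t j = i;
    if (x < v[j - 1]) {
      if (++insertions > kLimit) return false;  // too unsorted: bail out
      do {
        v[j] = v[j - 1];
        --j;
      } while (j > 0 && x < v[j - 1]);
      v[j] = x;
    }
  }
  return true;
}
```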
Overall, every now and then one implementation sorts better than the other.
Let's change it and save some cycles
We tried
Problem: ties
It took a year to fix all golden tests
Problem: ties. How do we find them?
Shuffle the range before sorting
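A sketch of the idea (not the libc++ implementation): in debug builds, shuffle first, so code that silently depended on how equal elements were ordered fails fast.

```cpp
#include <algorithm>
#include <random>

// In debug builds, shuffle before sorting: the order of equal elements
// then changes from run to run, exposing code that relied on it.
template <class It, class Cmp>
void debug_sort(It first, It last, Cmp cmp) {
#ifndef NDEBUG
  static std::mt19937 gen{std::random_device{}()};
  std::shuffle(first, last, gen);
#endif
  std::sort(first, last, cmp);
}
```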
Problem: ties. How do we find them?
Use address space layout randomization (ASLR) as the source of randomness
Randomization went beyond std::sort
Also to std::nth_element, std::partial_sort
std::nth_element, std::partial_sort
Randomize before the call, and randomize the unspecified ranges after it
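For std::nth_element, say, only three facts about the output are guaranteed; everything else is unspecified and can be shuffled in a debug build. A sketch of the idea, assuming random-access iterators (not the libc++ code):

```cpp
#include <algorithm>
#include <random>

// After nth_element: elements in [first, nth) are <= *nth, *nth is in
// its final sorted position, and elements in (nth, last) are >= *nth.
// The order *within* the two sides is unspecified, so shuffle them
// afterwards (and the whole input beforehand) to flush out code that
// relies on it.
template <class It>
void debug_nth_element(It first, It nth, It last) {
  static std::mt19937 gen{std::random_device{}()};
  std::shuffle(first, last, gen);    // randomize before
  std::nth_element(first, nth, last);
  if (nth == last) return;
  std::shuffle(first, nth, gen);     // randomize the unspecified left side
  std::shuffle(nth + 1, last, gen);  // ... and the unspecified right side
}
```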
Determinism
Example: ClickHouse
Determinism
- Assume you sort some data
- Cache it
- Now ties are handled randomly
- Lower hit rate
Determinism
Use std::stable_sort
Let's write a median function
Before the change you might get lucky, because nth_element used to leave elements around the nth position close to totally ordered
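A typical example of such a latent bug (hypothetical code):

```cpp
#include <algorithm>
#include <vector>

// For an even count, this code assumes *(mid - 1) is the maximum of the
// lower half after nth_element. That order is unspecified: it merely
// used to work because elements around nth were left close to sorted.
double buggy_median(std::vector<double> v) {  // assumes !v.empty()
  auto mid = v.begin() + v.size() / 2;
  std::nth_element(v.begin(), mid, v.end());
  if (v.size() % 2 == 1) return *mid;
  return (*mid + *(mid - 1)) / 2;  // BUG: *(mid - 1) is arbitrary
  // Correct: use *std::max_element(v.begin(), mid) as the lower value.
}
```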
We found so many of them!
Use _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY with libcxx
Strict weak ordering
- Irreflexivity: $x < x$ is false
- Asymmetry: $x < y$ and $y < x$ cannot be both true
- Transitivity: $x < y$ and $y < z$ imply $x < z$
- Transitivity of incomparability: $x == y$ and $y == z$ imply $x == z$, where $x == y$ means $x < y$ and $y < x$ are both false.
To check this naively we must examine all triples of elements: $O(n^3)$
But what are the risks?
On paper, UB. But in practice?
Risks
- Incorrect comparator on $\leq 30$ elements: all good
- Incorrect comparator on $> 30$ elements: can fail with out-of-bounds (OOB) accesses
- Have you written a well-thought-out test with 30+ elements?
At Google we consistently see an outage somewhere every few months because of this
Checking all triples is $O(n^3)$, while sorting itself is only $O(n \log n)$.
There is an $O(n^2)$ algorithm to check for strict weak ordering.
$O(n^2)$ is still a lot, but we can run it on, say, the first 100 elements.
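A sketch of how such a quadratic check can work, assuming the range has already been sorted with the comparator under test (an illustration of the idea, not the libc++ source):

```cpp
#include <cstddef>
#include <vector>

// O(n^2) strict weak ordering check on a range already sorted by comp:
// split it into equivalence classes, verify every pair inside a class
// is incomparable, and every element of a class is less than every
// later element.
template <class T, class Cmp>
bool is_strict_weak_ordering(const std::vector<T>& v, Cmp comp) {
  std::size_t n = v.size();
  std::size_t a = 0;
  while (a != n) {
    std::size_t b = a + 1;                    // find the end of the
    while (b != n && !comp(v[a], v[b])) ++b;  // equivalence class of v[a]
    for (std::size_t i = a; i != b; ++i)
      for (std::size_t j = i; j != b; ++j)    // j == i checks irreflexivity
        if (comp(v[i], v[j]) || comp(v[j], v[i])) return false;
    for (std::size_t i = a; i != b; ++i)
      for (std::size_t j = b; j != n; ++j)    // class precedes the rest
        if (!comp(v[i], v[j]) || comp(v[j], v[i])) return false;
    a = b;
  }
  return true;
}
```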
Most sorts are small!
Caveats
- The sort itself must not crash before the check runs
- It only tells whether strict weak ordering holds on the given elements
- An arbitrary comparator can still do anything
- We can also check that std::sort itself does not read OOB
Submitted SWO checks for std::{stable_,partial_}sort; rolled out in LLVM 17 (commit 7e1ee1)
-D_LIBCPP_DEBUG_STRICT_WEAK_ORDERING_CHECK -D_LIBCPP_ENABLE_DEBUG_MODE=1
We found fewer of these bugs, but they were more dangerous.
Can we sort {1.02, 1.01, 1.0}?
Yes
Can we sort {3.0, NaN, 4.0}?
No
NaN compares equivalent to both 3.0 and 4.0, yet 3.0 < 4.0, so transitivity of incomparability is broken
Can we sort {NaN, 3.0, NaN}?
Yes
All elements are pairwise equivalent, which is a valid strict weak ordering
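Both answers can be verified directly with operator< on doubles (a small demonstration):

```cpp
#include <cmath>
#include <iostream>

// Every < comparison involving NaN is false, so under operator< a NaN
// is "equivalent" to any number.
int main() {
  double nan = std::nan("");
  // {3.0, NaN, 4.0}: NaN is equivalent to 3.0 and to 4.0, so transitivity
  // of incomparability demands 3.0 equivalent to 4.0, but 3.0 < 4.0.
  std::cout << (3.0 < nan) << (nan < 3.0)   // 00: equivalent
            << (4.0 < nan) << (nan < 4.0)   // 00: equivalent
            << (3.0 < 4.0) << '\n';         // 1:  contradiction, not a SWO
  // {NaN, 3.0, NaN}: every pairwise comparison is false, so all three
  // elements form one equivalence class. That is a valid SWO.
}
```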
What are we changing?
- After these cleanups we can switch to any sort
- We chose a mix of BlockQuickSort and pdqsort
Data dependent branches are hard to predict
CTZ - Count Trailing Zeros, BLSR - Reset Lowest Set Bit
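A sketch of the branchless, BlockQuickSort-style idea (an illustration, not the libc++ partition code): comparison results for a block of elements are packed into a bitmask with no data-dependent branches, and the set bits are then walked with CTZ and BLSR.

```cpp
#include <bit>
#include <cstdint>

// Compare 64 elements against the pivot without data-dependent branches,
// packing the results into one 64-bit mask.
uint64_t build_mask(const int* block, int pivot) {
  uint64_t mask = 0;
  for (int i = 0; i < 64; ++i)
    mask |= uint64_t(block[i] >= pivot) << i;  // branchless: setcc + shift
  return mask;
}

// Walk the set bits: CTZ finds the next offset, BLSR clears it.
template <class Visit>
void for_each_offset(uint64_t mask, Visit visit) {
  while (mask != 0) {
    visit(std::countr_zero(mask));  // CTZ: index of the next element
    mask &= mask - 1;               // BLSR: reset lowest set bit
  }
}
```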
We also improved pivot selection: the median of 9 elements, the so-called "Tukey's ninther"
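The ninther is the median of the medians of three triples sampled across the range; a sketch (the sample positions here are a simplification):

```cpp
#include <algorithm>

// Median of three values.
int median3(int a, int b, int c) {
  return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Tukey's ninther: the median of medians of three triples, giving a far
// more robust pivot than a single sample. Requires n >= 8 here.
int ninther(const int* v, int n) {
  int s = n / 8;  // simplified stride across the range
  return median3(median3(v[0 * s], v[1 * s], v[2 * s]),
                 median3(v[3 * s], v[4 * s], v[5 * s]),
                 median3(v[6 * s], v[7 * s], v[n - 1]));
}
```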
Benchmark                    old          new          delta
BM_Sort_uint64_Rand_1        4.26ns ± 0%  4.41ns ± 1%   +3.44%
BM_Sort_uint64_Rand_4        1.96ns ± 0%  1.95ns ± 1%   -0.34%
BM_Sort_uint64_Rand_16       10.5ns ± 0%  11.6ns ± 1%  +10.77%
BM_Sort_uint64_Rand_64       18.8ns ± 0%  17.5ns ± 1%   -6.60%
BM_Sort_uint64_Rand_256      26.4ns ± 0%  21.7ns ± 1%  -17.88%
BM_Sort_uint64_Rand_1024     33.8ns ± 1%  24.6ns ± 1%  -27.15%
BM_Sort_uint64_Rand_16384    48.7ns ± 0%  30.1ns ± 1%  -38.10%
BM_Sort_uint64_Rand_262144   63.4ns ± 1%  35.7ns ± 1%  -43.71%
On other patterns we sometimes regressed, but only slightly; random inputs are the hardest case.
Benchmark                    old          new          delta
BM_Sort_string_Random_1      4.66ns ± 6%  4.40ns ± 4%   -5.55%
BM_Sort_string_Random_4      14.9ns ± 3%  15.0ns ± 6%        ~
BM_Sort_string_Random_16     45.5ns ± 6%  35.8ns ± 8%  -21.37%
BM_Sort_string_Random_64     66.6ns ± 4%  58.2ns ± 3%  -12.69%
BM_Sort_string_Random_256    86.0ns ± 5%  77.4ns ± 3%  -10.01%
BM_Sort_string_Random_1024   106ns  ± 3%  96ns   ± 6%   -9.39%
BM_Sort_string_Random_16384  154ns  ± 3%  141ns  ± 5%   -8.03%
BM_Sort_string_Random_262144 213ns  ± 4%  197ns  ± 4%   -7.59%
Some thoughts
- The default standard sort is unstable
- Python, Rust, and Java default to stable sorts
- Why does std::sort read OOB when the comparator is broken?
- std::ranges::sort was designed to take one range instead of an iterator pair, yet we found 0 std::sort bugs caused by mismatched iterators
Some more thoughts
- std::weak_ordering in C++20 helps you make fewer of these bugs
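For instance, declaring the comparison category forces you to confront NaN at compile time (a hypothetical example): a double's operator<=> yields std::partial_ordering, which does not convert to std::weak_ordering, so a comparator that forgets about NaN simply does not compile.

```cpp
#include <compare>

struct Measurement {
  int id;
  double value;

  // OK: int's <=> yields std::strong_ordering, which converts to
  // std::weak_ordering.
  std::weak_ordering operator<=>(const Measurement& o) const {
    return id <=> o.id;
    // return value <=> o.value;  // would NOT compile: doubles compare
    //                            // as std::partial_ordering (NaN!)
  }
  bool operator==(const Measurement& o) const { return id == o.id; }
};
```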
Results
- We called it a comparator sanitizer
- We saw 7-8% perf improvement in production
- Reduced branch mispredictions a lot
- Fixed thousands of bugs
- Fought Hyrum's Law
- Even the simplest things are hard to do right
- Things take time (2.5 years)
What can you do?
- If you use libcxx, enable -D_LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY -D_LIBCPP_DEBUG_STRICT_WEAK_ORDERING_CHECK -D_LIBCPP_ENABLE_DEBUG_MODE=1 in your debug builds
- Randomization can probably be applied to std::partition and std::make_heap
- Add SWO checks for std::nth_element, etc.
What can you do?
- Remember how Bjarne talked about the invalidation of vector iterators?
- In debug mode, force a reallocation on each of the first 100 push_backs (a sketch follows)
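A sketch of that idea (hypothetical, not an existing library mode):

```cpp
#include <iterator>
#include <utility>
#include <vector>

// Debug-mode idea: force a reallocation on each of the first 100
// push_backs, so iterators and pointers into the old buffer dangle
// immediately and sanitizers catch the use-after-free.
template <class T>
void debug_push_back(std::vector<T>& v, T value, int& call_count) {
  v.push_back(std::move(value));
  if (++call_count <= 100) {
    std::vector<T> fresh(std::make_move_iterator(v.begin()),
                         std::make_move_iterator(v.end()));
    v.swap(fresh);  // the old buffer is freed when 'fresh' goes out of scope
  }
}
```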