Who am I?

  • Senior Software Engineer at Google
  • DC efficiency

Agenda for today

  • History of sorting
  • Why have we decided to change anything?
  • Bugs, bugs, bugs
  • Implementation
  • What can you do?

Reminders

  • Sorting is the ordering of elements
  • std::sort, std::stable_sort, ranges::sort, etc

Reminders

  • Sorting is the ordering of elements
  • std::sort, std::stable_sort, ranges::sort, etc

Quicksort

  • Quick sort
  • Take any element
  • Less to the left
  • More to the right
  • Recurse on both parts
  • Quick sort (STL version)
  • Quick sort (STL version)
  • Quick sort (STL version)
  • Quick sort (STL version)

Standard library evolved from STL

std::sort

Quicksort's worst case is $O(n^2)$

Just pick the worst element

GCC libstdc++

GCC
Original

Introsort

  • Almost like a quick sort
  • If we suspect a lot of recursion calls, use heap sort
  • GCC libstdc++ uses $2 \log_2 n$ depth limit
  • Microsoft standard library does the same

Introsort (worst case)

LLVM libc++ did not do anything at all for a long time

7 years

Only 0.01% of all calls got into the heap sort fallback in production

LLVM (history)

What is a good sort?
  • $O(n\log n)$ comparisons
  • Recognizes almost sorted patterns
  • Fast for modern hardware
  • Fewer comparisons for heavy comparison sorting
libcxx on ascending
libstdcxx on ascending
How libcxx has achieved that? Insertion sort on every step up to 8 insertions.
Overall every now and then we have a better sorting

Let's change and save some cycles

We tried

Problem: ties

Problem: ties

Problem: ties

It took a year to fix all golden tests

Problem: ties. How

Shuffle the range before

Problem: ties. How

Address space layout randomization

Randomization went beyond std::sort

Also to std::nth_element, std::partial_sort

std::nth_element, std::partial_sort

Randomize before and randomize after the undefined ranges

std::nth_element
std::partial_sort

Bugs

Determinism

Example: ClickHouse

Determinism

  • Assume you sort some data
  • Cache it
  • Now ties are handled randomly
  • Lower hit rate

Determinism

Use std::stable_sort
Let's write a median function
Before that you might get lucky because nth_element puts elements around nth close to total order
We found so many of them!
Use _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY with libcxx
Strict weak ordering
  • Irreflexivity: $x < x$ is false
  • Asymmetry: $x < y$ and $y < x$ cannot be both true
  • Transitivity: $x < y$ and $y < z$ imply $x < z$
  • Transitivity of incomparability: $x == y$ and $y == z$ imply $x == z$, where $x == y$ means $x < y$ and $y < x$ are both false.
To check this we need to check all triples of elements, it's $O(n^3)$

But what are the risks?

On paper, UB. But in practice?

https://gcc.godbolt.org/z/c71qzM97f

https://gcc.godbolt.org/z/qj7EeMohP

Risks

  • Incorrect comparator for $\leq 30$ elements, all good
  • Incorrect comparator for $> 30$ elements, can fail with OOB
  • Have you written a well thought test for 30+ elements?
At Google we consistently (once in months) have outage somewhere because of that
OOB read
To check this we need to check all triples of elements, it's $O(n^3)$, sorting is $O(n \log n)$.
There is a $O(n^2)$ algorithm to check for strict weak ordering

$O(n^2)$ is still a lot but we can use it for, say, 100 first elements

Most sorts are small!

Caveats

  • Sort should not fail
  • It only tells if strict weak ordering is met
  • Arbitrary comparator can do anything
  • We can check we don't read OOB in std::sort itself

https://discourse.llvm.org/t/rfc-strict-weak-ordering-checks-in-the-debug-libc/70217

OOB check in 36d8b4

Submitted SWO checks for std::{stable_,partial_}sort and rolled out in LLVM 17 in 7e1ee1

-D_LIBCPP_DEBUG_STRICT_WEAK_ORDERING_CHECK -D_LIBCPP_ENABLE_DEBUG_MODE=1

It took us around 6-7 months to fix everything

Bugs

We found fewer of them but more dangerous.

($a$ is equal to $a + \epsilon$), ($a+\epsilon$ equals to $a+2\epsilon$) but ($a$ is not equal to $a + 2\epsilon$). Transitivity of incomparability is lost

libc++ std::nth_element is quadratic, should be linear on average #52747

Sorting trivia

Can we sort {1.02, 1.01, 1.0}?

Yes

Can we sort {3.0, NaN, 4.0}?

No

NaN thinks it is equal to 3.0 and 4.0

Can we sort {NaN, 3.0, NaN}?

Yes

All elements are equal

What are we changing?

  • After these cleanups we can change to any sort
  • We chose a mix of BlockQuickSort and pdqsort

Data dependent branches are hard to predict

Pack results into blocks
CTZ - Count Trailing Zeros, BLSR - Reset Lowest Set Bit
We also improved picking a pivot out of 9 elements. So called "Tukey's ninther"
https://godbolt.org/z/nrhT88MsM
                                   old          new
BM_Sort_uint64_Rand_1      4.26ns ± 0%  4.41ns ± 1%  3.44%
BM_Sort_uint64_Rand_4      1.96ns ± 0%  1.95ns ± 1%  -0.34%
BM_Sort_uint64_Rand_16     10.5ns ± 0%  11.6ns ± 1%  10.77%
BM_Sort_uint64_Rand_64     18.8ns ± 0%  17.5ns ± 1%  -6.60%
BM_Sort_uint64_Rand_256    26.4ns ± 0%  21.7ns ± 1%  -17.88%
BM_Sort_uint64_Rand_1024   33.8ns ± 1%  24.6ns ± 1%  -27.15%
BM_Sort_uint64_Rand_16384  48.7ns ± 0%  30.1ns ± 1%  -38.10%
BM_Sort_uint64_Rand_262144 63.4ns ± 1%  35.7ns ± 1%  -43.71%
On other patterns we sometimes regressed but by a little, hardest cases are random.
BM_Sort_string_Random_1      4.66ns ± 6%  4.40ns ± 4%    -5.55%
BM_Sort_string_Random_4      14.9ns ± 3%  15.0ns ± 6%      ~
BM_Sort_string_Random_16     45.5ns ± 6%  35.8ns ± 8%   -21.37%
BM_Sort_string_Random_64     66.6ns ± 4%  58.2ns ± 3%   -12.69%
BM_Sort_string_Random_256    86.0ns ± 5%  77.4ns ± 3%   -10.01%
BM_Sort_string_Random_1024   106ns ± 3%    96ns ± 6%    -9.39%
BM_Sort_string_Random_16384  154ns ± 3%   141ns ± 5%    -8.03%
BM_Sort_string_Random_262144 213ns ± 4%   197ns ± 4%    -7.59%

Some thoughts

  • Default for standard sorting is unstable
  • Python, Rust, Java have stable default sorts
  • Why do we have std::sort that can read OOB even if comparator is broken?
  • std::ranges::sort wanted to have 1 argument, found 0 bugs with different iterators in std::sort

Some more thoughts

  • std::weak_ordering in C++20 allows to make fewer bugs

Results

  • We called it a comparator sanitizer
  • We saw 7-8% perf improvement in production
  • Reduced branch mispredictions a lot
  • Fixed thousands of bugs
  • Fought the Hyrum's Law
  • Even simplest things are hard to do right
  • Things take time (2.5 years)

What can you do?

  • If you use libcxx, enable -D_LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY -D_LIBCPP_DEBUG_STRICT_WEAK_ORDERING_CHECK -D_LIBCPP_ENABLE_DEBUG_MODE=1 in your debug builds
  • Probably randomization can be applied to std::partition, std::make_heap
  • SWO checks for std::nth_element, etc

What can you do?

  • Remember Bjarne talked about invalidation of vector iterators?
  • In debug mode reallocate on first 100 push_backs

Links and further