Documents
Summary
The optimization target for the project is the X25519
cryptographic function implemented as plainly as possible in C with
the _BitInt(255)
type provided
by the upcoming C23 standard.
Baseline
Benchmarks against the widely-deployed and well-optimized libsodium
implementation
of X25519 show a massive performance gap. Clang 17.0.3 out-of-the-box
produces code that takes over 127x that of libsodium
for a
single X25519 call.
Implementation | Time (ns) | Iterations | Cycles |
---|---|---|---|
Baseline | 4682.75 | 8962 | 15771906.96 |
libsodium |
36.80 | 1141472 | 124794.13 |
This result suggests there may be some low hanging fruit, but we also think there's a lot of work to be done!
Crandall Reduction
The first optimization to be implemented and evaluated is Crandall
reduction, which exploits a trick for modular reduction for moduli of
the form 2^n - c
for some small c
. If our
modulus m
has this form then 2^n = c (mod m)
.
Therefore if a value x
to be reduced has low
n
bits l
and high bits h
,
then we have that x = c*h + l (mod m)
.
The CrandallReductionPass
detects this case and
optimizes it for large integer types observed in cryptographic code like
X25519. Despite its simplicity, it performs remarkably well on our
benchmark, giving a massive 24x speedup.
Implementation | Time (ns) | Iterations | Cycles |
---|---|---|---|
Baseline | 4682.75 | 8962 | 15771906.96 |
Crandall Single Step | 195.29 | 215216 | 662345.84 |
Reduction Analysis
The ReductionAnalysis
pass identifies self-contained
arithmetic expression graphs which are guaranteed to be reduced modulo
the same modulus. The purpose of this stage is to detect when intermediate
reductions may not be necessary, and could be safely rewritten to use
partially non-reduced forms.
The analysis is implemented as a worklist algorithm tracking lattice
states for each value: UNDEF
for no information,
REDUCED(M)
if all observed uses are also reduced modulo
M
and only involved in instructions that preserve the
reduction property, and NOTREDUCED
otherwise. The
REDUCED(M)
state is generated for operands of
urem
instructions by constant moduli, and propagated by
instructions that preserve modular arithmetic.
Incomplete Reduction
Incomplete reduction utilizes the prior techniques to eliminate
unnecessary urem
instructions. Reduction analysis results
allow us to determine when urem
instructions are candidates
for elimination, and instead Crandall reduction techniques can be used
to leave it in an incompletely reduced form instead.
The CrandallReductionPass
was extended to implement the
incomplete reduction optimization, and now operates in three phases:
- Plan Reductions
- Determine how to handle each modular reduction in the function: what bound to reduce it to, which Crandall parameters to apply, and whether the reduction should be left incomplete.
- Plan Ranges
- Given a set of reductions which will be left incomplete, determine the possible range of values that every value in the transformed expression graph will now have. These ranges are used to determine new types.
- Rewriting
- Given reductions to be rewritten, and new types for the expression graph, actually replace the expression graph with a parallel rewritten version. Delete the old expression graph and fix up φ nodes.
Incomplete reduction gives an additional 2.1x performance boost when applied to a basic block, and the more advanced version handling φ nodes provided 3.3x.
Implementation | Time (ns) | Iterations | Cycles |
---|---|---|---|
Crandall Single Step | 195.29 | 215216 | 662345.84 |
Incomplete Reduction | 93.11 | 451088 | 315782.34 |
Incomplete Reduction over φ | 58.46 | 718515 | 198282.05 |
Results
The combined results of these optimizations are effective, giving an 80.1x speedup over baseline. The end result is only 1.6x away from hand-tuned.Implementation | Time (ns) | Iterations | Cycles |
---|---|---|---|
Baseline | 4682.75 | 8962 | 15771906.96 |
Crandall Single Step | 195.29 | 215216 | 662345.84 |
Incomplete Reduction | 93.11 | 451088 | 315782.34 |
Incomplete Reduction over φ | 58.46 | 718515 | 198282.05 |
libsodium |
36.80 | 1141472 | 124794.13 |