

Function float_to_fixed_point 

fn float_to_fixed_point(n: f64) -> i128

Maps a finite f64 onto the fixed-point i128 domain used to accumulate float sums, i.e. computes trunc(n * FLOAT_SCALE) reduced modulo 2^128.

Conceptually, this multiplies n by FLOAT_SCALE and truncates towards zero. It uses wrapping (modulo 2^128) rather than saturating semantics, however, and it never forms the intermediate product n * FLOAT_SCALE as an f64, which could itself overflow to infinity for very large n.
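One way to get this behaviour without computing the product in floating point is to decompose the f64 into its integer mantissa and binary exponent and apply the scale as a bit shift. The following is a minimal sketch, not the actual implementation: the name wrapping_fixed is hypothetical, and it assumes FLOAT_SCALE is a power of two (2^24 here), which this page does not state.

```rust
/// ASSUMPTION for this sketch only: the scale is 2^24.
const ASSUMED_SCALE_BITS: i32 = 24;

/// Hypothetical helper: trunc(n * 2^24) reduced modulo 2^128,
/// computed from the bit pattern instead of an f64 multiplication.
fn wrapping_fixed(n: f64) -> i128 {
    assert!(n.is_finite(), "only finite inputs are handled");
    let bits = n.to_bits();
    let negative = (bits >> 63) == 1;
    let exp_bits = ((bits >> 52) & 0x7ff) as i32;
    let frac = bits & ((1u64 << 52) - 1);

    // |n| == mantissa * 2^exp, with an integer mantissa.
    let (mantissa, exp): (u64, i32) = if exp_bits == 0 {
        (frac, -1074) // zero or subnormal: no implicit leading bit
    } else {
        (frac | (1u64 << 52), exp_bits - 1075)
    };

    // Multiplying by 2^24 just adds 24 to the exponent.
    let shift = exp + ASSUMED_SCALE_BITS;
    let magnitude: u128 = if shift >= 128 {
        0 // every set bit lies at or above 2^128, so the value is 0 mod 2^128
    } else if shift >= 0 {
        // Bits shifted past position 127 are dropped: reduction mod 2^128.
        (mantissa as u128) << (shift as u32)
    } else if shift > -64 {
        // Dropping low bits of the magnitude truncates towards zero.
        (mantissa >> ((-shift) as u32)) as u128
    } else {
        0
    };

    // Reinterpret as two's complement and apply the sign, wrapping.
    let value = magnitude as i128;
    if negative { value.wrapping_neg() } else { value }
}
```

Under these assumptions, wrapping_fixed(1.5) yields 25165824 (1.5 * 2^24), and wrapping_fixed(f64::MAX) yields 0, whereas f64::MAX * 16777216.0 evaluates to infinity as an f64.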

Wrapping is what makes this conversion a group homomorphism into the additive group of i128 (mod 2^128), matching the wrapping arithmetic used when accumulators are combined and retracted. As a result, a set of large finite values whose sum is representable produces the correct result even when the individual values fall outside the representable fixed-point range. Saturation would break this property: for example, 1.1e31 and -1.1e31 both overflow the domain and would saturate to i128::MAX and i128::MIN respectively, which sum to -1 rather than 0 (see database-issues#11265).
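The saturating failure mode can be reproduced with plain Rust casts, since a float-to-integer `as` cast saturates at the integer bounds. This is an illustrative sketch only: ASSUMED_SCALE (2^24) and the function names are assumptions made for the example, not taken from this page.

```rust
/// ASSUMPTION for this sketch only: the scale is 2^24.
const ASSUMED_SCALE: f64 = 16_777_216.0;

/// Hypothetical saturating conversion: an `as` cast from f64 to i128
/// clamps out-of-range values to i128::MAX or i128::MIN.
fn saturating_fixed(n: f64) -> i128 {
    (n * ASSUMED_SCALE) as i128
}

/// Adds the saturated images of 1.1e31 and -1.1e31.
/// i128::MAX + i128::MIN is -1, so the pair no longer cancels.
fn saturating_pair_sum() -> i128 {
    saturating_fixed(1.1e31).wrapping_add(saturating_fixed(-1.1e31))
}

/// In wrapping arithmetic every value has an exact additive inverse,
/// so a value and its wrapping negation always sum to 0.
fn wrapping_pair_sum(x: i128) -> i128 {
    x.wrapping_add(x.wrapping_neg())
}
```

With the assumed scale, 1.1e31 * 2^24 is about 1.85e38, which exceeds i128::MAX (about 1.70e38), so both casts clamp. wrapping_pair_sum returns 0 for every input, including i128::MIN, which is why a wrapping conversion lets a value that was added and later retracted cancel exactly.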