hdrhistogram

Module serialization

Source
Expand description

Serialization/deserialization support.

The upstream Java project has established several different types of serialization. We have currently implemented V2 and V2 + DEFLATE (following the names used by the Java implementation).

These formats are compact binary representations of the state of the histogram. They are intended to be used for archival or transmission to other systems for further analysis. A typical use case would be to periodically serialize a histogram, save it somewhere, and reset the histogram.

Histograms are designed to be added, subtracted, and otherwise manipulated, and an efficient storage format facilitates this. As an example, you might be capturing histograms once a minute to have a granular view into your performance over time, but you might also want to see longer trends over an hour or day. Simply deserialize the last 60 minutes worth to recreate their in-memory Histogram form, add them all together into one Histogram, and perform whatever calculations you wish on the resulting histogram. This would allow you to correctly calculate the 99.99th percentile for the entire hour, for instance, which is not something you can do if you have only stored percentiles (as opposed to the entire histogram) for each minute.

§Performance concerns

Serialization is quite fast; serializing a histogram in V2 format that represents 1 to u64::max_value() with 3 digits of precision with tens of thousands of recorded counts takes about 40 microseconds on an E5-1650v3 Xeon. Deserialization is about 3x slower, but that will improve as there are still some optimizations to perform.

For the V2 format, the space used for a histogram will depend mainly on precision since higher precision will reduce the extent to which different values are grouped into the same bucket. Having a large value range (e.g. 1 to u64::max_value()) will not directly impact the size if there are many zero counts as zeros are compressed away.

V2 + DEFLATE is significantly slower to serialize (around 10x) but only a little bit slower to deserialize (less than 2x). YMMV depending on the compressibility of your histogram data, the speed of the underlying storage medium, etc. Naturally, you can always compress at a later time: there’s no reason why you couldn’t serialize as V2 and then later re-serialize it as V2 + DEFLATE on another system (perhaps as a batch job) for better archival storage density.

§API

Each serialization format has its own serializer struct, but since each format is reliably distinguishable from each other, there is only one Deserializer struct that will work for any of the formats this library implements.

Serializers and deserializers are intended to be re-used for many histograms. You can use them for one histogram and throw them away; it will just be less efficient as the cost of their internal buffers will not be amortized across many histograms.

Serializers can write to any Write implementation, and Deserializer can read from any Read. This should make it easy to use them in almost any context, as everything from i/o streams to Vec<u8> can be a Read or Write.

§Interval logs

See the interval_log module.

§Integration with general-purpose serialization libraries

In general, serializing histograms should be straightforward: pick the serialization format that is suitable for your requirements (e.g. based on what formats are supported by other tools that will consume the serialized histograms) and use the corresponding struct.

However, there are some approaches to serialization like serde’s Serialize or rustc_serialize’s Encodable that effectively require that only one way of serialization can be used because a trait can only be implemented once for a struct. This is too restrictive for histograms since they inherently have multiple ways of being serialized, so as a library we cannot pick the format for you. If you need to interoperate with such a restriction, a good approach is to first pick your serialization format (V2, etc) like you normally would, then make a wrapper struct. The wrapper effectively gives you a struct whose sole opportunity to implement a trait you can expend to satisfy the way serde, etc, are structured.

Here’s a sketch of how that would look for serde’s Serialize:

use hdrhistogram::Histogram;
use hdrhistogram::serialization::{Serializer, V2Serializer};

mod serde {
    // part of serde, simplified
    pub trait Serializer {
       // ...
       fn serialize_bytes(self, value: &[u8]) -> Result<(), ()>;
       // ...
    }

    // also in serde
    pub trait Serialize {
        fn serialize<S: Serializer>(&self, serializer: S) -> Result<(), ()>;
    }
}

// your custom wrapper
#[allow(dead_code)] // to muffle warnings compiling this example
struct V2HistogramWrapper {
    histogram: Histogram<u64>
}

impl serde::Serialize for V2HistogramWrapper {
    fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<(), ()> {
        // Not optimal to not re-use the vec and serializer, but it'll work
        let mut vec = Vec::new();
        // Pick the serialization format you want to use. Here, we use plain V2, but V2 +
        // DEFLATE is also available.
        // Map errors as appropriate for your use case.
        V2Serializer::new().serialize(&self.histogram, &mut vec)
            .map_err(|_| ())?;
        serializer.serialize_bytes(&vec)?;
        Ok(())
    }
}

§Examples

Creating, serializing, and deserializing a single histogram using a Vec<u8> as a Write and a &[u8] slice from the vec as a Read.

use hdrhistogram::Histogram;
use hdrhistogram::serialization::{Deserializer, Serializer, V2Serializer};

let mut vec = Vec::new();
let orig_histogram = Histogram::<u64>::new(1).unwrap();
V2Serializer::new().serialize(&orig_histogram, &mut vec).unwrap();

let _histogram: Histogram<u64> = Deserializer::new()
    .deserialize(&mut vec.as_slice()).unwrap();

This example shows serializing several histograms into a Vec<u8> and deserializing them again, at which point they are summed into one histogram (for further hypothetical analysis).

use hdrhistogram::Histogram;
use hdrhistogram::serialization::{Deserializer, Serializer, V2Serializer};
use std::io::Cursor;

// Naturally, do real error handling instead of unwrap() everywhere

let num_histograms = 4;
let mut histograms = Vec::new();

// Make some histograms
for _ in 0..num_histograms {
    let mut h = Histogram::<u64>::new_with_bounds(1, u64::max_value(), 3).unwrap();
    h.record_n(42, 7).unwrap();
    histograms.push(h);
}

let mut buf = Vec::new();
let mut serializer = V2Serializer::new();

// Save them to the buffer
for h in histograms.iter() {
    serializer.serialize(h, &mut buf).unwrap();
}

// Read them back out again
let mut deserializer = Deserializer::new();
let mut cursor = Cursor::new(&buf);

let mut accumulator =
    Histogram::<u64>::new_with_bounds(1, u64::max_value(), 3).unwrap();

for _ in 0..num_histograms {
    let h: Histogram<u64> = deserializer.deserialize(&mut cursor).unwrap();

    // behold, they are restored as they were originally
    assert_eq!(7, h.count_at(42));
    assert_eq!(0, h.count_at(1000));

    accumulator.add(h).unwrap();
}

// all the counts are there
assert_eq!(num_histograms * 7, accumulator.count_at(42));

Modules§

Structs§

Enums§

Traits§