hdrhistogram/serialization/mod.rs
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226
//! Serialization/deserialization support.
//!
//! The upstream Java project has established several different types of serialization. We have
//! currently implemented V2 and V2 + DEFLATE (following the names used by the Java implementation).
//!
//! These formats are compact binary representations of the state of the histogram. They are
//! intended to be used for archival or transmission to other systems for further analysis. A
//! typical use case would be to periodically serialize a histogram, save it somewhere, and reset
//! the histogram.
//!
//! Histograms are designed to be added, subtracted, and otherwise manipulated, and an efficient
//! storage format facilitates this. As an example, you might be capturing histograms once a minute
//! to have a granular view into your performance over time, but you might also want to see longer
//! trends over an hour or day. Simply deserialize the last 60 minutes worth to recreate their
//! in-memory `Histogram` form, add them all together into one `Histogram`, and perform whatever
//! calculations you wish on the resulting histogram. This would allow you to correctly calculate
//! the 99.99th percentile for the entire hour, for instance, which is not something you can do
//! if you have only stored percentiles (as opposed to the entire histogram) for each minute.
//!
//! # Performance concerns
//!
//! Serialization is quite fast; serializing a histogram in V2 format that represents 1 to
//! `u64::max_value()` with 3 digits of precision with tens of thousands of recorded counts takes
//! about 40 microseconds on an E5-1650v3 Xeon. Deserialization is about 3x slower, but that will
//! improve as there are still some optimizations to perform.
//!
//! For the V2 format, the space used for a histogram will depend mainly on precision since higher
//! precision will reduce the extent to which different values are grouped into the same bucket.
//! Having a large value range (e.g. 1 to `u64::max_value()`) will not directly impact the size if
//! there are many zero counts as zeros are compressed away.
//!
//! V2 + DEFLATE is significantly slower to serialize (around 10x) but only a little bit slower to
//! deserialize (less than 2x). YMMV depending on the compressibility of your histogram data, the
//! speed of the underlying storage medium, etc. Naturally, you can always compress at a later time:
//! there's no reason why you couldn't serialize as V2 and then later re-serialize it as V2 +
//! DEFLATE on another system (perhaps as a batch job) for better archival storage density.
//!
//! # API
//!
//! Each serialization format has its own serializer struct, but since each format is reliably
//! distinguishable from each other, there is only one `Deserializer` struct that will work for
//! any of the formats this library implements.
//!
//! Serializers and deserializers are intended to be re-used for many histograms. You can use them
//! for one histogram and throw them away; it will just be less efficient as the cost of their
//! internal buffers will not be amortized across many histograms.
//!
//! Serializers can write to any `Write` implementation, and `Deserializer` can read from any
//! `Read`. This should make it easy to use them in almost any context, as everything from i/o
//! streams to `Vec<u8>` can be a `Read` or `Write`.
//!
//! # Interval logs
//!
//! See the `interval_log` module.
//!
//! ### Integration with general-purpose serialization libraries
//!
//! In general, serializing histograms should be straightforward: pick the serialization format
//! that is suitable for your requirements (e.g. based on what formats are supported by other tools
//! that will consume the serialized histograms) and use the corresponding struct.
//!
//! However, there are some approaches to serialization like [serde's
//! `Serialize`](https://docs.serde.rs/serde/trait.Serialize.html) or [`rustc_serialize`'s
//! `Encodable`](https://doc.rust-lang.org/rustc-serialize/rustc_serialize/trait.Encodable.html)
//! that effectively require that only one way of serialization can be used because a trait can
//! only be implemented once for a struct. This is too restrictive for histograms since they
//! inherently have multiple ways of being serialized, so as a library we cannot pick the format
//! for you. If you need to interoperate with such a restriction, a good approach is to first pick
//! your serialization format (V2, etc) like you normally would, then make a wrapper struct. The
//! wrapper effectively gives you a struct whose sole opportunity to implement a trait you can
//! expend to satisfy the way serde, etc, are structured.
//!
//! Here's a sketch of how that would look for serde's `Serialize`:
//!
//! ```
//! use hdrhistogram::Histogram;
//! use hdrhistogram::serialization::{Serializer, V2Serializer};
//!
//! mod serde {
//! // part of serde, simplified
//! pub trait Serializer {
//! // ...
//! fn serialize_bytes(self, value: &[u8]) -> Result<(), ()>;
//! // ...
//! }
//!
//! // also in serde
//! pub trait Serialize {
//! fn serialize<S: Serializer>(&self, serializer: S) -> Result<(), ()>;
//! }
//! }
//!
//! // your custom wrapper
//! #[allow(dead_code)] // to muffle warnings compiling this example
//! struct V2HistogramWrapper {
//! histogram: Histogram<u64>
//! }
//!
//! impl serde::Serialize for V2HistogramWrapper {
//! fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<(), ()> {
//! // Not optimal to not re-use the vec and serializer, but it'll work
//! let mut vec = Vec::new();
//! // Pick the serialization format you want to use. Here, we use plain V2, but V2 +
//! // DEFLATE is also available.
//! // Map errors as appropriate for your use case.
//! V2Serializer::new().serialize(&self.histogram, &mut vec)
//! .map_err(|_| ())?;
//! serializer.serialize_bytes(&vec)?;
//! Ok(())
//! }
//! }
//! ```
//!
//! # Examples
//!
//! Creating, serializing, and deserializing a single histogram using a `Vec<u8>` as a `Write` and a
//! `&[u8]` slice from the vec as a `Read`.
//!
//! ```
//! use hdrhistogram::Histogram;
//! use hdrhistogram::serialization::{Deserializer, Serializer, V2Serializer};
//!
//! let mut vec = Vec::new();
//! let orig_histogram = Histogram::<u64>::new(1).unwrap();
//! V2Serializer::new().serialize(&orig_histogram, &mut vec).unwrap();
//!
//! let _histogram: Histogram<u64> = Deserializer::new()
//! .deserialize(&mut vec.as_slice()).unwrap();
//! ```
//!
//! This example shows serializing several histograms into a `Vec<u8>` and deserializing them again,
//! at which point they are summed into one histogram (for further hypothetical analysis).
//!
//! ```
//! use hdrhistogram::Histogram;
//! use hdrhistogram::serialization::{Deserializer, Serializer, V2Serializer};
//! use std::io::Cursor;
//!
//! // Naturally, do real error handling instead of unwrap() everywhere
//!
//! let num_histograms = 4;
//! let mut histograms = Vec::new();
//!
//! // Make some histograms
//! for _ in 0..num_histograms {
//! let mut h = Histogram::<u64>::new_with_bounds(1, u64::max_value(), 3).unwrap();
//! h.record_n(42, 7).unwrap();
//! histograms.push(h);
//! }
//!
//! let mut buf = Vec::new();
//! let mut serializer = V2Serializer::new();
//!
//! // Save them to the buffer
//! for h in histograms.iter() {
//! serializer.serialize(h, &mut buf).unwrap();
//! }
//!
//! // Read them back out again
//! let mut deserializer = Deserializer::new();
//! let mut cursor = Cursor::new(&buf);
//!
//! let mut accumulator =
//! Histogram::<u64>::new_with_bounds(1, u64::max_value(), 3).unwrap();
//!
//! for _ in 0..num_histograms {
//! let h: Histogram<u64> = deserializer.deserialize(&mut cursor).unwrap();
//!
//! // behold, they are restored as they were originally
//! assert_eq!(7, h.count_at(42));
//! assert_eq!(0, h.count_at(1000));
//!
//! accumulator.add(h).unwrap();
//! }
//!
//! // all the counts are there
//! assert_eq!(num_histograms * 7, accumulator.count_at(42));
//! ```
//!
use std::{fmt, io};
use super::{Counter, Histogram};
#[cfg(test)]
mod tests;
#[cfg(all(test, feature = "bench_private"))]
mod benchmarks;
mod v2_serializer;
pub use self::v2_serializer::{V2SerializeError, V2Serializer};
mod v2_deflate_serializer;
pub use self::v2_deflate_serializer::{V2DeflateSerializeError, V2DeflateSerializer};
mod deserializer;
pub use self::deserializer::{DeserializeError, Deserializer};
pub mod interval_log;
const V2_COOKIE_BASE: u32 = 0x1c84_9303;
const V2_COMPRESSED_COOKIE_BASE: u32 = 0x1c84_9304;
const V2_COOKIE: u32 = V2_COOKIE_BASE | 0x10;
const V2_COMPRESSED_COOKIE: u32 = V2_COMPRESSED_COOKIE_BASE | 0x10;
const V2_HEADER_SIZE: usize = 40;
/// Histogram serializer.
///
/// Different implementations serialize to different formats.
pub trait Serializer {
/// Error type returned when serialization fails.
type SerializeError: fmt::Debug;
/// Serialize the histogram into the provided writer.
/// Returns the number of bytes written, or an error.
///
/// Note that `Vec<u8>` is a reasonable `Write` implementation for simple usage.
fn serialize<T: Counter, W: io::Write>(
&mut self,
h: &Histogram<T>,
writer: &mut W,
) -> Result<usize, Self::SerializeError>;
}