//! Serialization/deserialization support.
//!
//! The upstream Java project has established several different types of serialization. We
//! currently implement V2 and V2 + DEFLATE (following the names used by the Java implementation).
//!
//! These formats are compact binary representations of the state of the histogram. They are
//! intended to be used for archival or transmission to other systems for further analysis. A
//! typical use case would be to periodically serialize a histogram, save it somewhere, and reset
//! the histogram.
//!
//! Histograms are designed to be added, subtracted, and otherwise manipulated, and an efficient
//! storage format facilitates this. For example, you might capture a histogram once a minute to
//! get a granular view of your performance over time, but also want to see longer trends over an
//! hour or a day. Simply deserialize the last 60 minutes' worth to recreate their in-memory
//! `Histogram` form, add them all together into one `Histogram`, and perform whatever
//! calculations you wish on the resulting histogram. This lets you correctly calculate the
//! 99.99th percentile for the entire hour, for instance, which you cannot do if you have stored
//! only percentiles (as opposed to the entire histogram) for each minute.
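//!
//! To see why merging beats averaging, here is a deliberately simplified sketch that uses only
//! `std` and not this crate's `Histogram`. A histogram is modeled as a vector of per-bucket
//! counts, where bucket `i` stands for the value `i`. The `merge` and `value_at_percentile`
//! helpers are hypothetical and exist only for illustration.
//!
//! ```
//! // Element-wise sum of bucket counts. Assumes at least one input and equal lengths.
//! fn merge(hists: &[Vec<u64>]) -> Vec<u64> {
//!     let mut total = vec![0u64; hists[0].len()];
//!     for h in hists {
//!         for (t, c) in total.iter_mut().zip(h) {
//!             *t += *c;
//!         }
//!     }
//!     total
//! }
//!
//! // Smallest value whose cumulative count reaches `pct` percent of the total, using integer
//! // math and rounding the target up. Returns the last bucket if all counts are zero.
//! fn value_at_percentile(counts: &[u64], pct: u64) -> usize {
//!     let total: u64 = counts.iter().sum();
//!     let target = ((total * pct + 99) / 100).max(1);
//!     let mut seen = 0;
//!     for (value, &c) in counts.iter().enumerate() {
//!         seen += c;
//!         if seen >= target {
//!             return value;
//!         }
//!     }
//!     counts.len().saturating_sub(1)
//! }
//! ```
//!
//! Suppose minute A records the value 1 ninety-nine times and the value 9 once, and minute B
//! records the value 9 a hundred times. Their 99th percentiles are 1 and 9, which average to 5.
//! The 99th percentile of the merged counts is 9, because only 99 of the 200 combined samples
//! are below 9.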
//!
//! # Performance concerns
//!
//! Serialization is quite fast: serializing a histogram in V2 format that covers 1 to
//! `u64::max_value()` with 3 digits of precision and tens of thousands of recorded counts takes
//! about 40 microseconds on an E5-1650v3 Xeon. Deserialization is about 3x slower, though that
//! should improve, as some optimizations have yet to be done.
//!
//! For the V2 format, the space used for a histogram depends mainly on precision, since higher
//! precision reduces the extent to which different values are grouped into the same bucket.
//! Having a large value range (e.g. 1 to `u64::max_value()`) will not directly impact the size if
//! many of the counts are zero, as runs of zeros are compressed away.
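//!
//! The zero compression can be pictured with the minimal sketch below. It uses only `std` and
//! is not this crate's encoder. It follows the general idea of the V2 encoding in the Java
//! implementation: each count is written as a ZigZag-encoded LEB128 variable-length integer,
//! and a run of `n` consecutive zero counts is written as the single value `-n`. The header and
//! other details are omitted, counts are assumed to fit in an `i64`, and all function names are
//! hypothetical.
//!
//! ```
//! // ZigZag maps signed to unsigned so values of small magnitude stay small:
//! // 0, -1, 1, -2, ... become 0, 1, 2, 3, ...
//! fn zigzag(n: i64) -> u64 {
//!     ((n << 1) ^ (n >> 63)) as u64
//! }
//!
//! // LEB128: 7 bits per byte, with the high bit set on every byte except the last.
//! fn write_leb128(mut v: u64, out: &mut Vec<u8>) {
//!     loop {
//!         let byte = (v & 0x7f) as u8;
//!         v >>= 7;
//!         if v == 0 {
//!             out.push(byte);
//!             return;
//!         }
//!         out.push(byte | 0x80);
//!     }
//! }
//!
//! // Encodes counts, collapsing each run of `n` zeros into the single value `-n`.
//! fn encode_counts(counts: &[u64]) -> Vec<u8> {
//!     let mut out = Vec::new();
//!     let mut i = 0;
//!     while i < counts.len() {
//!         if counts[i] == 0 {
//!             let start = i;
//!             while i < counts.len() && counts[i] == 0 {
//!                 i += 1;
//!             }
//!             write_leb128(zigzag(-((i - start) as i64)), &mut out);
//!         } else {
//!             write_leb128(zigzag(counts[i] as i64), &mut out);
//!             i += 1;
//!         }
//!     }
//!     out
//! }
//! ```
//!
//! Under this scheme, 10,000 buckets in which only the first and last hold a count of 1 encode
//! to 5 bytes. Each non-zero count takes one byte, and the run of 9,998 zeros takes three.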
//!
//! V2 + DEFLATE is significantly slower to serialize (around 10x) but only a little slower to
//! deserialize (less than 2x). YMMV depending on the compressibility of your histogram data, the
//! speed of the underlying storage medium, etc. Naturally, you can always compress at a later
//! time: there's no reason why you couldn't serialize as V2 and later re-serialize it as V2 +
//! DEFLATE on another system (perhaps as a batch job) for better archival storage density.
//!
//! # API
//!
//! Each serialization format has its own serializer struct, but since the formats are reliably
//! distinguishable from one another, there is only one `Deserializer` struct, and it works for
//! any of the formats this library implements.
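//!
//! Here is a minimal sketch of how such dispatch can work. It assumes, as in the Java
//! implementation, that every serialized histogram begins with a 4-byte big-endian "cookie"
//! identifying its format. The cookie values below match this module's private `V2_COOKIE` and
//! `V2_COMPRESSED_COOKIE` constants. The `Format` enum and `detect_format` function are
//! hypothetical and not part of this crate's API.
//!
//! ```
//! use std::io::{self, Read};
//!
//! #[derive(Debug, PartialEq)]
//! enum Format {
//!     V2,
//!     V2Deflate,
//! }
//!
//! // Same values as the module's private constants: base cookie | 0x10.
//! const V2_COOKIE: u32 = 0x1c84_9303 | 0x10; // 0x1c84_9313
//! const V2_COMPRESSED_COOKIE: u32 = 0x1c84_9304 | 0x10; // 0x1c84_9314
//!
//! // Reads the leading cookie and reports which format follows, or `None` if unrecognized.
//! fn detect_format<R: Read>(reader: &mut R) -> io::Result<Option<Format>> {
//!     let mut cookie = [0u8; 4];
//!     reader.read_exact(&mut cookie)?;
//!     Ok(match u32::from_be_bytes(cookie) {
//!         V2_COOKIE => Some(Format::V2),
//!         V2_COMPRESSED_COOKIE => Some(Format::V2Deflate),
//!         _ => None,
//!     })
//! }
//! ```
//!
//! Because the two cookies differ, a single reader can choose the right decoding path after
//! inspecting only the first four bytes.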
//!
//! Serializers and deserializers are intended to be reused for many histograms. You can use one
//! for a single histogram and then throw it away; it will just be less efficient, as the cost of
//! its internal buffers will not be amortized across many histograms.
//!
//! Serializers can write to any `Write` implementation, and `Deserializer` can read from any
//! `Read`. This makes them easy to use in almost any context, since everything from files and
//! sockets to in-memory buffers implements these traits: `Vec<u8>` is a `Write`, and a `&[u8]`
//! slice or a `std::io::Cursor` is a `Read`.
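//!
//! As a minimal illustration using only `std`: `Vec<u8>` implements `Write` by appending, and
//! reading from a `&[u8]` through a `&mut` reference advances the slice past the consumed bytes.
//! Consecutive reads therefore pick up where the previous one stopped. The length-prefixed
//! `write_record` and `read_record` helpers below are hypothetical and unrelated to the actual
//! histogram formats.
//!
//! ```
//! use std::io::{self, Read, Write};
//!
//! // Hypothetical framing: a 4-byte big-endian length, then the payload bytes.
//! fn write_record<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
//!     w.write_all(&(payload.len() as u32).to_be_bytes())?;
//!     w.write_all(payload)
//! }
//!
//! // Reads one record; when `r` is a `&mut &[u8]`, this also advances the slice.
//! fn read_record<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
//!     let mut len = [0u8; 4];
//!     r.read_exact(&mut len)?;
//!     let mut payload = vec![0u8; u32::from_be_bytes(len) as usize];
//!     r.read_exact(&mut payload)?;
//!     Ok(payload)
//! }
//! ```
//!
//! Suppose `b"first"` and `b"second"` are written into one `Vec<u8>` named `buf`. Binding
//! `let mut slice = buf.as_slice();` and calling `read_record(&mut slice)` twice then yields the
//! two payloads in order and leaves `slice` empty.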
//!
//! # Interval logs
//!
//! See the `interval_log` module.
//!
//! # Integration with general-purpose serialization libraries
//!
//! In general, serializing histograms should be straightforward: pick the serialization format
//! that suits your requirements (e.g. based on which formats are supported by the other tools
//! that will consume the serialized histograms) and use the corresponding struct.
//!
//! However, some approaches to serialization, like [serde's
//! `Serialize`](https://docs.serde.rs/serde/trait.Serialize.html) or [`rustc_serialize`'s
//! `Encodable`](https://doc.rust-lang.org/rustc-serialize/rustc_serialize/trait.Encodable.html),
//! effectively allow only one way of serializing a type, because a trait can be implemented
//! only once for a given struct. This is too restrictive for histograms, since they inherently
//! have multiple ways of being serialized, so as a library we cannot pick the format for you. If
//! you need to work within such a restriction, a good approach is to first pick your
//! serialization format (V2, etc.) as you normally would, then make a wrapper struct. Because
//! the wrapper is your own type, you can spend its single implementation of the trait on the
//! format you chose, which satisfies the way serde and similar libraries are structured.
//!
//! Here's a sketch of how that would look for serde's `Serialize`:
//!
//! ```
//! use hdrhistogram::Histogram;
//! use hdrhistogram::serialization::{Serializer, V2Serializer};
//!
//! mod serde {
//!     // part of serde, simplified
//!     pub trait Serializer {
//!         // ...
//!         fn serialize_bytes(self, value: &[u8]) -> Result<(), ()>;
//!         // ...
//!     }
//!
//!     // also in serde
//!     pub trait Serialize {
//!         fn serialize<S: Serializer>(&self, serializer: S) -> Result<(), ()>;
//!     }
//! }
//!
//! // your custom wrapper
//! #[allow(dead_code)] // to muffle warnings compiling this example
//! struct V2HistogramWrapper {
//!     histogram: Histogram<u64>,
//! }
//!
//! impl serde::Serialize for V2HistogramWrapper {
//!     fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<(), ()> {
//!         // Creating a fresh Vec and serializer on every call (instead of reusing them)
//!         // is not optimal, but it works.
//!         let mut vec = Vec::new();
//!         // Pick the serialization format you want to use. Here, we use plain V2, but V2 +
//!         // DEFLATE is also available.
//!         // Map errors as appropriate for your use case.
//!         V2Serializer::new().serialize(&self.histogram, &mut vec)
//!             .map_err(|_| ())?;
//!         serializer.serialize_bytes(&vec)?;
//!         Ok(())
//!     }
//! }
//! ```
//!
//! # Examples
//!
//! Creating, serializing, and deserializing a single histogram using a `Vec<u8>` as a `Write` and a
//! `&[u8]` slice from the vec as a `Read`.
//!
//! ```
//! use hdrhistogram::Histogram;
//! use hdrhistogram::serialization::{Deserializer, Serializer, V2Serializer};
//!
//! let mut vec = Vec::new();
//! let orig_histogram = Histogram::<u64>::new(1).unwrap();
//! V2Serializer::new().serialize(&orig_histogram, &mut vec).unwrap();
//!
//! let _histogram: Histogram<u64> = Deserializer::new()
//!     .deserialize(&mut vec.as_slice()).unwrap();
//! ```
//!
//! This example shows serializing several histograms into a `Vec<u8>` and deserializing them again,
//! at which point they are summed into one histogram (for further hypothetical analysis).
//!
//! ```
//! use hdrhistogram::Histogram;
//! use hdrhistogram::serialization::{Deserializer, Serializer, V2Serializer};
//! use std::io::Cursor;
//!
//! // Naturally, do real error handling instead of unwrap() everywhere
//!
//! let num_histograms = 4;
//! let mut histograms = Vec::new();
//!
//! // Make some histograms
//! for _ in 0..num_histograms {
//!     let mut h = Histogram::<u64>::new_with_bounds(1, u64::max_value(), 3).unwrap();
//!     h.record_n(42, 7).unwrap();
//!     histograms.push(h);
//! }
//!
//! let mut buf = Vec::new();
//! let mut serializer = V2Serializer::new();
//!
//! // Save them to the buffer
//! for h in histograms.iter() {
//!     serializer.serialize(h, &mut buf).unwrap();
//! }
//!
//! // Read them back out again
//! let mut deserializer = Deserializer::new();
//! let mut cursor = Cursor::new(&buf);
//!
//! let mut accumulator =
//!     Histogram::<u64>::new_with_bounds(1, u64::max_value(), 3).unwrap();
//!
//! for _ in 0..num_histograms {
//!     let h: Histogram<u64> = deserializer.deserialize(&mut cursor).unwrap();
//!
//!     // behold, they are restored as they were originally
//!     assert_eq!(7, h.count_at(42));
//!     assert_eq!(0, h.count_at(1000));
//!
//!     accumulator.add(h).unwrap();
//! }
//!
//! // all the counts are there
//! assert_eq!(num_histograms * 7, accumulator.count_at(42));
//! ```
//!

use std::{fmt, io};

use super::{Counter, Histogram};

#[cfg(test)]
mod tests;

#[cfg(all(test, feature = "bench_private"))]
mod benchmarks;

mod v2_serializer;
pub use self::v2_serializer::{V2SerializeError, V2Serializer};

mod v2_deflate_serializer;
pub use self::v2_deflate_serializer::{V2DeflateSerializeError, V2DeflateSerializer};

mod deserializer;
pub use self::deserializer::{DeserializeError, Deserializer};

pub mod interval_log;

const V2_COOKIE_BASE: u32 = 0x1c84_9303;
const V2_COMPRESSED_COOKIE_BASE: u32 = 0x1c84_9304;

const V2_COOKIE: u32 = V2_COOKIE_BASE | 0x10;
const V2_COMPRESSED_COOKIE: u32 = V2_COMPRESSED_COOKIE_BASE | 0x10;

const V2_HEADER_SIZE: usize = 40;

/// Histogram serializer.
///
/// Different implementations serialize to different formats.
pub trait Serializer {
    /// Error type returned when serialization fails.
    type SerializeError: fmt::Debug;

    /// Serialize the histogram into the provided writer.
    /// Returns the number of bytes written, or an error.
    ///
    /// Note that `Vec<u8>` is a reasonable `Write` implementation for simple usage.
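    ///
    /// The returned count should match the number of bytes the writer accepted during the call.
    /// To track byte offsets on the writer side instead (for example, across several histograms
    /// appended to one file), a small wrapper is enough. The `CountingWriter` below is a
    /// hypothetical, `std`-only sketch and not part of this crate:
    ///
    /// ```
    /// use std::io::{self, Write};
    ///
    /// // Hypothetical wrapper: forwards writes to `inner` and counts the bytes it accepted.
    /// struct CountingWriter<W: Write> {
    ///     inner: W,
    ///     written: usize,
    /// }
    ///
    /// impl<W: Write> Write for CountingWriter<W> {
    ///     fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
    ///         let n = self.inner.write(buf)?;
    ///         self.written += n;
    ///         Ok(n)
    ///     }
    ///
    ///     fn flush(&mut self) -> io::Result<()> {
    ///         self.inner.flush()
    ///     }
    /// }
    /// ```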
    fn serialize<T: Counter, W: io::Write>(
        &mut self,
        h: &Histogram<T>,
        writer: &mut W,
    ) -> Result<usize, Self::SerializeError>;
}