hdrhistogram/serialization/interval_log/
mod.rs

Help
1//! Interval log parsing and writing.
2//!
3//! Interval logs, as handled by the Java implementation's `HistogramLogWriter`,
4//! `HistogramLogReader`, and `HistogramLogProcessor`, are a way to record a sequence of histograms
5//! over time. Suppose you were running a load test for an hour: you might want to record a
6//! histogram per second or minute so that you could correlate measurements with behavior you might
7//! see in logs, etc.
8//!
9//! An interval log contains some initial metadata, then a sequence of histograms, each with some
10//! additional metadata (timestamps, etc). See `IntervalLogHistogram`.
11//!
12//! The intervals in the log should be ordered by start timestamp. It's possible to write (and
13//! parse) logs with intervals in any order, but the expectation is that they will be sorted.
14//!
15//! To parse a log, see `IntervalLogIterator`. To write a log, see `IntervalLogWriterBuilder`.
16//!
17//! # Timestamps
18//!
19//! Each interval has a timestamp in seconds associated with it. However, it's not necessarily as
20//! simple as just interpreting the number as seconds since the epoch. There are two optional pieces
21//! of header metadata: "StartTime" and "BaseTime". Neither, one, or both of these may be present.
22//! It is possible to have multiple StartTime or BaseTime entries in the log, or even interleaved
23//! with interval histograms, but that is confusing, so this API prevents you from doing so.
24//!
25//! When BaseTime is present, per-interval timestamps are the number of seconds since BaseTime. When
26//! it is absent, the per-interval timestamps represent fractional seconds since the epoch (aka
27//! Unix time). BaseTime is useful because it allows you to have somewhat human-readable timestamps
28//! for each interval -- it's easier to see 245.3 and see that that's 4 minutes after the start of
29//! whatever it is you're doing than it is to mentally parse a Unix timestamp. Naturally, you can
30//! always calculate the deltas after the fact if you're willing to write a little tooling, but in
31//! some cases log files are consumed by humans too.
32//!
33//! While BaseTime is used to redefine per-interval timestamps for human readability, StartTime
34//! provides information about when the process that is generating the log reached some "start"
35//! condition. It's frequently the case that intervals will start some seconds after the process
36//! started, whether due to initial warmup before a benchmark or because it just takes a
37//! while to start up. If this is relevant to your workload, use StartTime to record the actual
38//! process start time (or other relevant "start" condition, like when a benchmark begins on an
39//! already long-running process). You could then use this when processing a log to more accurately
40//! plot interval data over time. Put differently, StartTime doesn't change the interpretation of
41//! interval timestamps directly like BaseTime; instead, it provides a hint to the consumer of the
42//! log that the "start" (whatever that means to you -- process start, etc) was at a different time
43//! than that associated with the first interval.
44//!
45//! #### Example scenario
46//!
47//! To explain their usage, suppose we're running a multi-hour benchmark on a process that starts
48//! up at a Unix time of 1500000000. We'll be recording separate log files per hour just to
49//! demonstrate the interaction between BaseTime, StartTime, and interval log timestamps.
50//!
51//! The process starts up, warms up its caches, JIT compiles, etc and is ready to start its
52//! benchmark 40 seconds later, so we create the first interval log file and record a StartTime of
53//! 1500000040. If the actual process start (rather than benchmark start) is more useful to you,
54//! using a StartTime of 1500000000 would be reasonable, but we'll stick with 1500000040.
55//!
56//! We'll use a BaseTime of 1500000040 because that's when the benchmark began, but 1500000000 would
57//! also be a reasonable choice here -- it would just make the per-interval deltas 40 seconds
58//! larger, which might be a more useful way of recording them, depending on the situation.
59//!
60//! The benchmark produces an interval histogram for each 60 seconds of workload, so the first one
61//! is ready at 1500000100, and is recorded with a delta timestamp of 60. This goes on for another
62//! hour, with the last one being 3540 seconds after the start of the benchmark with a corresponding
63//! delta of 3540.
64//!
65//! At the start of the second hour, the time is 1500003640 and the first log file is ended (which
66//! is straightforward: just stop writing to it and close the file) and the second log file is
67//! opened. It still uses a StartTime of 1500000040 because we want to represent that this log
68//! pertains to something that started an hour ago, but we'll use a BaseTime of 1500003640 so that
69//! our delta timestamps start over at 0. Again, you could just as well decide to use the same
70//! BaseTime of 1500000040 if you prefer, but it's arguably easier for a human to parse "h hours
71//! into the benchmark, s seconds past the hour" than it is to mentally divide by 3600, so we'll
72//! go with the new BaseTime.
73//!
74//! Suppose now you wanted to process these logs and show information (e.g. the 0.999th quantile of
75//! each interval) as the y-axis on a plot where time is the x-axis. You would want to have
76//! StartTime be the zero of the x-axis. For each interval, calculate the Unix time by adding its
77//! timestamp to BaseTime, where BaseTime is assumed to be zero if it's not specified. The point on
78//! the x-axis for that interval would be the result of subtracting StartTime from that Unix time.
79//! As an example, the 17th minute in the 2nd hour would have an interval timestamp of approximately
80//! 1020, which when added to 1500003640 is 1500004660. The difference between StartTime and then is
81//! 4620. You might choose to display that in units of minutes, which would be 4620 / 60 = 77.
82//!
83//! #### Java interop
84//!
85//! Since you may be interoperating with the Java reference implementation, here's a summary of the
86//! logic used by the `HistogramLogReader` class for StartTime and BaseTime. It's slightly different
87//! than what was described above, presumably for legacy compatibility reasons. This class stores
88//! the StartTime as a field which is exposed via a getter, and also integrates filtering intervals
89//! based on windows for either "absolute" (Unix time) timestamps or "offset" timestamps (delta vs
90//! the StartTime), so if you're using that filtering functionality, you need to understand how it
91//! ends up setting its internal version of StartTime.
92//!
93//! - Neither StartTime nor BaseTime are present: interval timestamps are interpreted as seconds
94//! since the epoch. The first interval's timestamp is stored to the StartTime field.
95//! - StartTime is present: StartTime is a number of seconds since epoch, and interval timestamps
96//! may be interpreted as deltas to be added to StartTime or as "absolute" Unix time depending on a
97//! heuristic. In other words, the heuristic chooses between setting the effective BaseTime to 0 or
98//! to StartTime. Specifically, the heuristic interprets interval timestamps as deltas if they are
99//! more than a year's worth of seconds smaller than StartTime and as absolute timestamps otherwise.
100//! - BaseTime is present: BaseTime is a number of seconds since epoch, and interval timestamps are
101//! interpreted as deltas. The first interval's (delta) timestamp is stored to the StartTime field.
102//! This is likely a bug, since StartTime should be an absolute timestamp, and appears to cause
103//! erroneous behavior when filtering by offset timestamps.
104//! - BaseTime and StartTime are present: The BaseTime is used like it is when it's the only one
105//! present: it's a number of seconds since epoch that serves as the starting point for the
106//! per-interval deltas to get a wall-clock time for each interval. No heuristics are applied to
107//! guess whether or not the intervals are absolute or deltas.
108//!
109//! The Java implementation also supports re-setting the StartTime and BaseTime if those entries
110//! exist more than once in the log. Suppose that you had an hour's worth of per-minute intervals,
111//! and then you set another StartTime to the current time and added a second hour's worth of
112//! intervals. Filtering for "all intervals between 5 and 10 minutes past the start" would result in
113//! two disjoint sequences of intervals, which is at the very least arguably unintuitive. However,
114//! you may well be working with log files that are structured that way, so this implementation's
115//! `IntervalLogIterator` will expose multiple StartTime, etc, entries as they appear in the log
116//! file.
117//!
118//! # Examples
119//!
120//! Parse a single interval from a log.
121//!
122//! ```
123//! use std::time;
124//! use hdrhistogram::serialization::interval_log;
125//!
126//! // two newline-separated log lines: a comment, then an interval
127//! let log = b"#I'm a comment\nTag=t,0.127,1.007,2.769,base64EncodedHisto\n";
128//!
129//! let mut iter = interval_log::IntervalLogIterator::new(&log[..]);
130//!
131//! // the comment is consumed and ignored by the parser, so the first event is an Interval
132//! match iter.next().unwrap() {
133//!     Ok(interval_log::LogEntry::Interval(h)) => {
134//!         assert_eq!(time::Duration::new(0, 127_000_000), h.start_timestamp());
135//!     }
136//!     _ => panic!()
137//! }
138//!
139//! // there are no more lines in the log; iteration complete
140//! assert_eq!(None, iter.next());
141//! ```
142//!
143//! Skip logs that started before 3 seconds.
144//!
145//! ```
146//! use hdrhistogram::serialization::interval_log;
147//!
148//! let log = "\
149//!     #I'm a comment\n\
150//!     Tag=a,0.123,1.007,2.769,base64EncodedHisto\n\
151//!     1.456,1.007,2.769,base64EncodedHisto\n\
152//!     3.789,1.007,2.769,base64EncodedHisto\n\
153//!     Tag=b,4.123,1.007,2.769,base64EncodedHisto\n\
154//!     5.456,1.007,2.769,base64EncodedHisto\n\
155//!     #Another comment\n"
156//! .as_bytes();
157//!
158//! let iter = interval_log::IntervalLogIterator::new(&log);
159//!
160//! let count = iter
161//!     // only look at intervals (which are the only non-comment lines in this log)
162//!     .filter_map(|e| match e {
163//!         Ok(interval_log::LogEntry::Interval(ilh)) => Some(ilh),
164//!          _ => None
165//!     })
166//!     // do any filtering you want
167//!     .filter(|ilh| ilh.start_timestamp().as_secs() >= 3)
168//!     .count();
169//!
170//! assert_eq!(3, count);
171//! ```
172//!
173//! Write a log.
174//!
175//! ```
176//! use std::{str, time};
177//! use hdrhistogram;
178//! use hdrhistogram::serialization;
179//! use hdrhistogram::serialization::interval_log;
180//!
181//! let mut buf = Vec::new();
182//! let mut serializer = serialization::V2Serializer::new();
183//!
184//! let mut h = hdrhistogram::Histogram::<u64>::new_with_bounds(
185//!     1, u64::max_value(), 3).unwrap();
186//! h.record(12345).unwrap();
187//!
188//! // limit scope of mutable borrow of `buf`
189//! {
190//!     let now = time::SystemTime::now();
191//!     let mut log_writer = interval_log::IntervalLogWriterBuilder::new()
192//!         .add_comment("Comments are great")
193//!         .with_start_time(now)
194//!         .begin_log_with(&mut buf, &mut serializer)
195//!         .unwrap();
196//!
197//!     log_writer.write_comment(
198//!         "You can have comments anywhere in the log").unwrap();
199//!
200//!     log_writer
201//!         .write_histogram(
202//!             &h,
203//!             now.elapsed().unwrap(),
204//!             time::Duration::new(12, 345_678_901),
205//!             interval_log::Tag::new("im-a-tag")
206//!         )
207//!         .unwrap();
208//! }
209//!
210//! // `buf` is now full of stuff; we check for the first line
211//! assert_eq!("#Comments are great\n", &str::from_utf8(&buf).unwrap()[0..20]);
212//! ```
213
214use std::cmp::Ordering;
215use std::error::Error;
216use std::fmt::Write;
217use std::str::FromStr;
218use std::{fmt, io, ops, str, time};
219
220use nom::branch::alt;
221use nom::bytes::complete::{tag, take, take_until, take_while1};
222use nom::character::complete::char;
223use nom::character::is_digit;
224use nom::combinator::{complete, map_res, opt, recognize};
225use nom::error::ErrorKind;
226use nom::number::complete::double;
227use nom::{Err, IResult};
228
229use super::super::{Counter, Histogram};
230use super::Serializer;
231
232/// Prepare an `IntervalLogWriter`.
233///
234/// This type only allows writing comments and headers. Once you're done writing those things, use
235/// `into_log_writer()` to convert this into an `IntervalLogWriter`.
236pub struct IntervalLogWriterBuilder {
237    comments: Vec<String>,
238    start_time: Option<f64>,
239    base_time: Option<f64>,
240    max_value_divisor: f64,
241}
242
243impl Default for IntervalLogWriterBuilder {
244    fn default() -> Self {
245        Self::new()
246    }
247}
248
249impl IntervalLogWriterBuilder {
250    /// Create a new log writer that writes to `writer` and serializes histograms with `serializer`.
251    pub fn new() -> IntervalLogWriterBuilder {
252        IntervalLogWriterBuilder {
253            comments: Vec::new(),
254            start_time: None,
255            base_time: None,
256            max_value_divisor: 1.0,
257        }
258    }
259
260    /// Add a comment line to be written when the writer is built.
261    ///
262    /// Comments containing '\n' will be transformed into multiple lines of comments.
263    pub fn add_comment(&mut self, s: &str) -> &mut Self {
264        self.comments.push(s.to_owned());
265        self
266    }
267
268    /// Set a StartTime. See the module-level documentation for more info.
269    ///
270    /// This can be called multiple times, but only the value for the most recent invocation will
271    /// be written.
272    pub fn with_start_time(&mut self, time: time::SystemTime) -> &mut Self {
273        self.start_time = Some(system_time_as_fp_seconds(time));
274        self
275    }
276
277    /// Set a BaseTime. See the module-level documentation for more info.
278    ///
279    /// This can be called multiple times, but only the value for the most recent invocation will
280    /// be written.
281    pub fn with_base_time(&mut self, time: time::SystemTime) -> &mut Self {
282        self.base_time = Some(system_time_as_fp_seconds(time));
283        self
284    }
285
286    /// Set a max value divisor.
287    ///
288    /// This is used to scale down the max value part of an interval log to something that may be
289    /// more human readable. The max value in the log is only for human consumption, so you might
290    /// prefer to divide by 10<sup>9</sup> to turn nanoseconds into fractional seconds, for
291    /// instance.
292    ///
293    /// If this is not set, 1.0 will be used.
294    ///
295    /// This can be called multiple times, but only the value for the most recent invocation will
296    /// be written.
297    pub fn with_max_value_divisor(&mut self, max_value_divisor: f64) -> &mut Self {
298        self.max_value_divisor = max_value_divisor;
299        self
300    }
301
302    /// Build a LogWriter and apply any configured headers.
303    #[allow(clippy::float_cmp)]
304    pub fn begin_log_with<'a, 'b, W: 'a + io::Write, S: 'b + Serializer>(
305        &self,
306        writer: &'a mut W,
307        serializer: &'b mut S,
308    ) -> Result<IntervalLogWriter<'a, 'b, W, S>, io::Error> {
309        let mut internal_writer = InternalLogWriter {
310            writer,
311            serializer,
312            text_buf: String::new(),
313            serialize_buf: Vec::new(),
314            max_value_divisor: self.max_value_divisor,
315        };
316
317        for c in &self.comments {
318            internal_writer.write_comment(&c)?;
319        }
320
321        if let Some(st) = self.start_time {
322            internal_writer.write_fmt(format_args!(
323                "#[StartTime: {:.3} (seconds since epoch)]\n",
324                st
325            ))?;
326        }
327
328        if let Some(bt) = self.base_time {
329            internal_writer.write_fmt(format_args!(
330                "#[BaseTime: {:.3} (seconds since epoch)]\n",
331                bt
332            ))?;
333        }
334
335        // The Java impl doesn't write a comment for this but it's confusing to silently modify the
336        // max value without leaving a trace
337        if self.max_value_divisor != 1.0_f64 {
338            internal_writer.write_fmt(format_args!(
339                "#[MaxValueDivisor: {:.3}]\n",
340                self.max_value_divisor
341            ))?;
342        }
343
344        Ok(IntervalLogWriter { internal_writer })
345    }
346}
347
348/// Writes interval histograms in an interval log.
349///
350/// This isn't created directly; start with an `IntervalLogWriterBuilder`. Once you've written the
351/// headers and ended up with an `IntervalLogWriter`, typical usage would be to write a histogram
352/// at regular intervals (e.g. once a second).
353///
354/// ```
355/// use hdrhistogram::serialization;
356/// use hdrhistogram::serialization::interval_log;
357///
358/// let mut buf = Vec::new();
359/// let mut serializer = serialization::V2Serializer::new();
360///
361/// // create a writer via a builder
362/// let mut writer = interval_log::IntervalLogWriterBuilder::new()
363///     .begin_log_with(&mut buf, &mut serializer)
364///     .unwrap();
365///
366/// writer.write_comment("Comment 2").unwrap();
367///
368/// // .. write some intervals
369/// ```
370pub struct IntervalLogWriter<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> {
371    internal_writer: InternalLogWriter<'a, 'b, W, S>,
372}
373
374impl<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> IntervalLogWriter<'a, 'b, W, S> {
375    /// Write a comment line.
376    ///
377    /// Comments containing '\n' will be transformed into multiple lines of comments.
378    pub fn write_comment(&mut self, s: &str) -> io::Result<()> {
379        self.internal_writer.write_comment(s)
380    }
381
382    /// Write an interval histogram.
383    ///
384    /// `start_timestamp` is the time since the epoch in seconds that measurements started being
385    /// recorded in this interval. If you're using a StartTime or BaseTime offset, you should
386    /// instead use a delta since that time. See the discussion about timestamps in the module-level
387    /// documentation.
388    ///
389    /// `duration` is the duration of the interval in seconds.
390    ///
391    /// `tag` is an optional tag for this histogram.
392    pub fn write_histogram<T: Counter>(
393        &mut self,
394        h: &Histogram<T>,
395        start_timestamp: time::Duration,
396        duration: time::Duration,
397        tag: Option<Tag>,
398    ) -> Result<(), IntervalLogWriterError<S::SerializeError>> {
399        self.internal_writer
400            .write_histogram(h, start_timestamp, duration, tag)
401    }
402}
403
404/// Errors that can occur while writing a log.
405#[derive(Debug)]
406pub enum IntervalLogWriterError<E> {
407    /// Histogram serialization failed.
408    SerializeError(E),
409    /// An i/o error occurred.
410    IoError(io::Error),
411}
412
413impl<E> From<io::Error> for IntervalLogWriterError<E> {
414    fn from(e: io::Error) -> Self {
415        IntervalLogWriterError::IoError(e)
416    }
417}
418
419impl<E: fmt::Display + fmt::Debug> fmt::Display for IntervalLogWriterError<E> {
420    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
421        match self {
422            IntervalLogWriterError::SerializeError(e) => {
423                write!(f, "Histogram serialization failed: {}", e)
424            }
425            IntervalLogWriterError::IoError(e) => write!(f, "An i/o error occurred: {}", e),
426        }
427    }
428}
429
430impl<E: Error + 'static> Error for IntervalLogWriterError<E> {
431    fn source(&self) -> Option<&(dyn Error + 'static)> {
432        match self {
433            IntervalLogWriterError::SerializeError(e) => Some(e),
434            IntervalLogWriterError::IoError(e) => Some(e),
435        }
436    }
437}
438
439/// Write interval logs.
440struct InternalLogWriter<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> {
441    writer: &'a mut W,
442    serializer: &'b mut S,
443    text_buf: String,
444    serialize_buf: Vec<u8>,
445    max_value_divisor: f64,
446}
447
448impl<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> InternalLogWriter<'a, 'b, W, S> {
449    fn write_fmt(&mut self, args: fmt::Arguments) -> io::Result<()> {
450        self.writer.write_fmt(args)
451    }
452
453    fn write_comment(&mut self, s: &str) -> io::Result<()> {
454        for l in s.split('\n') {
455            writeln!(self.writer, "#{}", l)?;
456        }
457
458        Ok(())
459    }
460
461    fn write_histogram<T: Counter>(
462        &mut self,
463        h: &Histogram<T>,
464        start_timestamp: time::Duration,
465        duration: time::Duration,
466        tag: Option<Tag>,
467    ) -> Result<(), IntervalLogWriterError<S::SerializeError>> {
468        self.serialize_buf.clear();
469        self.text_buf.clear();
470
471        if let Some(Tag(s)) = tag {
472            write!(self.text_buf, "Tag={},", &s).expect("Writes to a String can't fail");
473        }
474
475        write!(
476            self.writer,
477            "{}{:.3},{:.3},{:.3},",
478            self.text_buf,
479            duration_as_fp_seconds(start_timestamp),
480            duration_as_fp_seconds(duration),
481            h.max() as f64 / self.max_value_divisor // because the Java impl does it this way
482        )?;
483
484        self.text_buf.clear();
485        let _len = self
486            .serializer
487            .serialize(h, &mut self.serialize_buf)
488            .map_err(IntervalLogWriterError::SerializeError)?;
489        base64::encode_config_buf(&self.serialize_buf, base64::STANDARD, &mut self.text_buf);
490
491        self.writer.write_all(self.text_buf.as_bytes())?;
492        self.writer.write_all(b"\n")?;
493
494        Ok(())
495    }
496}
497
498/// A tag for an interval histogram.
499///
500/// Tags are just `str`s that do not contain a few disallowed characters: ',', '\r', '\n', and ' '.
501///
502/// To get the wrapped `str` back out, use `as_str()` or the `Deref<str>` implementation
503/// (`&some_tag`).
504#[derive(Debug, PartialEq, Clone, Copy)]
505pub struct Tag<'a>(&'a str);
506
507impl<'a> Tag<'a> {
508    /// Create a new Tag.
509    ///
510    /// If a disallowed character is present, this will return `None`.
511    pub fn new(s: &'a str) -> Option<Tag<'a>> {
512        if s.chars()
513            .any(|c| c == ',' || c == '\r' || c == '\n' || c == ' ')
514        {
515            None
516        } else {
517            Some(Tag(s))
518        }
519    }
520
521    /// Returns the tag contents as a str.
522    pub fn as_str(&self) -> &'a str {
523        self.0
524    }
525}
526
527impl<'a> ops::Deref for Tag<'a> {
528    type Target = str;
529
530    fn deref(&self) -> &Self::Target {
531        self.as_str()
532    }
533}
534
535/// An individual interval histogram parsed from an interval log.
536#[derive(PartialEq, Debug)]
537pub struct IntervalLogHistogram<'a> {
538    tag: Option<Tag<'a>>,
539    start_timestamp: time::Duration,
540    duration: time::Duration,
541    max: f64,
542    encoded_histogram: &'a str,
543}
544
545impl<'a> IntervalLogHistogram<'a> {
546    /// Tag, if any is present.
547    pub fn tag(&self) -> Option<Tag<'a>> {
548        self.tag
549    }
550
551    /// Timestamp of the start of the interval in seconds, expressed as a `Duration` relative to
552    /// some start point.
553    ///
554    /// The timestamp may be absolute vs the epoch, or there may be a `StartTime` or `BaseTime` for
555    /// the log, in which case you may wish to consider this number as a delta vs those timestamps.
556    /// See the module-level documentation about timestamps.
557    pub fn start_timestamp(&self) -> time::Duration {
558        self.start_timestamp
559    }
560
561    /// Duration of the interval in seconds.
562    pub fn duration(&self) -> time::Duration {
563        self.duration
564    }
565
566    /// Max value in the encoded histogram
567    ///
568    /// This max value is the max of the histogram divided by some scaling factor (which may be
569    /// 1.0).
570    pub fn max(&self) -> f64 {
571        self.max
572    }
573
574    /// Base64-encoded serialized histogram.
575    ///
576    /// If you need the deserialized histogram, base64-decode and use a `Deserializer` on the
577    /// resulting bytes.
578    pub fn encoded_histogram(&self) -> &'a str {
579        self.encoded_histogram
580    }
581}
582
583#[derive(PartialEq, Debug)]
584/// Represents one non-comment line in an interval log.
585///
586/// One thing to note is that the way your interval timestamps work can vary. If your log was
587/// written with a StartTime or BaseTime, that metadata will appear in header comments, and that
588/// will be represented by the iterator providing the corresponding variants here. The presence
589/// of those timestamps will affect how you should interpret the timestamps for individual
590/// intervals. See the module-level documentation.
591#[allow(variant_size_differences)]
592pub enum LogEntry<'a> {
593    /// Logs may include a StartTime. If present, it represents seconds since the epoch.
594    StartTime(time::Duration),
595    /// Logs may include a BaseTime. If present, it represents seconds since the epoch.
596    BaseTime(time::Duration),
597    /// An individual interval histogram.
598    Interval(IntervalLogHistogram<'a>),
599}
600
601/// Errors that occur when parsing an interval log.
602#[derive(Debug, PartialEq)]
603pub enum LogIteratorError {
604    /// Parsing failed
605    ParseError {
606        /// Offset in the input where the failed parse started
607        offset: usize,
608    },
609}
610
611/// Parse interval logs.
612///
613/// This iterator exposes each item (excluding comments and other information-free lines). See
614/// `LogEntry`.
615///
616/// Because histogram deserialization is deferred, parsing logs is fast. See the `interval_log`
617/// benchmark if you wish to see how it does on your hardware. As a baseline, parsing a log of 1000
618/// random histograms of 10,000 values each takes 2ms total on an E5-1650v3.
619///
620/// Deferring deserialization is handy because it allows you to cheaply navigate the log to find
621/// the records you care about (e.g. ones in a certain time range, or with a certain tag) without
622/// doing all the allocation, etc, of deserialization.
623///
624/// If you're looking for a direct port of the Java impl's `HistogramLogReader`, this isn't one: it
625/// won't deserialize for you, and it pushes the burden of figuring out what to do with StartTime,
626/// BaseTime, etc to you, and there aren't built in functions to filter by timestamp. On the other
627/// hand, because it doesn't do those things, it is much more flexible: you can easily build any
628/// sort of filtering you want, not just timestamp ranges, because you have cheap access to all the
629/// metadata before incurring the cost of deserialization. If you're not using any timestamp
630/// headers, or at least using them in straightforward ways, it is easy to accumulate the
631/// timestamp state you need. Since all the parsing is taken care of already, writing your own
632/// `HistogramLogReader` equivalent that fits the way your logs are assembled is just a couple of
633/// lines. (And if you're doing complex stuff, we probably wouldn't have built something that fits
634/// your quirky logs anyway!)
635///
636/// This parses from a slice representing the complete file because it made implementation easier
637/// (and also supports mmap'd files for maximum parsing speed). If parsing from a `Read` is
638/// important for your use case, open an issue about it.
639pub struct IntervalLogIterator<'a> {
640    orig_len: usize,
641    input: &'a [u8],
642    ended: bool,
643}
644
645impl<'a> IntervalLogIterator<'a> {
646    /// Create a new iterator from the UTF-8 bytes of an interval log.
647    pub fn new(input: &'a [u8]) -> IntervalLogIterator<'a> {
648        IntervalLogIterator {
649            orig_len: input.len(),
650            input,
651            ended: false,
652        }
653    }
654}
655
656impl<'a> Iterator for IntervalLogIterator<'a> {
657    type Item = Result<LogEntry<'a>, LogIteratorError>;
658
659    fn next(&mut self) -> Option<Self::Item> {
660        loop {
661            if self.ended {
662                return None;
663            }
664
665            if self.input.is_empty() {
666                self.ended = true;
667                return None;
668            }
669
670            // Look for magic comments first otherwise they will get matched by the simple comment
671            // parser
672            if let Ok((rest, e)) = log_entry(self.input) {
673                self.input = rest;
674                return Some(Ok(e));
675            }
676
677            // it wasn't a log entry; try parsing a comment
678            match ignored_line(self.input) {
679                Ok((rest, _)) => {
680                    self.input = rest;
681                    continue;
682                }
683                _ => {
684                    self.ended = true;
685                    return Some(Err(LogIteratorError::ParseError {
686                        offset: self.orig_len - self.input.len(),
687                    }));
688                }
689            }
690        }
691    }
692}
693
694fn duration_as_fp_seconds(d: time::Duration) -> f64 {
695    d.as_secs() as f64 + f64::from(d.subsec_nanos()) / 1_000_000_000_f64
696}
697
698fn system_time_as_fp_seconds(time: time::SystemTime) -> f64 {
699    match time.duration_since(time::UNIX_EPOCH) {
700        Ok(dur_after_epoch) => duration_as_fp_seconds(dur_after_epoch),
701        // Doesn't seem possible to be before the epoch, but using a negative number seems like
702        // a reasonable representation if it does occur
703        Err(t) => duration_as_fp_seconds(t.duration()) * -1_f64,
704    }
705}
706
707fn start_time(input: &[u8]) -> IResult<&[u8], LogEntry> {
708    let (input, _) = tag("#[StartTime: ")(input)?;
709    let (input, duration) = fract_sec_duration(input)?;
710    let (input, _) = char(' ')(input)?;
711    let (input, _) = take_until("\n")(input)?;
712    let (input, _) = take(1_usize)(input)?;
713    Ok((input, LogEntry::StartTime(duration)))
714}
715
716fn base_time(input: &[u8]) -> IResult<&[u8], LogEntry> {
717    let (input, _) = tag("#[BaseTime: ")(input)?;
718    let (input, duration) = fract_sec_duration(input)?;
719    let (input, _) = char(' ')(input)?;
720    let (input, _) = take_until("\n")(input)?;
721    let (input, _) = take(1_usize)(input)?;
722    Ok((input, LogEntry::BaseTime(duration)))
723}
724
725fn tag_bytes(input: &[u8]) -> IResult<&[u8], &[u8]> {
726    let (input, _) = tag("Tag=")(input)?;
727    let (input, tag) = take_until(",")(input)?;
728    let (input, _) = take(1_usize)(input)?;
729    Ok((input, tag))
730}
731
732fn tag_parser(input: &[u8]) -> IResult<&[u8], Tag> {
733    let (input, tag) = map_res(tag_bytes, str::from_utf8)(input)?;
734    Ok((input, Tag(tag)))
735}
736
737fn interval_hist(input: &[u8]) -> IResult<&[u8], LogEntry> {
738    let (input, tag) = opt(tag_parser)(input)?;
739    let (input, start_timestamp) = fract_sec_duration(input)?;
740    let (input, _) = char(',')(input)?;
741    let (input, duration) = fract_sec_duration(input)?;
742    let (input, _) = char(',')(input)?;
743    let (input, max) = double(input)?;
744    let (input, _) = char(',')(input)?;
745    let (input, encoded_histogram) = map_res(take_until("\n"), str::from_utf8)(input)?;
746    let (input, _) = take(1_usize)(input)?;
747
748    Ok((
749        input,
750        LogEntry::Interval(IntervalLogHistogram {
751            tag,
752            start_timestamp,
753            duration,
754            max,
755            encoded_histogram,
756        }),
757    ))
758}
759
760fn log_entry(input: &[u8]) -> IResult<&[u8], LogEntry<'_>> {
761    complete(alt((start_time, base_time, interval_hist)))(input)
762}
763
764fn comment_line(input: &[u8]) -> IResult<&[u8], ()> {
765    let (input, _) = tag("#")(input)?;
766    let (input, _) = take_until("\n")(input)?;
767    let (input, _) = take(1_usize)(input)?;
768    Ok((input, ()))
769}
770
771fn legend(input: &[u8]) -> IResult<&[u8], ()> {
772    let (input, _) = tag("\"StartTimestamp\"")(input)?;
773    let (input, _) = take_until("\n")(input)?;
774    let (input, _) = take(1_usize)(input)?;
775    Ok((input, ()))
776}
777
778fn ignored_line(input: &[u8]) -> IResult<&[u8], ()> {
779    alt((comment_line, legend))(input)
780}
781
782fn fract_sec_duration(input: &[u8]) -> IResult<&[u8], time::Duration> {
783    let (rest, data) = fract_sec_tuple(input)?;
784    let (secs, nanos_str) = data;
785
786    // only read up to 9 digits since we can only support nanos, not smaller precision
787    let nanos_parse_res = match nanos_str.len().cmp(&9) {
788        Ordering::Greater => nanos_str[0..9].parse::<u32>(),
789        Ordering::Equal => nanos_str.parse::<u32>(),
790        Ordering::Less => nanos_str
791            .parse::<u32>()
792            // subtraction will not overflow because len is < 9
793            .map(|n| n * 10_u32.pow(9 - nanos_str.len() as u32)),
794    };
795
796    if let Ok(nanos) = nanos_parse_res {
797        return Ok((rest, time::Duration::new(secs, nanos)));
798    }
799
800    // nanos were invalid utf8. We don't expose these errors, so don't bother defining a
801    // custom error type.
802    Err(Err::Error(error_position!(input, ErrorKind::Alpha)))
803}
804
805type FResult<'a> = IResult<&'a [u8], (u64, &'a str)>;
806
807fn fract_sec_tuple(input: &[u8]) -> FResult {
808    let (input, secs) = map_res(
809        map_res(recognize(take_until(".")), str::from_utf8),
810        u64::from_str,
811    )(input)?;
812    let (input, _) = tag(".")(input)?;
813    let (input, nanos_str) = map_res(complete(take_while1(is_digit)), str::from_utf8)(input)?;
814    Ok((input, (secs, nanos_str)))
815}
816
817#[cfg(test)]
818mod tests;
hdrhistogram/serialization/interval_log/mod.rs

hdrhistogram/serialization/interval_log/
mod.rs