hdrhistogram/serialization/interval_log/mod.rs
1//! Interval log parsing and writing.
2//!
3//! Interval logs, as handled by the Java implementation's `HistogramLogWriter`,
4//! `HistogramLogReader`, and `HistogramLogProcessor`, are a way to record a sequence of histograms
5//! over time. Suppose you were running a load test for an hour: you might want to record a
6//! histogram per second or minute so that you could correlate measurements with behavior you might
7//! see in logs, etc.
8//!
9//! An interval log contains some initial metadata, then a sequence of histograms, each with some
10//! additional metadata (timestamps, etc). See `IntervalLogHistogram`.
11//!
12//! The intervals in the log should be ordered by start timestamp. It's possible to write (and
13//! parse) logs with intervals in any order, but the expectation is that they will be sorted.
14//!
15//! To parse a log, see `IntervalLogIterator`. To write a log, see `IntervalLogWriterBuilder`.
16//!
17//! # Timestamps
18//!
19//! Each interval has a timestamp in seconds associated with it. However, it's not necessarily as
20//! simple as just interpreting the number as seconds since the epoch. There are two optional pieces
21//! of header metadata: "StartTime" and "BaseTime". Neither, one, or both of these may be present.
22//! It is possible to have multiple StartTime or BaseTime entries in the log, or even interleaved
23//! with interval histograms, but that is confusing, so this API prevents you from doing so.
24//!
25//! When BaseTime is present, per-interval timestamps are the number of seconds since BaseTime. When
26//! it is absent, the per-interval timestamps represent fractional seconds since the epoch (aka
27//! Unix time). BaseTime is useful because it allows you to have somewhat human-readable timestamps
28//! for each interval -- it's easier to see 245.3 and see that that's 4 minutes after the start of
29//! whatever it is you're doing than it is to mentally parse a Unix timestamp. Naturally, you can
30//! always calculate the deltas after the fact if you're willing to write a little tooling, but in
31//! some cases log files are consumed by humans too.
32//!
33//! While BaseTime is used to redefine per-interval timestamps for human readability, StartTime
34//! provides information about when the process that is generating the log reached some "start"
35//! condition. It's frequently the case that intervals will start some seconds after the process
36//! started, whether due to initial warmup before a benchmark or because it just takes a
37//! while to start up. If this is relevant to your workload, use StartTime to record the actual
38//! process start time (or other relevant "start" condition, like when a benchmark begins on an
39//! already long-running process). You could then use this when processing a log to more accurately
40//! plot interval data over time. Put differently, StartTime doesn't change the interpretation of
41//! interval timestamps directly like BaseTime; instead, it provides a hint to the consumer of the
42//! log that the "start" (whatever that means to you -- process start, etc) was at a different time
43//! than that associated with the first interval.
44//!
45//! #### Example scenario
46//!
47//! To explain their usage, suppose we're running a multi-hour benchmark on a process that starts
48//! up at a Unix time of 1500000000. We'll be recording separate log files per hour just to
49//! demonstrate the interaction between BaseTime, StartTime, and interval log timestamps.
50//!
51//! The process starts up, warms up its caches, JIT compiles, etc and is ready to start its
52//! benchmark 40 seconds later, so we create the first interval log file and record a StartTime of
53//! 1500000040. If the actual process start (rather than benchmark start) is more useful to you,
54//! using a StartTime of 1500000000 would be reasonable, but we'll stick with 1500000040.
55//!
56//! We'll use a BaseTime of 1500000040 because that's when the benchmark began, but 1500000000 would
57//! also be a reasonable choice here -- it would just make the per-interval deltas 40 seconds
58//! larger, which might be a more useful way of recording them, depending on the situation.
59//!
60//! The benchmark produces an interval histogram for each 60 seconds of workload, so the first one
61//! is ready at 1500000100, and is recorded with a delta timestamp of 60. This goes on for another
62//! hour, with the last one being 3540 seconds after the start of the benchmark with a corresponding
63//! delta of 3540.
64//!
65//! At the start of the second hour, the time is 1500003640 and the first log file is ended (which
66//! is straightforward: just stop writing to it and close the file) and the second log file is
67//! opened. It still uses a StartTime of 1500000040 because we want to represent that this log
68//! pertains to something that started an hour ago, but we'll use a BaseTime of 1500003640 so that
69//! our delta timestamps start over at 0. Again, you could just as well decide to use the same
70//! BaseTime of 1500000040 if you prefer, but it's arguably easier for a human to parse "h hours
71//! into the benchmark, s seconds past the hour" than it is to mentally divide by 3600, so we'll
72//! go with the new BaseTime.
73//!
74//! Suppose now you wanted to process these logs and show information (e.g. the 0.999th quantile of
75//! each interval) as the y-axis on a plot where time is the x-axis. You would want to have
76//! StartTime be the zero of the x-axis. For each interval, calculate the Unix time by adding its
77//! timestamp to BaseTime, where BaseTime is assumed to be zero if it's not specified. The point on
78//! the x-axis for that interval would be the result of subtracting StartTime from that Unix time.
79//! As an example, the 17th minute in the 2nd hour would have an interval timestamp of approximately
80//! 1020, which when added to 1500003640 is 1500004660. The difference between StartTime and then is
81//! 4620. You might choose to display that in units of minutes, which would be 4620 / 60 = 77.
82//!
83//! #### Java interop
84//!
85//! Since you may be interoperating with the Java reference implementation, here's a summary of the
86//! logic used by the `HistogramLogReader` class for StartTime and BaseTime. It's slightly different
87//! than what was described above, presumably for legacy compatibility reasons. This class stores
88//! the StartTime as a field which is exposed via a getter, and also integrates filtering intervals
89//! based on windows for either "absolute" (Unix time) timestamps or "offset" timestamps (delta vs
90//! the StartTime), so if you're using that filtering functionality, you need to understand how it
91//! ends up setting its internal version of StartTime.
92//!
93//! - Neither StartTime nor BaseTime are present: interval timestamps are interpreted as seconds
94//! since the epoch. The first interval's timestamp is stored to the StartTime field.
95//! - StartTime is present: StartTime is a number of seconds since epoch, and interval timestamps
96//! may be interpreted as deltas to be added to StartTime or as "absolute" Unix time depending on a
97//! heuristic. In other words, the heuristic chooses between setting the effective BaseTime to 0 or
98//! to StartTime. Specifically, the heuristic interprets interval timestamps as deltas if they are
99//! more than a year's worth of seconds smaller than StartTime and as absolute timestamps otherwise.
100//! - BaseTime is present: BaseTime is a number of seconds since epoch, and interval timestamps are
101//! interpreted as deltas. The first interval's (delta) timestamp is stored to the StartTime field.
102//! This is likely a bug, since StartTime should be an absolute timestamp, and appears to cause
103//! erroneous behavior when filtering by offset timestamps.
104//! - BaseTime and StartTime are present: The BaseTime is used like it is when it's the only one
105//! present: it's a number of seconds since epoch that serves as the starting point for the
106//! per-interval deltas to get a wall-clock time for each interval. No heuristics are applied to
107//! guess whether or not the intervals are absolute or deltas.
108//!
109//! The Java implementation also supports re-setting the StartTime and BaseTime if those entries
110//! exist more than once in the log. Suppose that you had an hour's worth of per-minute intervals,
111//! and then you set another StartTime to the current time and added a second hour's worth of
112//! intervals. Filtering for "all intervals between 5 and 10 minutes past the start" would result in
113//! two disjoint sequences of intervals, which is at the very least arguably unintuitive. However,
114//! you may well be working with log files that are structured that way, so this implementation's
115//! `IntervalLogIterator` will expose multiple StartTime, etc, entries as they appear in the log
116//! file.
117//!
118//! # Examples
119//!
120//! Parse a single interval from a log.
121//!
122//! ```
123//! use std::time;
124//! use hdrhistogram::serialization::interval_log;
125//!
126//! // two newline-separated log lines: a comment, then an interval
127//! let log = b"#I'm a comment\nTag=t,0.127,1.007,2.769,base64EncodedHisto\n";
128//!
129//! let mut iter = interval_log::IntervalLogIterator::new(&log[..]);
130//!
131//! // the comment is consumed and ignored by the parser, so the first event is an Interval
132//! match iter.next().unwrap() {
133//! Ok(interval_log::LogEntry::Interval(h)) => {
134//! assert_eq!(time::Duration::new(0, 127_000_000), h.start_timestamp());
135//! }
136//! _ => panic!()
137//! }
138//!
139//! // there are no more lines in the log; iteration complete
140//! assert_eq!(None, iter.next());
141//! ```
142//!
143//! Skip logs that started before 3 seconds.
144//!
145//! ```
146//! use hdrhistogram::serialization::interval_log;
147//!
148//! let log = "\
149//! #I'm a comment\n\
150//! Tag=a,0.123,1.007,2.769,base64EncodedHisto\n\
151//! 1.456,1.007,2.769,base64EncodedHisto\n\
152//! 3.789,1.007,2.769,base64EncodedHisto\n\
153//! Tag=b,4.123,1.007,2.769,base64EncodedHisto\n\
154//! 5.456,1.007,2.769,base64EncodedHisto\n\
155//! #Another comment\n"
156//! .as_bytes();
157//!
158//! let iter = interval_log::IntervalLogIterator::new(&log);
159//!
160//! let count = iter
161//! // only look at intervals (which are the only non-comment lines in this log)
162//! .filter_map(|e| match e {
163//! Ok(interval_log::LogEntry::Interval(ilh)) => Some(ilh),
164//! _ => None
165//! })
166//! // do any filtering you want
167//! .filter(|ilh| ilh.start_timestamp().as_secs() >= 3)
168//! .count();
169//!
170//! assert_eq!(3, count);
171//! ```
172//!
173//! Write a log.
174//!
175//! ```
176//! use std::{str, time};
177//! use hdrhistogram;
178//! use hdrhistogram::serialization;
179//! use hdrhistogram::serialization::interval_log;
180//!
181//! let mut buf = Vec::new();
182//! let mut serializer = serialization::V2Serializer::new();
183//!
184//! let mut h = hdrhistogram::Histogram::<u64>::new_with_bounds(
185//! 1, u64::max_value(), 3).unwrap();
186//! h.record(12345).unwrap();
187//!
188//! // limit scope of mutable borrow of `buf`
189//! {
190//! let now = time::SystemTime::now();
191//! let mut log_writer = interval_log::IntervalLogWriterBuilder::new()
192//! .add_comment("Comments are great")
193//! .with_start_time(now)
194//! .begin_log_with(&mut buf, &mut serializer)
195//! .unwrap();
196//!
197//! log_writer.write_comment(
198//! "You can have comments anywhere in the log").unwrap();
199//!
200//! log_writer
201//! .write_histogram(
202//! &h,
203//! now.elapsed().unwrap(),
204//! time::Duration::new(12, 345_678_901),
205//! interval_log::Tag::new("im-a-tag")
206//! )
207//! .unwrap();
208//! }
209//!
210//! // `buf` is now full of stuff; we check for the first line
211//! assert_eq!("#Comments are great\n", &str::from_utf8(&buf).unwrap()[0..20]);
212//! ```
213
214use std::cmp::Ordering;
215use std::error::Error;
216use std::fmt::Write;
217use std::str::FromStr;
218use std::{fmt, io, ops, str, time};
219
220use nom::branch::alt;
221use nom::bytes::complete::{tag, take, take_until, take_while1};
222use nom::character::complete::char;
223use nom::character::is_digit;
224use nom::combinator::{complete, map_res, opt, recognize};
225use nom::error::ErrorKind;
226use nom::number::complete::double;
227use nom::{Err, IResult};
228
229use super::super::{Counter, Histogram};
230use super::Serializer;
231
232/// Prepare an `IntervalLogWriter`.
233///
234/// This type only allows writing comments and headers. Once you're done writing those things, use
235/// `into_log_writer()` to convert this into an `IntervalLogWriter`.
236pub struct IntervalLogWriterBuilder {
237 comments: Vec<String>,
238 start_time: Option<f64>,
239 base_time: Option<f64>,
240 max_value_divisor: f64,
241}
242
243impl Default for IntervalLogWriterBuilder {
244 fn default() -> Self {
245 Self::new()
246 }
247}
248
249impl IntervalLogWriterBuilder {
250 /// Create a new log writer that writes to `writer` and serializes histograms with `serializer`.
251 pub fn new() -> IntervalLogWriterBuilder {
252 IntervalLogWriterBuilder {
253 comments: Vec::new(),
254 start_time: None,
255 base_time: None,
256 max_value_divisor: 1.0,
257 }
258 }
259
260 /// Add a comment line to be written when the writer is built.
261 ///
262 /// Comments containing '\n' will be transformed into multiple lines of comments.
263 pub fn add_comment(&mut self, s: &str) -> &mut Self {
264 self.comments.push(s.to_owned());
265 self
266 }
267
268 /// Set a StartTime. See the module-level documentation for more info.
269 ///
270 /// This can be called multiple times, but only the value for the most recent invocation will
271 /// be written.
272 pub fn with_start_time(&mut self, time: time::SystemTime) -> &mut Self {
273 self.start_time = Some(system_time_as_fp_seconds(time));
274 self
275 }
276
277 /// Set a BaseTime. See the module-level documentation for more info.
278 ///
279 /// This can be called multiple times, but only the value for the most recent invocation will
280 /// be written.
281 pub fn with_base_time(&mut self, time: time::SystemTime) -> &mut Self {
282 self.base_time = Some(system_time_as_fp_seconds(time));
283 self
284 }
285
286 /// Set a max value divisor.
287 ///
288 /// This is used to scale down the max value part of an interval log to something that may be
289 /// more human readable. The max value in the log is only for human consumption, so you might
290 /// prefer to divide by 10<sup>9</sup> to turn nanoseconds into fractional seconds, for
291 /// instance.
292 ///
293 /// If this is not set, 1.0 will be used.
294 ///
295 /// This can be called multiple times, but only the value for the most recent invocation will
296 /// be written.
297 pub fn with_max_value_divisor(&mut self, max_value_divisor: f64) -> &mut Self {
298 self.max_value_divisor = max_value_divisor;
299 self
300 }
301
302 /// Build a LogWriter and apply any configured headers.
303 #[allow(clippy::float_cmp)]
304 pub fn begin_log_with<'a, 'b, W: 'a + io::Write, S: 'b + Serializer>(
305 &self,
306 writer: &'a mut W,
307 serializer: &'b mut S,
308 ) -> Result<IntervalLogWriter<'a, 'b, W, S>, io::Error> {
309 let mut internal_writer = InternalLogWriter {
310 writer,
311 serializer,
312 text_buf: String::new(),
313 serialize_buf: Vec::new(),
314 max_value_divisor: self.max_value_divisor,
315 };
316
317 for c in &self.comments {
318 internal_writer.write_comment(&c)?;
319 }
320
321 if let Some(st) = self.start_time {
322 internal_writer.write_fmt(format_args!(
323 "#[StartTime: {:.3} (seconds since epoch)]\n",
324 st
325 ))?;
326 }
327
328 if let Some(bt) = self.base_time {
329 internal_writer.write_fmt(format_args!(
330 "#[BaseTime: {:.3} (seconds since epoch)]\n",
331 bt
332 ))?;
333 }
334
335 // The Java impl doesn't write a comment for this but it's confusing to silently modify the
336 // max value without leaving a trace
337 if self.max_value_divisor != 1.0_f64 {
338 internal_writer.write_fmt(format_args!(
339 "#[MaxValueDivisor: {:.3}]\n",
340 self.max_value_divisor
341 ))?;
342 }
343
344 Ok(IntervalLogWriter { internal_writer })
345 }
346}
347
348/// Writes interval histograms in an interval log.
349///
350/// This isn't created directly; start with an `IntervalLogWriterBuilder`. Once you've written the
351/// headers and ended up with an `IntervalLogWriter`, typical usage would be to write a histogram
352/// at regular intervals (e.g. once a second).
353///
354/// ```
355/// use hdrhistogram::serialization;
356/// use hdrhistogram::serialization::interval_log;
357///
358/// let mut buf = Vec::new();
359/// let mut serializer = serialization::V2Serializer::new();
360///
361/// // create a writer via a builder
362/// let mut writer = interval_log::IntervalLogWriterBuilder::new()
363/// .begin_log_with(&mut buf, &mut serializer)
364/// .unwrap();
365///
366/// writer.write_comment("Comment 2").unwrap();
367///
368/// // .. write some intervals
369/// ```
370pub struct IntervalLogWriter<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> {
371 internal_writer: InternalLogWriter<'a, 'b, W, S>,
372}
373
374impl<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> IntervalLogWriter<'a, 'b, W, S> {
375 /// Write a comment line.
376 ///
377 /// Comments containing '\n' will be transformed into multiple lines of comments.
378 pub fn write_comment(&mut self, s: &str) -> io::Result<()> {
379 self.internal_writer.write_comment(s)
380 }
381
382 /// Write an interval histogram.
383 ///
384 /// `start_timestamp` is the time since the epoch in seconds that measurements started being
385 /// recorded in this interval. If you're using a StartTime or BaseTime offset, you should
386 /// instead use a delta since that time. See the discussion about timestamps in the module-level
387 /// documentation.
388 ///
389 /// `duration` is the duration of the interval in seconds.
390 ///
391 /// `tag` is an optional tag for this histogram.
392 pub fn write_histogram<T: Counter>(
393 &mut self,
394 h: &Histogram<T>,
395 start_timestamp: time::Duration,
396 duration: time::Duration,
397 tag: Option<Tag>,
398 ) -> Result<(), IntervalLogWriterError<S::SerializeError>> {
399 self.internal_writer
400 .write_histogram(h, start_timestamp, duration, tag)
401 }
402}
403
404/// Errors that can occur while writing a log.
405#[derive(Debug)]
406pub enum IntervalLogWriterError<E> {
407 /// Histogram serialization failed.
408 SerializeError(E),
409 /// An i/o error occurred.
410 IoError(io::Error),
411}
412
413impl<E> From<io::Error> for IntervalLogWriterError<E> {
414 fn from(e: io::Error) -> Self {
415 IntervalLogWriterError::IoError(e)
416 }
417}
418
419impl<E: fmt::Display + fmt::Debug> fmt::Display for IntervalLogWriterError<E> {
420 fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
421 match self {
422 IntervalLogWriterError::SerializeError(e) => {
423 write!(f, "Histogram serialization failed: {}", e)
424 }
425 IntervalLogWriterError::IoError(e) => write!(f, "An i/o error occurred: {}", e),
426 }
427 }
428}
429
430impl<E: Error + 'static> Error for IntervalLogWriterError<E> {
431 fn source(&self) -> Option<&(dyn Error + 'static)> {
432 match self {
433 IntervalLogWriterError::SerializeError(e) => Some(e),
434 IntervalLogWriterError::IoError(e) => Some(e),
435 }
436 }
437}
438
439/// Write interval logs.
440struct InternalLogWriter<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> {
441 writer: &'a mut W,
442 serializer: &'b mut S,
443 text_buf: String,
444 serialize_buf: Vec<u8>,
445 max_value_divisor: f64,
446}
447
448impl<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> InternalLogWriter<'a, 'b, W, S> {
449 fn write_fmt(&mut self, args: fmt::Arguments) -> io::Result<()> {
450 self.writer.write_fmt(args)
451 }
452
453 fn write_comment(&mut self, s: &str) -> io::Result<()> {
454 for l in s.split('\n') {
455 writeln!(self.writer, "#{}", l)?;
456 }
457
458 Ok(())
459 }
460
461 fn write_histogram<T: Counter>(
462 &mut self,
463 h: &Histogram<T>,
464 start_timestamp: time::Duration,
465 duration: time::Duration,
466 tag: Option<Tag>,
467 ) -> Result<(), IntervalLogWriterError<S::SerializeError>> {
468 self.serialize_buf.clear();
469 self.text_buf.clear();
470
471 if let Some(Tag(s)) = tag {
472 write!(self.text_buf, "Tag={},", &s).expect("Writes to a String can't fail");
473 }
474
475 write!(
476 self.writer,
477 "{}{:.3},{:.3},{:.3},",
478 self.text_buf,
479 duration_as_fp_seconds(start_timestamp),
480 duration_as_fp_seconds(duration),
481 h.max() as f64 / self.max_value_divisor // because the Java impl does it this way
482 )?;
483
484 self.text_buf.clear();
485 let _len = self
486 .serializer
487 .serialize(h, &mut self.serialize_buf)
488 .map_err(IntervalLogWriterError::SerializeError)?;
489 base64::encode_config_buf(&self.serialize_buf, base64::STANDARD, &mut self.text_buf);
490
491 self.writer.write_all(self.text_buf.as_bytes())?;
492 self.writer.write_all(b"\n")?;
493
494 Ok(())
495 }
496}
497
498/// A tag for an interval histogram.
499///
500/// Tags are just `str`s that do not contain a few disallowed characters: ',', '\r', '\n', and ' '.
501///
502/// To get the wrapped `str` back out, use `as_str()` or the `Deref<str>` implementation
503/// (`&some_tag`).
504#[derive(Debug, PartialEq, Clone, Copy)]
505pub struct Tag<'a>(&'a str);
506
507impl<'a> Tag<'a> {
508 /// Create a new Tag.
509 ///
510 /// If a disallowed character is present, this will return `None`.
511 pub fn new(s: &'a str) -> Option<Tag<'a>> {
512 if s.chars()
513 .any(|c| c == ',' || c == '\r' || c == '\n' || c == ' ')
514 {
515 None
516 } else {
517 Some(Tag(s))
518 }
519 }
520
521 /// Returns the tag contents as a str.
522 pub fn as_str(&self) -> &'a str {
523 self.0
524 }
525}
526
527impl<'a> ops::Deref for Tag<'a> {
528 type Target = str;
529
530 fn deref(&self) -> &Self::Target {
531 self.as_str()
532 }
533}
534
535/// An individual interval histogram parsed from an interval log.
536#[derive(PartialEq, Debug)]
537pub struct IntervalLogHistogram<'a> {
538 tag: Option<Tag<'a>>,
539 start_timestamp: time::Duration,
540 duration: time::Duration,
541 max: f64,
542 encoded_histogram: &'a str,
543}
544
545impl<'a> IntervalLogHistogram<'a> {
546 /// Tag, if any is present.
547 pub fn tag(&self) -> Option<Tag<'a>> {
548 self.tag
549 }
550
551 /// Timestamp of the start of the interval in seconds, expressed as a `Duration` relative to
552 /// some start point.
553 ///
554 /// The timestamp may be absolute vs the epoch, or there may be a `StartTime` or `BaseTime` for
555 /// the log, in which case you may wish to consider this number as a delta vs those timestamps.
556 /// See the module-level documentation about timestamps.
557 pub fn start_timestamp(&self) -> time::Duration {
558 self.start_timestamp
559 }
560
561 /// Duration of the interval in seconds.
562 pub fn duration(&self) -> time::Duration {
563 self.duration
564 }
565
566 /// Max value in the encoded histogram
567 ///
568 /// This max value is the max of the histogram divided by some scaling factor (which may be
569 /// 1.0).
570 pub fn max(&self) -> f64 {
571 self.max
572 }
573
574 /// Base64-encoded serialized histogram.
575 ///
576 /// If you need the deserialized histogram, base64-decode and use a `Deserializer` on the
577 /// resulting bytes.
578 pub fn encoded_histogram(&self) -> &'a str {
579 self.encoded_histogram
580 }
581}
582
583#[derive(PartialEq, Debug)]
584/// Represents one non-comment line in an interval log.
585///
586/// One thing to note is that the way your interval timestamps work can vary. If your log was
587/// written with a StartTime or BaseTime, that metadata will appear in header comments, and that
588/// will be represented by the iterator providing the corresponding variants here. The presence
589/// of those timestamps will affect how you should interpret the timestamps for individual
590/// intervals. See the module-level documentation.
591#[allow(variant_size_differences)]
592pub enum LogEntry<'a> {
593 /// Logs may include a StartTime. If present, it represents seconds since the epoch.
594 StartTime(time::Duration),
595 /// Logs may include a BaseTime. If present, it represents seconds since the epoch.
596 BaseTime(time::Duration),
597 /// An individual interval histogram.
598 Interval(IntervalLogHistogram<'a>),
599}
600
601/// Errors that occur when parsing an interval log.
602#[derive(Debug, PartialEq)]
603pub enum LogIteratorError {
604 /// Parsing failed
605 ParseError {
606 /// Offset in the input where the failed parse started
607 offset: usize,
608 },
609}
610
611/// Parse interval logs.
612///
613/// This iterator exposes each item (excluding comments and other information-free lines). See
614/// `LogEntry`.
615///
616/// Because histogram deserialization is deferred, parsing logs is fast. See the `interval_log`
617/// benchmark if you wish to see how it does on your hardware. As a baseline, parsing a log of 1000
618/// random histograms of 10,000 values each takes 2ms total on an E5-1650v3.
619///
620/// Deferring deserialization is handy because it allows you to cheaply navigate the log to find
621/// the records you care about (e.g. ones in a certain time range, or with a certain tag) without
622/// doing all the allocation, etc, of deserialization.
623///
624/// If you're looking for a direct port of the Java impl's `HistogramLogReader`, this isn't one: it
625/// won't deserialize for you, and it pushes the burden of figuring out what to do with StartTime,
626/// BaseTime, etc to you, and there aren't built in functions to filter by timestamp. On the other
627/// hand, because it doesn't do those things, it is much more flexible: you can easily build any
628/// sort of filtering you want, not just timestamp ranges, because you have cheap access to all the
629/// metadata before incurring the cost of deserialization. If you're not using any timestamp
630/// headers, or at least using them in straightforward ways, it is easy to accumulate the
631/// timestamp state you need. Since all the parsing is taken care of already, writing your own
632/// `HistogramLogReader` equivalent that fits the way your logs are assembled is just a couple of
633/// lines. (And if you're doing complex stuff, we probably wouldn't have built something that fits
634/// your quirky logs anyway!)
635///
636/// This parses from a slice representing the complete file because it made implementation easier
637/// (and also supports mmap'd files for maximum parsing speed). If parsing from a `Read` is
638/// important for your use case, open an issue about it.
639pub struct IntervalLogIterator<'a> {
640 orig_len: usize,
641 input: &'a [u8],
642 ended: bool,
643}
644
645impl<'a> IntervalLogIterator<'a> {
646 /// Create a new iterator from the UTF-8 bytes of an interval log.
647 pub fn new(input: &'a [u8]) -> IntervalLogIterator<'a> {
648 IntervalLogIterator {
649 orig_len: input.len(),
650 input,
651 ended: false,
652 }
653 }
654}
655
656impl<'a> Iterator for IntervalLogIterator<'a> {
657 type Item = Result<LogEntry<'a>, LogIteratorError>;
658
659 fn next(&mut self) -> Option<Self::Item> {
660 loop {
661 if self.ended {
662 return None;
663 }
664
665 if self.input.is_empty() {
666 self.ended = true;
667 return None;
668 }
669
670 // Look for magic comments first otherwise they will get matched by the simple comment
671 // parser
672 if let Ok((rest, e)) = log_entry(self.input) {
673 self.input = rest;
674 return Some(Ok(e));
675 }
676
677 // it wasn't a log entry; try parsing a comment
678 match ignored_line(self.input) {
679 Ok((rest, _)) => {
680 self.input = rest;
681 continue;
682 }
683 _ => {
684 self.ended = true;
685 return Some(Err(LogIteratorError::ParseError {
686 offset: self.orig_len - self.input.len(),
687 }));
688 }
689 }
690 }
691 }
692}
693
694fn duration_as_fp_seconds(d: time::Duration) -> f64 {
695 d.as_secs() as f64 + f64::from(d.subsec_nanos()) / 1_000_000_000_f64
696}
697
698fn system_time_as_fp_seconds(time: time::SystemTime) -> f64 {
699 match time.duration_since(time::UNIX_EPOCH) {
700 Ok(dur_after_epoch) => duration_as_fp_seconds(dur_after_epoch),
701 // Doesn't seem possible to be before the epoch, but using a negative number seems like
702 // a reasonable representation if it does occur
703 Err(t) => duration_as_fp_seconds(t.duration()) * -1_f64,
704 }
705}
706
707fn start_time(input: &[u8]) -> IResult<&[u8], LogEntry> {
708 let (input, _) = tag("#[StartTime: ")(input)?;
709 let (input, duration) = fract_sec_duration(input)?;
710 let (input, _) = char(' ')(input)?;
711 let (input, _) = take_until("\n")(input)?;
712 let (input, _) = take(1_usize)(input)?;
713 Ok((input, LogEntry::StartTime(duration)))
714}
715
716fn base_time(input: &[u8]) -> IResult<&[u8], LogEntry> {
717 let (input, _) = tag("#[BaseTime: ")(input)?;
718 let (input, duration) = fract_sec_duration(input)?;
719 let (input, _) = char(' ')(input)?;
720 let (input, _) = take_until("\n")(input)?;
721 let (input, _) = take(1_usize)(input)?;
722 Ok((input, LogEntry::BaseTime(duration)))
723}
724
725fn tag_bytes(input: &[u8]) -> IResult<&[u8], &[u8]> {
726 let (input, _) = tag("Tag=")(input)?;
727 let (input, tag) = take_until(",")(input)?;
728 let (input, _) = take(1_usize)(input)?;
729 Ok((input, tag))
730}
731
732fn tag_parser(input: &[u8]) -> IResult<&[u8], Tag> {
733 let (input, tag) = map_res(tag_bytes, str::from_utf8)(input)?;
734 Ok((input, Tag(tag)))
735}
736
737fn interval_hist(input: &[u8]) -> IResult<&[u8], LogEntry> {
738 let (input, tag) = opt(tag_parser)(input)?;
739 let (input, start_timestamp) = fract_sec_duration(input)?;
740 let (input, _) = char(',')(input)?;
741 let (input, duration) = fract_sec_duration(input)?;
742 let (input, _) = char(',')(input)?;
743 let (input, max) = double(input)?;
744 let (input, _) = char(',')(input)?;
745 let (input, encoded_histogram) = map_res(take_until("\n"), str::from_utf8)(input)?;
746 let (input, _) = take(1_usize)(input)?;
747
748 Ok((
749 input,
750 LogEntry::Interval(IntervalLogHistogram {
751 tag,
752 start_timestamp,
753 duration,
754 max,
755 encoded_histogram,
756 }),
757 ))
758}
759
760fn log_entry(input: &[u8]) -> IResult<&[u8], LogEntry<'_>> {
761 complete(alt((start_time, base_time, interval_hist)))(input)
762}
763
764fn comment_line(input: &[u8]) -> IResult<&[u8], ()> {
765 let (input, _) = tag("#")(input)?;
766 let (input, _) = take_until("\n")(input)?;
767 let (input, _) = take(1_usize)(input)?;
768 Ok((input, ()))
769}
770
771fn legend(input: &[u8]) -> IResult<&[u8], ()> {
772 let (input, _) = tag("\"StartTimestamp\"")(input)?;
773 let (input, _) = take_until("\n")(input)?;
774 let (input, _) = take(1_usize)(input)?;
775 Ok((input, ()))
776}
777
778fn ignored_line(input: &[u8]) -> IResult<&[u8], ()> {
779 alt((comment_line, legend))(input)
780}
781
782fn fract_sec_duration(input: &[u8]) -> IResult<&[u8], time::Duration> {
783 let (rest, data) = fract_sec_tuple(input)?;
784 let (secs, nanos_str) = data;
785
786 // only read up to 9 digits since we can only support nanos, not smaller precision
787 let nanos_parse_res = match nanos_str.len().cmp(&9) {
788 Ordering::Greater => nanos_str[0..9].parse::<u32>(),
789 Ordering::Equal => nanos_str.parse::<u32>(),
790 Ordering::Less => nanos_str
791 .parse::<u32>()
792 // subtraction will not overflow because len is < 9
793 .map(|n| n * 10_u32.pow(9 - nanos_str.len() as u32)),
794 };
795
796 if let Ok(nanos) = nanos_parse_res {
797 return Ok((rest, time::Duration::new(secs, nanos)));
798 }
799
800 // nanos were invalid utf8. We don't expose these errors, so don't bother defining a
801 // custom error type.
802 Err(Err::Error(error_position!(input, ErrorKind::Alpha)))
803}
804
805type FResult<'a> = IResult<&'a [u8], (u64, &'a str)>;
806
807fn fract_sec_tuple(input: &[u8]) -> FResult {
808 let (input, secs) = map_res(
809 map_res(recognize(take_until(".")), str::from_utf8),
810 u64::from_str,
811 )(input)?;
812 let (input, _) = tag(".")(input)?;
813 let (input, nanos_str) = map_res(complete(take_while1(is_digit)), str::from_utf8)(input)?;
814 Ok((input, (secs, nanos_str)))
815}
816
817#[cfg(test)]
818mod tests;