// Copyright 2019 TiKV Project Authors. Licensed under Apache-2.0.

//! A fail point implementation for Rust.
//!
//! Fail points are code instrumentations that allow errors and other behavior
//! to be injected dynamically at runtime, primarily for testing purposes. Fail
//! points are flexible and can be configured to exhibit a variety of behavior,
//! including panics, early returns, and sleeping. They can be controlled both
//! programmatically and via the environment, and can be triggered
//! conditionally and probabilistically.
//!
//! This crate is inspired by FreeBSD's
//! [failpoints](https://freebsd.org/cgi/man.cgi?query=fail).
//!
//! ## Usage
//!
//! First, add this to your `Cargo.toml`:
//!
//! ```toml
//! [dependencies]
//! fail = "0.5"
//! ```
//!
//! Now you can import the `fail_point!` macro from the `fail` crate and use it
//! to inject dynamic failures.
//!
//! As an example, here's a simple program that uses a fail point to simulate an
//! I/O panic:
//!
//! ```rust
//! use fail::{fail_point, FailScenario};
//!
//! fn do_fallible_work() {
//!     fail_point!("read-dir");
//!     let _dir: Vec<_> = std::fs::read_dir(".").unwrap().collect();
//!     // ... do some work on the directory ...
//! }
//!
//! let scenario = FailScenario::setup();
//! do_fallible_work();
//! scenario.teardown();
//! println!("done");
//! ```
//!
//! Here, the program calls `unwrap` on the result of `read_dir`, a function
//! that returns a `Result`. In other words, this particular program expects
//! this call to `read_dir` to always succeed. And in practice it almost always
//! will, which makes the behavior of this program when `read_dir` fails
//! difficult to test. By instrumenting the program with a fail point we can
//! pretend that `read_dir` failed, causing the subsequent `unwrap` to panic,
//! and allowing us to observe the program's behavior under failure conditions.
//!
//! When the program is run normally it just prints "done":
//!
//! ```sh
//! $ cargo run --features fail/failpoints
//!     Finished dev [unoptimized + debuginfo] target(s) in 0.01s
//!      Running `target/debug/failpointtest`
//! done
//! ```
//!
//! But now, by setting the `FAILPOINTS` environment variable, we can see what
//! happens if `read_dir` fails:
//!
//! ```sh
//! $ FAILPOINTS=read-dir=panic cargo run --features fail/failpoints
//!     Finished dev [unoptimized + debuginfo] target(s) in 0.01s
//!      Running `target/debug/failpointtest`
//! thread 'main' panicked at 'failpoint read-dir panic', /home/ubuntu/.cargo/registry/src/github.com-1ecc6299db9ec823/fail-0.2.0/src/lib.rs:286:25
//! note: Run with `RUST_BACKTRACE=1` for a backtrace.
//! ```
//!
//! ## Usage in tests
//!
//! The previous example triggers a fail point by modifying the `FAILPOINTS`
//! environment variable. In practice, you'll often want to trigger fail points
//! programmatically, in unit tests.
//! Fail points are global resources, and Rust tests run in parallel,
//! so tests that exercise fail points generally need to hold a lock to
//! avoid interfering with each other. `FailScenario` holds such a lock for as
//! long as it is alive.
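//!
//! The locking idea can be sketched with only the standard library. This is
//! a simplified model, not this crate's implementation. `REGISTRY`,
//! `SCENARIO_LOCK`, `Scenario`, and `configured_count` are hypothetical names,
//! and the registry is reduced to a list of `(name, actions)` strings. The
//! sketch assumes Rust 1.63 or later, which is needed for a `const` `Mutex::new`
//! in a `static`. One global lock is held for the lifetime of a scenario, so
//! concurrent tests cannot observe each other's configuration. Dropping the
//! scenario clears the registry before the lock is released.
//!
//! ```rust
//! use std::sync::{Mutex, MutexGuard, PoisonError};
//!
//! // Hypothetical stand-in for the global registry: (name, actions) pairs.
//! static REGISTRY: Mutex<Vec<(String, String)>> = Mutex::new(Vec::new());
//! // Hypothetical lock that lets only one scenario run at a time.
//! static SCENARIO_LOCK: Mutex<()> = Mutex::new(());
//!
//! // Holds the scenario lock; clears the registry on setup and on drop.
//! struct Scenario {
//!     _guard: MutexGuard<'static, ()>,
//! }
//!
//! impl Scenario {
//!     fn setup() -> Scenario {
//!         // Recover from poisoning so a panicking test does not block later ones.
//!         let guard = SCENARIO_LOCK.lock().unwrap_or_else(PoisonError::into_inner);
//!         REGISTRY.lock().unwrap().clear();
//!         Scenario { _guard: guard }
//!     }
//!
//!     fn configure(&self, name: &str, actions: &str) {
//!         REGISTRY.lock().unwrap().push((name.to_owned(), actions.to_owned()));
//!     }
//! }
//!
//! impl Drop for Scenario {
//!     fn drop(&mut self) {
//!         // Runs before `_guard` is dropped, so the next scenario starts clean.
//!         REGISTRY.lock().unwrap().clear();
//!     }
//! }
//!
//! fn configured_count() -> usize {
//!     REGISTRY.lock().unwrap().len()
//! }
//!
//! fn main() {
//!     let scenario = Scenario::setup();
//!     scenario.configure("read-dir", "panic");
//!     assert_eq!(configured_count(), 1);
//!     drop(scenario);
//!     assert_eq!(configured_count(), 0);
//! }
//! ```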
//!
//! Here's a basic pattern for writing unit tests with fail points:
//!
//! ```rust
//! use fail::{fail_point, FailScenario};
//!
//! fn do_fallible_work() {
//!     fail_point!("read-dir");
//!     let _dir: Vec<_> = std::fs::read_dir(".").unwrap().collect();
//!     // ... do some work on the directory ...
//! }
//!
//! #[test]
//! #[should_panic]
//! fn test_fallible_work() {
//!     let scenario = FailScenario::setup();
//!     fail::cfg("read-dir", "panic").unwrap();
//!
//!     do_fallible_work();
//!
//!     scenario.teardown();
//! }
//! ```
//!
//! Even if a test does not itself turn on any fail points, code that it runs
//! could trigger a fail point that was configured by another thread. Because of
//! this it is a best practice to put all fail point unit tests into their own
//! binary. Here's a snippet from `Cargo.toml` that creates a
//! fail-point-specific test binary:
//!
//! ```toml
//! [[test]]
//! name = "failpoints"
//! path = "tests/failpoints/mod.rs"
//! required-features = ["fail/failpoints"]
//! ```
//!
//! ## Early return
//!
//! The previous examples illustrate injecting panics via fail points, but
//! panics aren't the only &mdash; or even the most common &mdash; error pattern
//! in Rust. The more common type of error is propagated by `Result` return
//! values, and fail points can inject those as well with "early returns". That
//! is, when configuring a fail point as "return" (as opposed to "panic"), the
//! fail point will immediately return from the function, optionally with a
//! configurable value.
//!
//! The setup for early return requires a slightly different invocation of the
//! `fail_point!` macro. To illustrate this, let's modify the `do_fallible_work`
//! function we used earlier to return a `Result`:
//!
//! ```rust
//! use fail::{fail_point, FailScenario};
//! use std::io;
//!
//! fn do_fallible_work() -> io::Result<()> {
//!     fail_point!("read-dir");
//!     let _dir: Vec<_> = std::fs::read_dir(".")?.collect();
//!     // ... do some work on the directory ...
//!     Ok(())
//! }
//!
//! fn main() -> io::Result<()> {
//!     let scenario = FailScenario::setup();
//!     do_fallible_work()?;
//!     scenario.teardown();
//!     println!("done");
//!     Ok(())
//! }
//! ```
//!
//! This example uses more idiomatic Rust error handling, with no unwraps
//! anywhere. Instead it uses `?` to propagate errors via `Result` return
//! values, which is more realistic Rust code.
//!
//! The "read-dir" fail point, though, is not yet configured to support early
//! return, so if we attempt to configure it to "return", we'll see an error
//! like this:
//!
//! ```sh
//! $ FAILPOINTS=read-dir=return cargo run --features fail/failpoints
//!     Finished dev [unoptimized + debuginfo] target(s) in 0.13s
//!      Running `target/debug/failpointtest`
//! thread 'main' panicked at 'Return is not supported for the fail point "read-dir"', src/main.rs:7:5
//! note: Run with `RUST_BACKTRACE=1` for a backtrace.
//! ```
//!
//! This error tells us that the "read-dir" fail point is not defined correctly
//! to support early return, and gives us the line number of that fail point.
//! What we're missing in the fail point definition is code describing _how_ to
//! return an error value, and the way we do this is by passing `fail_point!` a
//! closure that returns the same type as the enclosing function.
//!
//! Here's a variation that does so:
//!
//! ```rust
//! # use std::io;
//! fn do_fallible_work() -> io::Result<()> {
//!     fail::fail_point!("read-dir", |_| {
//!         Err(io::Error::new(io::ErrorKind::PermissionDenied, "error"))
//!     });
//!     let _dir: Vec<_> = std::fs::read_dir(".")?.collect();
//!     // ... do some work on the directory ...
//!     Ok(())
//! }
//! ```
//!
//! And now if the "read-dir" fail point is configured to "return" we get a
//! different result:
//!
//! ```sh
//! $ FAILPOINTS=read-dir=return cargo run --features fail/failpoints
//!    Compiling failpointtest v0.1.0
//!     Finished dev [unoptimized + debuginfo] target(s) in 2.38s
//!      Running `target/debug/failpointtest`
//! Error: Custom { kind: PermissionDenied, error: StringError("error") }
//! ```
//!
//! This time, `do_fallible_work` returned the error defined in our closure,
//! which propagated all the way up and out of `main`.
//!
//! ## Advanced usage
//!
//! That's the basics of fail points: defining them with `fail_point!`,
//! configuring them with `FAILPOINTS` and `fail::cfg`, and configuring them to
//! panic and return early. But that's not all they can do. To learn more see
//! the documentation for [`cfg`](fn.cfg.html),
//! [`cfg_callback`](fn.cfg_callback.html) and
//! [`fail_point!`](macro.fail_point.html).
//!
//! ## Usage considerations
//!
//! For the most effective use of fail points, keep in mind the following:
//!
//!  - Fail points are disabled by default and can be enabled via the `failpoints`
//!    feature. When the feature is disabled, the `fail_point!` macro expands to
//!    an empty block and generates no code.
//!  - Carefully consider complex, concurrent, non-deterministic combinations of
//!    fail points. Put test cases exercising fail points into their own test
//!    crate.
//!  - Fail points might have the same name, in which case they take the
//!    same actions. Be careful about duplicating fail point names, either within
//!    a single crate, or across multiple crates.

#![deny(missing_docs, missing_debug_implementations)]

use std::collections::HashMap;
use std::env::VarError;
use std::fmt::Debug;
use std::str::FromStr;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex, MutexGuard, RwLock, TryLockError};
use std::time::{Duration, Instant};
use std::{env, thread};

#[derive(Clone)]
struct SyncCallback(Arc<dyn Fn() + Send + Sync>);

impl Debug for SyncCallback {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str("SyncCallback()")
    }
}

impl PartialEq for SyncCallback {
    #[allow(clippy::vtable_address_comparisons)]
    fn eq(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}

impl SyncCallback {
    fn new(f: impl Fn() + Send + Sync + 'static) -> SyncCallback {
        SyncCallback(Arc::new(f))
    }

    fn run(&self) {
        let callback = &self.0;
        callback();
    }
}

/// Supported tasks.
#[derive(Clone, Debug, PartialEq)]
enum Task {
    /// Do nothing.
    Off,
    /// Return the value.
    Return(Option<String>),
    /// Sleep for some milliseconds.
    Sleep(u64),
    /// Panic with the message.
    Panic(Option<String>),
    /// Print the message.
    Print(Option<String>),
    /// Sleep until another action is set.
    Pause,
    /// Yield the CPU.
    Yield,
    /// Busy-wait for some milliseconds.
    Delay(u64),
    /// Call a callback function.
    Callback(SyncCallback),
}

#[derive(Debug)]
struct Action {
    task: Task,
    freq: f32,
    count: Option<AtomicUsize>,
}

impl PartialEq for Action {
    fn eq(&self, hs: &Action) -> bool {
        if self.task != hs.task || self.freq != hs.freq {
            return false;
        }
        if let Some(ref lhs) = self.count {
            if let Some(ref rhs) = hs.count {
                return lhs.load(Ordering::Relaxed) == rhs.load(Ordering::Relaxed);
            }
        } else if hs.count.is_none() {
            return true;
        }
        false
    }
}

impl Action {
    fn new(task: Task, freq: f32, max_cnt: Option<usize>) -> Action {
        Action {
            task,
            freq,
            count: max_cnt.map(AtomicUsize::new),
        }
    }

    fn from_callback(f: impl Fn() + Send + Sync + 'static) -> Action {
        let task = Task::Callback(SyncCallback::new(f));
        Action {
            task,
            freq: 1.0,
            count: None,
        }
    }

    fn get_task(&self) -> Option<Task> {
        use rand::Rng;

        if let Some(ref cnt) = self.count {
            let c = cnt.load(Ordering::Acquire);
            if c == 0 {
                return None;
            }
        }
        if self.freq < 1f32 && !rand::thread_rng().gen_bool(f64::from(self.freq)) {
            return None;
        }
        if let Some(ref ref_cnt) = self.count {
            let mut cnt = ref_cnt.load(Ordering::Acquire);
            loop {
                if cnt == 0 {
                    return None;
                }
                let new_cnt = cnt - 1;
                match ref_cnt.compare_exchange_weak(
                    cnt,
                    new_cnt,
                    Ordering::AcqRel,
                    Ordering::Acquire,
                ) {
                    Ok(_) => break,
                    Err(c) => cnt = c,
                }
            }
        }
        Some(self.task.clone())
    }
}

fn partition(s: &str, pattern: char) -> (&str, Option<&str>) {
    let mut splits = s.splitn(2, pattern);
    (splits.next().unwrap(), splits.next())
}

impl FromStr for Action {
    type Err = String;

    /// Parse an action.
    ///
    /// `s` should be in the format `[p%][cnt*]task[(args)]`, where `p%` is the
    /// frequency and `cnt` is the maximum number of times the action can be triggered.
    fn from_str(s: &str) -> Result<Action, String> {
        let mut remain = s.trim();
        let mut args = None;
        // In case there is a '%' in args, we need to parse the args first.
        let (first, second) = partition(remain, '(');
        if let Some(second) = second {
            remain = first;
            if !second.ends_with(')') {
                return Err("parentheses do not match".to_owned());
            }
            args = Some(&second[..second.len() - 1]);
        }

        let mut frequency = 1f32;
        let (first, second) = partition(remain, '%');
        if let Some(second) = second {
            remain = second;
            match first.parse::<f32>() {
                Err(e) => return Err(format!("failed to parse frequency: {}", e)),
                Ok(freq) => frequency = freq / 100.0,
            }
        }

        let mut max_cnt = None;
        let (first, second) = partition(remain, '*');
        if let Some(second) = second {
            remain = second;
            match first.parse() {
                Err(e) => return Err(format!("failed to parse count: {}", e)),
                Ok(cnt) => max_cnt = Some(cnt),
            }
        }

        let parse_timeout = || match args {
            None => Err("sleep require timeout".to_owned()),
            Some(timeout_str) => match timeout_str.parse() {
                Err(e) => Err(format!("failed to parse timeout: {}", e)),
                Ok(timeout) => Ok(timeout),
            },
        };

        let task = match remain {
            "off" => Task::Off,
            "return" => Task::Return(args.map(str::to_owned)),
            "sleep" => Task::Sleep(parse_timeout()?),
            "panic" => Task::Panic(args.map(str::to_owned)),
            "print" => Task::Print(args.map(str::to_owned)),
            "pause" => Task::Pause,
            "yield" => Task::Yield,
            "delay" => Task::Delay(parse_timeout()?),
            _ => return Err(format!("unrecognized command {:?}", remain)),
        };

        Ok(Action::new(task, frequency, max_cnt))
    }
}

#[cfg_attr(feature = "cargo-clippy", allow(clippy::mutex_atomic))]
#[derive(Debug)]
struct FailPoint {
    pause: Mutex<bool>,
    pause_notifier: Condvar,
    actions: RwLock<Vec<Action>>,
    actions_str: RwLock<String>,
}

#[cfg_attr(feature = "cargo-clippy", allow(clippy::mutex_atomic))]
impl FailPoint {
    fn new() -> FailPoint {
        FailPoint {
            pause: Mutex::new(false),
            pause_notifier: Condvar::new(),
            actions: RwLock::default(),
            actions_str: RwLock::default(),
        }
    }

    fn set_actions(&self, actions_str: &str, actions: Vec<Action>) {
        loop {
            // TODO: maybe busy waiting here.
            match self.actions.try_write() {
                Err(TryLockError::WouldBlock) => {}
                Ok(mut guard) => {
                    *guard = actions;
                    *self.actions_str.write().unwrap() = actions_str.to_string();
                    return;
                }
                Err(e) => panic!("unexpected poison: {:?}", e),
            }
            // The write lock may be blocked by a paused `eval` that still holds the
            // read lock; clear the pause flag and wake any such threads, then retry.
            let mut guard = self.pause.lock().unwrap();
            *guard = false;
            self.pause_notifier.notify_all();
        }
    }

    #[cfg_attr(feature = "cargo-clippy", allow(clippy::option_option))]
    fn eval(&self, name: &str) -> Option<Option<String>> {
        let task = {
            let actions = self.actions.read().unwrap();
            match actions.iter().filter_map(Action::get_task).next() {
                Some(Task::Pause) => {
                    let mut guard = self.pause.lock().unwrap();
                    *guard = true;
                    loop {
                        guard = self.pause_notifier.wait(guard).unwrap();
                        if !*guard {
                            break;
                        }
                    }
                    return None;
                }
                Some(t) => t,
                None => return None,
            }
        };

        match task {
            Task::Off => {}
            Task::Return(s) => return Some(s),
            Task::Sleep(t) => thread::sleep(Duration::from_millis(t)),
            Task::Panic(msg) => match msg {
                Some(ref msg) => panic!("{}", msg),
                None => panic!("failpoint {} panic", name),
            },
            Task::Print(msg) => match msg {
                Some(ref msg) => log::info!("{}", msg),
                None => log::info!("failpoint {} executed.", name),
            },
            Task::Pause => unreachable!(),
            Task::Yield => thread::yield_now(),
            Task::Delay(t) => {
                let timer = Instant::now();
                let timeout = Duration::from_millis(t);
                while timer.elapsed() < timeout {}
            }
            Task::Callback(f) => {
                f.run();
            }
        }
        None
    }
}

/// Registry with failpoints configuration.
type Registry = HashMap<String, Arc<FailPoint>>;

#[derive(Debug, Default)]
struct FailPointRegistry {
    // TODO: remove rwlock or store *mut FailPoint
    registry: RwLock<Registry>,
}

use once_cell::sync::Lazy;

static REGISTRY: Lazy<FailPointRegistry> = Lazy::new(FailPointRegistry::default);
static SCENARIO: Lazy<Mutex<&'static FailPointRegistry>> = Lazy::new(|| Mutex::new(&REGISTRY));

/// Test scenario with configured fail points.
#[derive(Debug)]
pub struct FailScenario<'a> {
    scenario_guard: MutexGuard<'a, &'static FailPointRegistry>,
}

impl<'a> FailScenario<'a> {
    /// Set up the system for a fail points scenario.
    ///
    /// Configures all fail points specified in the `FAILPOINTS` environment variable.
    /// Any existing fail point configuration is cleared first, in case a previous
    /// scenario failed or panicked without tearing down.
    ///
    /// The format of `FAILPOINTS` is `failpoint=actions;...`, where
    /// `failpoint` is the name of the fail point. For more information
    /// about fail point actions see the [`cfg`](fn.cfg.html) function and
    /// the [`fail_point`](macro.fail_point.html) macro.
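    ///
    /// The sketch below shows how such a string breaks down. The hypothetical
    /// `parse_failpoints` helper splits it the same way `setup` does: on `;`,
    /// skipping empty entries, then at the first `=`. It is illustrative only.
    /// It returns an error where `setup` panics, and it does not validate the
    /// actions themselves.
    ///
    /// ```rust
    /// // Hypothetical helper: split a FAILPOINTS-style string into (name, actions) pairs.
    /// fn parse_failpoints(s: &str) -> Result<Vec<(String, String)>, String> {
    ///     let mut pairs = Vec::new();
    ///     for cfg in s.trim().split(';') {
    ///         let cfg = cfg.trim();
    ///         if cfg.is_empty() {
    ///             // Tolerate stray or trailing separators.
    ///             continue;
    ///         }
    ///         // Split at the first '=' only.
    ///         match cfg.split_once('=') {
    ///             Some((name, actions)) => pairs.push((name.to_owned(), actions.to_owned())),
    ///             None => return Err(format!("invalid failpoint: {:?}", cfg)),
    ///         }
    ///     }
    ///     Ok(pairs)
    /// }
    ///
    /// fn main() {
    ///     let pairs = parse_failpoints("read-dir=panic; write=50%return ;").unwrap();
    ///     assert_eq!(pairs.len(), 2);
    ///     assert_eq!(pairs[1], ("write".to_owned(), "50%return".to_owned()));
    ///     assert!(parse_failpoints("missing-equals").is_err());
    /// }
    /// ```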
    ///
    /// `FAILPOINTS` may configure fail points that are not actually defined. In
    /// this case the configuration has no effect.
    ///
    /// This function should generally be called prior to running a test with fail
    /// points, and afterward paired with [`teardown`](#method.teardown).
    ///
    /// # Panics
    ///
    /// Panics if an action is not formatted correctly.
    pub fn setup() -> Self {
        // Cleanup first, in case of previous failed/panic'ed test scenarios.
        let scenario_guard = SCENARIO.lock().unwrap_or_else(|e| e.into_inner());
        let mut registry = scenario_guard.registry.write().unwrap();
        Self::cleanup(&mut registry);

        let failpoints = match env::var("FAILPOINTS") {
            Ok(s) => s,
            Err(VarError::NotPresent) => return Self { scenario_guard },
            Err(e) => panic!("invalid failpoints: {:?}", e),
        };
        for mut cfg in failpoints.trim().split(';') {
            cfg = cfg.trim();
            if cfg.is_empty() {
                continue;
            }
            let (name, order) = partition(cfg, '=');
            match order {
                None => panic!("invalid failpoint: {:?}", cfg),
                Some(order) => {
                    if let Err(e) = set(&mut registry, name.to_owned(), order) {
                        panic!("unable to configure failpoint \"{}\": {}", name, e);
                    }
                }
            }
        }
        Self { scenario_guard }
    }

    /// Tear down the fail point system.
    ///
    /// Clears the configuration of all fail points. Any paused fail
    /// points will be notified before they are deactivated.
    ///
    /// This function should generally be called after running a test with fail points.
    /// Dropping the `FailScenario` without calling `teardown` has the same effect.
    pub fn teardown(self) {
        drop(self)
    }

    /// Clear all registered fail points.
    fn cleanup(registry: &mut std::sync::RwLockWriteGuard<'a, Registry>) {
        for p in registry.values() {
            // Wake up any threads paused on this fail point.
            p.set_actions("", vec![]);
        }
        registry.clear();
    }
}

impl<'a> Drop for FailScenario<'a> {
    fn drop(&mut self) {
        let mut registry = self.scenario_guard.registry.write().unwrap();
        Self::cleanup(&mut registry)
    }
}

/// Returns whether code generation for failpoints is enabled.
///
/// This function allows consumers to check (at runtime) whether the library
/// was compiled with the (buildtime) `failpoints` feature, which enables
/// code generation for failpoints.
pub const fn has_failpoints() -> bool {
    cfg!(feature = "failpoints")
}

/// Get all registered fail points.
///
/// Return a vector of `(name, actions)` pairs.
pub fn list() -> Vec<(String, String)> {
    let registry = REGISTRY.registry.read().unwrap();
    registry
        .iter()
        .map(|(name, fp)| (name.to_string(), fp.actions_str.read().unwrap().clone()))
        .collect()
}

#[doc(hidden)]
pub fn eval<R, F: FnOnce(Option<String>) -> R>(name: &str, f: F) -> Option<R> {
    let p = {
        let registry = REGISTRY.registry.read().unwrap();
        match registry.get(name) {
            None => return None,
            Some(p) => p.clone(),
        }
    };
    p.eval(name).map(f)
}

/// Configure the actions for a fail point at runtime.
///
/// Each fail point can be configured with a series of actions, specified by the
/// `actions` argument. The format of `actions` is `action[->action...]`. When
/// multiple actions are specified, an action is checked only when the action
/// before it is not triggered.
///
/// The format of a single action is `[p%][cnt*]task[(arg)]`. `p%` is the
/// expected probability that the action is triggered, and `cnt*` is the maximum
/// number of times the action can be triggered. The supported values of `task` are:
///
/// - `off`, the fail point will do nothing.
/// - `return(arg)`, return early when the fail point is triggered. `arg` is passed
///   to `$e` (defined via the `fail_point!` macro) as a string.
/// - `sleep(milliseconds)`, sleep for the specified time.
/// - `panic(msg)`, panic with the message.
/// - `print(msg)`, log the message, using the `log` crate, at the `info` level.
/// - `pause`, sleep until another action is set to the fail point.
/// - `yield`, yield the CPU.
/// - `delay(milliseconds)`, busy-wait for the specified time.
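///
/// A small standalone parser can illustrate the single-action grammar. This is
/// a simplified sketch, not the crate's internal parser. `ParsedAction` and
/// `parse_action` are hypothetical names, task names are not validated, and
/// `sleep`/`delay` arguments are kept as strings. Like the crate, it splits off
/// the parenthesized argument first. That way a `%` or `*` inside the argument
/// is not mistaken for a frequency or a count.
///
/// ```rust
/// // Hypothetical parsed form of `[p%][cnt*]task[(arg)]`.
/// #[derive(Debug, PartialEq)]
/// struct ParsedAction {
///     // Trigger probability in [0, 1]: the `p%` prefix divided by 100.
///     freq: f32,
///     // The `cnt*` prefix, if present.
///     max_count: Option<usize>,
///     // Task name such as "return", "panic", or "sleep".
///     task: String,
///     // Text inside the parentheses, if any.
///     arg: Option<String>,
/// }
///
/// fn parse_action(s: &str) -> Result<ParsedAction, String> {
///     let mut rest = s.trim();
///     let mut arg = None;
///     // Split off "(arg)" first.
///     if let Some((head, tail)) = rest.split_once('(') {
///         let inner = tail.strip_suffix(')').ok_or("parentheses do not match")?;
///         arg = Some(inner.to_owned());
///         rest = head;
///     }
///     let mut freq = 1.0;
///     if let Some((p, tail)) = rest.split_once('%') {
///         freq = p.parse::<f32>().map_err(|e| format!("bad frequency: {}", e))? / 100.0;
///         rest = tail;
///     }
///     let mut max_count = None;
///     if let Some((c, tail)) = rest.split_once('*') {
///         max_count = Some(c.parse::<usize>().map_err(|e| format!("bad count: {}", e))?);
///         rest = tail;
///     }
///     Ok(ParsedAction { freq, max_count, task: rest.to_owned(), arg })
/// }
///
/// fn main() {
///     let a = parse_action("20%3*print(still alive!)").unwrap();
///     assert!((a.freq - 0.2).abs() < 1e-6);
///     assert_eq!(a.max_count, Some(3));
///     assert_eq!(a.task, "print");
///     assert_eq!(a.arg.as_deref(), Some("still alive!"));
///
///     // A chain is split on "->" before each action is parsed.
///     let chain = "20%3*print(still alive!)->panic"
///         .split("->")
///         .map(parse_action)
///         .collect::<Result<Vec<_>, _>>()
///         .unwrap();
///     assert_eq!(chain.len(), 2);
///     assert_eq!(chain[1].task, "panic");
/// }
/// ```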
///
/// For example, `20%3*print(still alive!)->panic` means the fail point has a 20% chance to print
/// the message "still alive!" and an 80% chance to panic. The message will be printed at most 3
/// times; after that, the fail point always panics.
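///
/// The chain semantics can be modeled in a few lines: the first triggered
/// action wins, and an action whose count is used up is skipped. This sketch
/// is single-threaded and deterministic, unlike the crate, which decrements an
/// atomic counter and draws a random number for `p%`. `SimAction` and
/// `eval_chain` are hypothetical names, and the `roll` closure stands in for
/// the random draw.
///
/// ```rust
/// // Hypothetical model of one configured action.
/// struct SimAction {
///     task: &'static str,
///     freq: f64,
///     // Remaining trigger budget from `cnt*`; `None` means unlimited.
///     remaining: Option<usize>,
/// }
///
/// // Return the task of the first action in the chain that triggers.
/// // `roll(p)` decides whether an action with probability `p < 1` fires.
/// fn eval_chain(chain: &mut [SimAction], mut roll: impl FnMut(f64) -> bool) -> Option<&'static str> {
///     for action in chain.iter_mut() {
///         if action.remaining == Some(0) {
///             continue; // budget used up: fall through to the next action
///         }
///         if action.freq < 1.0 && !roll(action.freq) {
///             continue; // did not fire this time
///         }
///         if let Some(n) = action.remaining.as_mut() {
///             *n -= 1;
///         }
///         return Some(action.task);
///     }
///     None
/// }
///
/// fn main() {
///     // Models "20%3*print(still alive!)->panic".
///     let mut chain = [
///         SimAction { task: "print", freq: 0.2, remaining: Some(3) },
///         SimAction { task: "panic", freq: 1.0, remaining: None },
///     ];
///     // Pretend every 20% draw succeeds: "print" fires three times, then its
///     // budget is spent and evaluation falls through to "panic".
///     let results: Vec<_> = (0..5).map(|_| eval_chain(&mut chain, |_| true)).collect();
///     assert_eq!(
///         results,
///         [Some("print"), Some("print"), Some("print"), Some("panic"), Some("panic")]
///     );
/// }
/// ```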
///
/// The `FAILPOINTS` environment variable accepts this same syntax for its fail
/// point actions.
///
/// A call to `cfg` with a particular fail point name overwrites any existing actions for
/// that fail point, including those set via the `FAILPOINTS` environment variable.
pub fn cfg<S: Into<String>>(name: S, actions: &str) -> Result<(), String> {
    let mut registry = REGISTRY.registry.write().unwrap();
    set(&mut registry, name.into(), actions)
}

/// Configure the actions for a fail point at runtime.
///
/// Each fail point can be configured with a callback, which is called every
/// time the fail point is evaluated. The callback replaces any existing actions
/// for that fail point.
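///
/// Conceptually, a callback is just a closure that is looked up by fail point
/// name and invoked on every hit. In this crate it is stored as an action with
/// a 100% trigger probability and no count limit. The standalone sketch below
/// models only that lookup-and-call step. A hypothetical `Callbacks` map stands
/// in for the registry, and a hypothetical `hit` function stands in for
/// reaching a fail point.
///
/// ```rust
/// use std::collections::HashMap;
/// use std::sync::atomic::{AtomicUsize, Ordering};
/// use std::sync::Arc;
///
/// // Hypothetical callback table keyed by fail point name.
/// type Callbacks = HashMap<String, Arc<dyn Fn() + Send + Sync>>;
///
/// // Stand-in for reaching a fail point: run its callback, if one is registered.
/// fn hit(callbacks: &Callbacks, name: &str) {
///     if let Some(cb) = callbacks.get(name) {
///         cb();
///     }
/// }
///
/// fn main() {
///     let hits = Arc::new(AtomicUsize::new(0));
///     let counter = Arc::clone(&hits);
///
///     let mut callbacks = Callbacks::new();
///     callbacks.insert(
///         "read-dir".to_owned(),
///         Arc::new(move || {
///             counter.fetch_add(1, Ordering::SeqCst);
///         }),
///     );
///
///     for _ in 0..3 {
///         hit(&callbacks, "read-dir");
///     }
///     // No callback registered under this name, so nothing happens.
///     hit(&callbacks, "unregistered");
///     assert_eq!(hits.load(Ordering::SeqCst), 3);
/// }
/// ```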
pub fn cfg_callback<S, F>(name: S, f: F) -> Result<(), String>
where
    S: Into<String>,
    F: Fn() + Send + Sync + 'static,
{
    let mut registry = REGISTRY.registry.write().unwrap();
    let p = registry
        .entry(name.into())
        .or_insert_with(|| Arc::new(FailPoint::new()));
    let action = Action::from_callback(f);
    let actions = vec![action];
    p.set_actions("callback", actions);
    Ok(())
}

/// Remove a fail point.
///
/// If the fail point doesn't exist, nothing will happen.
pub fn remove<S: AsRef<str>>(name: S) {
    let mut registry = REGISTRY.registry.write().unwrap();
    if let Some(p) = registry.remove(name.as_ref()) {
        // Wake up any threads paused on this fail point.
        p.set_actions("", vec![]);
    }
}

/// Configure a fail point in RAII style: the fail point is removed when the guard is dropped.
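///
/// The RAII pattern itself can be sketched with only the standard library. A
/// guard configures an entry when it is created and removes the entry in
/// `Drop`, as `FailGuard` does by calling [`remove`]. `CONFIG`, `Guard`,
/// `configure`, and `is_configured` below are hypothetical stand-ins, not part
/// of this crate.
///
/// ```rust
/// use std::collections::HashMap;
/// use std::sync::Mutex;
///
/// // Hypothetical global table of fail point configurations.
/// static CONFIG: Mutex<Option<HashMap<String, String>>> = Mutex::new(None);
///
/// fn configure(name: &str, actions: &str) {
///     let mut table = CONFIG.lock().unwrap();
///     table
///         .get_or_insert_with(HashMap::new)
///         .insert(name.to_owned(), actions.to_owned());
/// }
///
/// fn is_configured(name: &str) -> bool {
///     let table = CONFIG.lock().unwrap();
///     match &*table {
///         Some(map) => map.contains_key(name),
///         None => false,
///     }
/// }
///
/// // Hypothetical guard: configures on creation, removes the entry when dropped.
/// struct Guard(String);
///
/// impl Guard {
///     fn new(name: &str, actions: &str) -> Guard {
///         configure(name, actions);
///         Guard(name.to_owned())
///     }
/// }
///
/// impl Drop for Guard {
///     fn drop(&mut self) {
///         let mut table = CONFIG.lock().unwrap();
///         if let Some(map) = &mut *table {
///             map.remove(&self.0);
///         }
///     }
/// }
///
/// fn main() {
///     {
///         let _guard = Guard::new("read-dir", "panic");
///         assert!(is_configured("read-dir"));
///     } // `_guard` is dropped here
///     assert!(!is_configured("read-dir"));
/// }
/// ```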
#[derive(Debug)]
pub struct FailGuard(String);

impl Drop for FailGuard {
    fn drop(&mut self) {
        remove(&self.0);
    }
}

impl FailGuard {
    /// Configure the actions for a fail point during the lifetime of the returned `FailGuard`.
    ///
    /// Read the documentation of [`cfg`] for more details.
    pub fn new<S: Into<String>>(name: S, actions: &str) -> Result<FailGuard, String> {
        let name = name.into();
        cfg(&name, actions)?;
        Ok(FailGuard(name))
    }

    /// Configure the actions for a fail point during the lifetime of the returned `FailGuard`.
    ///
    /// Read the documentation of [`cfg_callback`] for more details.
    pub fn with_callback<S, F>(name: S, f: F) -> Result<FailGuard, String>
    where
        S: Into<String>,
        F: Fn() + Send + Sync + 'static,
    {
        let name = name.into();
        cfg_callback(&name, f)?;
        Ok(FailGuard(name))
    }
}

fn set(
    registry: &mut HashMap<String, Arc<FailPoint>>,
    name: String,
    actions: &str,
) -> Result<(), String> {
    let actions_str = actions;
    // `actions` are in the format of `action[->action...]`.
    let actions = actions
        .split("->")
        .map(Action::from_str)
        .collect::<Result<_, _>>()?;
    // Please note that we can't figure out whether there is a failpoint named `name`,
    // so we may insert a failpoint that doesn't exist at all.
    let p = registry
        .entry(name)
        .or_insert_with(|| Arc::new(FailPoint::new()));
    p.set_actions(actions_str, actions);
    Ok(())
}

/// Define a fail point (requires `failpoints` feature).
///
/// The `fail_point!` macro has three forms, and they all take a name as the
/// first argument. The simplest form takes only a name and is suitable for
/// executing most fail point behavior, including panicking, but not for early
/// return or conditional execution based on a local flag.
///
/// The three forms of fail points look as follows.
///
/// 1. A basic fail point:
///
/// ```rust
/// # #[macro_use] extern crate fail;
/// fn function_return_unit() {
///     fail_point!("fail-point-1");
/// }
/// ```
///
/// This form of fail point can be configured to panic, print, sleep, pause, etc., but
/// not to return from the function early.
///
/// 2. A fail point that may return early:
///
/// ```rust
/// # #[macro_use] extern crate fail;
/// fn function_return_value() -> u64 {
///     fail_point!("fail-point-2", |r| r.map_or(2, |e| e.parse().unwrap()));
///     0
/// }
/// ```
///
/// This form of fail point can additionally be configured to return early from
/// the enclosing function. It accepts a closure, which itself accepts an
/// `Option<String>`, and is expected to transform that argument into the early
/// return value. The argument string is sourced from the fail point
/// configuration string. For example, configuring this "fail-point-2" as
/// "return(100)" will execute the fail point closure, passing it a `Some` value
/// containing a `String` equal to "100"; the closure then parses it into the
/// return value. Configuring it as plain "return" passes `None`, and the closure
/// falls back to 2.
///
/// 3. A fail point with conditional execution:
///
/// ```rust
/// # #[macro_use] extern crate fail;
/// fn function_conditional(enable: bool) {
///     fail_point!("fail-point-3", enable, |_| {});
/// }
/// ```
///
/// In this final form, the second argument is a local boolean expression that
/// must evaluate to `true` before the fail point is evaluated. The third
/// argument is again an early-return closure.
///
/// The three macro arguments (or "designators") are called `$name`, `$cond`,
/// and `$e`. `$name` must be `&str`, `$cond` must be a boolean expression,
/// and `$e` must be a function or closure that accepts an `Option<String>` and
/// returns the same type as the enclosing function.
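///
/// The standalone sketch below makes the roles of `$cond` and `$e` concrete.
/// It has the same shape as the two- and three-argument forms, but it is not
/// this crate's code. `ACTIVE`, `eval`, and `sketch_fail_point!` are
/// hypothetical, and the "registry" holds at most one fail point, configured
/// as `return` with an optional argument.
///
/// ```rust
/// use std::sync::Mutex;
///
/// // Hypothetical configuration: (name, return argument) of one active fail point.
/// static ACTIVE: Mutex<Option<(&'static str, Option<String>)>> = Mutex::new(None);
///
/// // Hypothetical stand-in for the crate's internal lookup: if `name` is
/// // active, pass its argument to `f` and return the result.
/// fn eval<R>(name: &str, f: impl FnOnce(Option<String>) -> R) -> Option<R> {
///     let active = ACTIVE.lock().unwrap().clone();
///     match active {
///         Some((n, arg)) if n == name => Some(f(arg)),
///         _ => None,
///     }
/// }
///
/// macro_rules! sketch_fail_point {
///     // Early return: the closure's result becomes the function's return value.
///     ($name:expr, $e:expr) => {{
///         if let Some(res) = eval($name, $e) {
///             return res;
///         }
///     }};
///     // Conditional: only consult the fail point when `$cond` is true.
///     ($name:expr, $cond:expr, $e:expr) => {{
///         if $cond {
///             sketch_fail_point!($name, $e);
///         }
///     }};
/// }
///
/// fn compute(enable: bool) -> u64 {
///     sketch_fail_point!("fp", enable, |r: Option<String>| r.map_or(2, |s| s.parse().unwrap()));
///     0
/// }
///
/// fn main() {
///     // Nothing configured yet, so the function runs normally.
///     assert_eq!(compute(true), 0);
///     *ACTIVE.lock().unwrap() = Some(("fp", Some("100".to_owned())));
///     assert_eq!(compute(true), 100);
///     // `$cond` is false, so the fail point is skipped.
///     assert_eq!(compute(false), 0);
/// }
/// ```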
///
/// For more examples see the [crate documentation](index.html). For more
/// information about controlling fail points see the [`cfg`](fn.cfg.html)
/// function.
#[macro_export]
#[cfg(feature = "failpoints")]
macro_rules! fail_point {
    ($name:expr) => {{
        $crate::eval($name, |_| {
            panic!("Return is not supported for the fail point \"{}\"", $name);
        });
    }};
    ($name:expr, $e:expr) => {{
        if let Some(res) = $crate::eval($name, $e) {
            return res;
        }
    }};
    ($name:expr, $cond:expr, $e:expr) => {{
        if $cond {
            $crate::fail_point!($name, $e);
        }
    }};
}

/// Define a fail point (disabled, see `failpoints` feature).
#[macro_export]
#[cfg(not(feature = "failpoints"))]
macro_rules! fail_point {
    ($name:expr, $e:expr) => {{}};
    ($name:expr) => {{}};
    ($name:expr, $cond:expr, $e:expr) => {{}};
}

#[cfg(test)]
mod tests {
    use super::*;

    use std::sync::*;

    #[test]
    fn test_has_failpoints() {
        assert_eq!(cfg!(feature = "failpoints"), has_failpoints());
    }

    #[test]
    fn test_off() {
        let point = FailPoint::new();
        point.set_actions("", vec![Action::new(Task::Off, 1.0, None)]);
        assert!(point.eval("test_fail_point_off").is_none());
    }

    #[test]
    fn test_return() {
        let point = FailPoint::new();
        point.set_actions("", vec![Action::new(Task::Return(None), 1.0, None)]);
        let res = point.eval("test_fail_point_return");
        assert_eq!(res, Some(None));

        let ret = Some("test".to_owned());
        point.set_actions("", vec![Action::new(Task::Return(ret.clone()), 1.0, None)]);
        let res = point.eval("test_fail_point_return");
        assert_eq!(res, Some(ret));
    }

    #[test]
    fn test_sleep() {
        let point = FailPoint::new();
        let timer = Instant::now();
        point.set_actions("", vec![Action::new(Task::Sleep(1000), 1.0, None)]);
        assert!(point.eval("test_fail_point_sleep").is_none());
        assert!(timer.elapsed() > Duration::from_millis(1000));
    }

    #[should_panic]
    #[test]
    fn test_panic() {
        let point = FailPoint::new();
        point.set_actions("", vec![Action::new(Task::Panic(None), 1.0, None)]);
        point.eval("test_fail_point_panic");
    }

    #[test]
    fn test_print() {
        struct LogCollector(Arc<Mutex<Vec<String>>>);
        impl log::Log for LogCollector {
            fn enabled(&self, _: &log::Metadata) -> bool {
                true
            }
            fn log(&self, record: &log::Record) {
                let mut buf = self.0.lock().unwrap();
                buf.push(format!("{}", record.args()));
            }
            fn flush(&self) {}
        }

        let buffer = Arc::new(Mutex::new(vec![]));
        let collector = LogCollector(buffer.clone());
        log::set_max_level(log::LevelFilter::Info);
        log::set_boxed_logger(Box::new(collector)).unwrap();

        let point = FailPoint::new();
        point.set_actions("", vec![Action::new(Task::Print(None), 1.0, None)]);
924        assert!(point.eval("test_fail_point_print").is_none());
925        let msg = buffer.lock().unwrap().pop().unwrap();
926        assert_eq!(msg, "failpoint test_fail_point_print executed.");
927    }
928
929    #[test]
930    fn test_pause() {
931        let point = Arc::new(FailPoint::new());
932        point.set_actions("", vec![Action::new(Task::Pause, 1.0, None)]);
933        let p = point.clone();
934        let (tx, rx) = mpsc::channel();
935        thread::spawn(move || {
936            assert_eq!(p.eval("test_fail_point_pause"), None);
937            tx.send(()).unwrap();
938        });
939        assert!(rx.recv_timeout(Duration::from_secs(1)).is_err());
940        point.set_actions("", vec![Action::new(Task::Off, 1.0, None)]);
941        rx.recv_timeout(Duration::from_secs(1)).unwrap();
942    }
943
944    #[test]
945    fn test_yield() {
946        let point = FailPoint::new();
947        point.set_actions("", vec![Action::new(Task::Yield, 1.0, None)]);
948        assert!(point.eval("test_fail_point_yield").is_none());
949    }
950
951    #[test]
952    fn test_delay() {
953        let point = FailPoint::new();
954        let timer = Instant::now();
955        point.set_actions("", vec![Action::new(Task::Delay(1000), 1.0, None)]);
956        assert!(point.eval("test_fail_point_delay").is_none());
957        assert!(timer.elapsed() > Duration::from_millis(1000));
958    }
959
960    #[test]
961    fn test_frequency_and_count() {
962        let point = FailPoint::new();
963        point.set_actions("", vec![Action::new(Task::Return(None), 0.8, Some(100))]);
964        let mut count = 0;
965        let mut times = 0f64;
966        while count < 100 {
967            if point.eval("test_fail_point_frequency").is_some() {
968                count += 1;
969            }
970            times += 1f64;
971        }
972        assert!(100.0 / 0.9 < times && times < 100.0 / 0.7, "{}", times);
973        for _ in 0..times as u64 {
974            assert!(point.eval("test_fail_point_frequency").is_none());
975        }
976    }
977
978    #[test]
979    fn test_parse() {
980        let cases = vec![
981            ("return", Action::new(Task::Return(None), 1.0, None)),
982            (
983                "return(64)",
984                Action::new(Task::Return(Some("64".to_owned())), 1.0, None),
985            ),
986            ("5*return", Action::new(Task::Return(None), 1.0, Some(5))),
987            ("25%return", Action::new(Task::Return(None), 0.25, None)),
988            (
989                "125%2*return",
990                Action::new(Task::Return(None), 1.25, Some(2)),
991            ),
992            (
993                "return(2%5)",
994                Action::new(Task::Return(Some("2%5".to_owned())), 1.0, None),
995            ),
996            ("125%2*off", Action::new(Task::Off, 1.25, Some(2))),
997            (
998                "125%2*sleep(100)",
999                Action::new(Task::Sleep(100), 1.25, Some(2)),
1000            ),
1001            (" 125%2*off ", Action::new(Task::Off, 1.25, Some(2))),
1002            ("125%2*panic", Action::new(Task::Panic(None), 1.25, Some(2))),
1003            (
1004                "125%2*panic(msg)",
1005                Action::new(Task::Panic(Some("msg".to_owned())), 1.25, Some(2)),
1006            ),
1007            ("125%2*print", Action::new(Task::Print(None), 1.25, Some(2))),
1008            (
1009                "125%2*print(msg)",
1010                Action::new(Task::Print(Some("msg".to_owned())), 1.25, Some(2)),
1011            ),
1012            ("125%2*pause", Action::new(Task::Pause, 1.25, Some(2))),
1013            ("125%2*yield", Action::new(Task::Yield, 1.25, Some(2))),
1014            ("125%2*delay(2)", Action::new(Task::Delay(2), 1.25, Some(2))),
1015        ];
1016        for (expr, exp) in cases {
1017            let res: Action = expr.parse().unwrap();
1018            assert_eq!(res, exp);
1019        }
1020
1021        let fail_cases = vec![
1022            "delay",
1023            "sleep",
1024            "Return",
1025            "ab%return",
1026            "ab*return",
1027            "return(msg",
1028            "unknown",
1029        ];
1030        for case in fail_cases {
1031            assert!(case.parse::<Action>().is_err());
1032        }
1033    }
1034
1035    // This case should be tested as integration case, but when calling `teardown` other cases
1036    // like `test_pause` maybe also affected, so it's better keep it here.
1037    #[test]
1038    #[cfg_attr(not(feature = "failpoints"), ignore)]
1039    fn test_setup_and_teardown() {
1040        let f1 = || {
1041            fail_point!("setup_and_teardown1", |_| 1);
1042            0
1043        };
1044        let f2 = || {
1045            fail_point!("setup_and_teardown2", |_| 2);
1046            0
1047        };
1048        env::set_var(
1049            "FAILPOINTS",
1050            "setup_and_teardown1=return;setup_and_teardown2=pause;",
1051        );
1052        let scenario = FailScenario::setup();
1053        assert_eq!(f1(), 1);
1054
1055        let (tx, rx) = mpsc::channel();
1056        thread::spawn(move || {
1057            tx.send(f2()).unwrap();
1058        });
1059        assert!(rx.recv_timeout(Duration::from_millis(500)).is_err());
1060
1061        scenario.teardown();
1062        assert_eq!(rx.recv_timeout(Duration::from_millis(500)).unwrap(), 0);
1063        assert_eq!(f1(), 0);
1064    }
1065}