fail/lib.rs
1// Copyright 2019 TiKV Project Authors. Licensed under Apache-2.0.
2
3//! A fail point implementation for Rust.
4//!
5//! Fail points are code instrumentations that allow errors and other behavior
6//! to be injected dynamically at runtime, primarily for testing purposes. Fail
7//! points are flexible and can be configured to exhibit a variety of behavior,
8//! including panics, early returns, and sleeping. They can be controlled both
9//! programmatically and via the environment, and can be triggered
10//! conditionally and probabilistically.
11//!
12//! This crate is inspired by FreeBSD's
13//! [failpoints](https://freebsd.org/cgi/man.cgi?query=fail).
14//!
15//! ## Usage
16//!
17//! First, add this to your `Cargo.toml`:
18//!
19//! ```toml
20//! [dependencies]
21//! fail = "0.5"
22//! ```
23//!
24//! Now you can import the `fail_point!` macro from the `fail` crate and use it
25//! to inject dynamic failures.
26//!
27//! As an example, here's a simple program that uses a fail point to simulate an
28//! I/O panic:
29//!
30//! ```rust
31//! use fail::{fail_point, FailScenario};
32//!
33//! fn do_fallible_work() {
34//! fail_point!("read-dir");
35//! let _dir: Vec<_> = std::fs::read_dir(".").unwrap().collect();
36//! // ... do some work on the directory ...
37//! }
38//!
39//! let scenario = FailScenario::setup();
40//! do_fallible_work();
41//! scenario.teardown();
42//! println!("done");
43//! ```
44//!
45//! Here, the program calls `unwrap` on the result of `read_dir`, a function
46//! that returns a `Result`. In other words, this particular program expects
47//! this call to `read_dir` to always succeed. And in practice it almost always
48//! will, which makes the behavior of this program when `read_dir` fails
49//! difficult to test. By instrumenting the program with a fail point we can
50//! pretend that `read_dir` failed, causing the subsequent `unwrap` to panic,
51//! and allowing us to observe the program's behavior under failure conditions.
52//!
53//! When the program is run normally it just prints "done":
54//!
55//! ```sh
56//! $ cargo run --features fail/failpoints
57//! Finished dev [unoptimized + debuginfo] target(s) in 0.01s
58//! Running `target/debug/failpointtest`
59//! done
60//! ```
61//!
62//! But now, by setting the `FAILPOINTS` variable we can see what happens if the
63//! `read_dir` fails:
64//!
65//! ```sh
//! $ FAILPOINTS=read-dir=panic cargo run --features fail/failpoints
67//! Finished dev [unoptimized + debuginfo] target(s) in 0.01s
68//! Running `target/debug/failpointtest`
69//! thread 'main' panicked at 'failpoint read-dir panic', /home/ubuntu/.cargo/registry/src/github.com-1ecc6299db9ec823/fail-0.2.0/src/lib.rs:286:25
70//! note: Run with `RUST_BACKTRACE=1` for a backtrace.
71//! ```
72//!
73//! ## Usage in tests
74//!
//! The previous example triggers a fail point by modifying the `FAILPOINTS`
76//! environment variable. In practice, you'll often want to trigger fail points
77//! programmatically, in unit tests.
78//! Fail points are global resources, and Rust tests run in parallel,
79//! so tests that exercise fail points generally need to hold a lock to
80//! avoid interfering with each other. This is accomplished by `FailScenario`.
81//!
//! Here's a basic pattern for writing unit tests with fail points:
83//!
84//! ```rust
85//! use fail::{fail_point, FailScenario};
86//!
87//! fn do_fallible_work() {
88//! fail_point!("read-dir");
89//! let _dir: Vec<_> = std::fs::read_dir(".").unwrap().collect();
90//! // ... do some work on the directory ...
91//! }
92//!
93//! #[test]
94//! #[should_panic]
95//! fn test_fallible_work() {
96//! let scenario = FailScenario::setup();
97//! fail::cfg("read-dir", "panic").unwrap();
98//!
99//! do_fallible_work();
100//!
101//! scenario.teardown();
102//! }
103//! ```
104//!
105//! Even if a test does not itself turn on any fail points, code that it runs
106//! could trigger a fail point that was configured by another thread. Because of
107//! this it is a best practice to put all fail point unit tests into their own
108//! binary. Here's an example of a snippet from `Cargo.toml` that creates a
109//! fail-point-specific test binary:
110//!
111//! ```toml
112//! [[test]]
113//! name = "failpoints"
114//! path = "tests/failpoints/mod.rs"
115//! required-features = ["fail/failpoints"]
116//! ```
117//!
118//!
119//! ## Early return
120//!
121//! The previous examples illustrate injecting panics via fail points, but
122//! panics aren't the only — or even the most common — error pattern
123//! in Rust. The more common type of error is propagated by `Result` return
124//! values, and fail points can inject those as well with "early returns". That
125//! is, when configuring a fail point as "return" (as opposed to "panic"), the
126//! fail point will immediately return from the function, optionally with a
127//! configurable value.
128//!
//! The setup for early return requires a slightly different invocation of the
130//! `fail_point!` macro. To illustrate this, let's modify the `do_fallible_work`
131//! function we used earlier to return a `Result`:
132//!
133//! ```rust
134//! use fail::{fail_point, FailScenario};
135//! use std::io;
136//!
137//! fn do_fallible_work() -> io::Result<()> {
138//! fail_point!("read-dir");
139//! let _dir: Vec<_> = std::fs::read_dir(".")?.collect();
140//! // ... do some work on the directory ...
141//! Ok(())
142//! }
143//!
144//! fn main() -> io::Result<()> {
145//! let scenario = FailScenario::setup();
146//! do_fallible_work()?;
147//! scenario.teardown();
148//! println!("done");
149//! Ok(())
150//! }
151//! ```
152//!
//! This example uses more idiomatic Rust error handling, with no unwraps
//! anywhere. Instead it uses `?` to propagate errors through `Result` return
//! values, as realistic Rust code usually does.
156//!
157//! The "read-dir" fail point though is not yet configured to support early
158//! return, so if we attempt to configure it to "return", we'll see an error
159//! like
160//!
161//! ```sh
162//! $ FAILPOINTS=read-dir=return cargo run --features fail/failpoints
163//! Finished dev [unoptimized + debuginfo] target(s) in 0.13s
164//! Running `target/debug/failpointtest`
165//! thread 'main' panicked at 'Return is not supported for the fail point "read-dir"', src/main.rs:7:5
166//! note: Run with `RUST_BACKTRACE=1` for a backtrace.
167//! ```
168//!
169//! This error tells us that the "read-dir" fail point is not defined correctly
170//! to support early return, and gives us the line number of that fail point.
//! What we're missing in the fail point definition is code describing _how_ to
172//! return an error value, and the way we do this is by passing `fail_point!` a
173//! closure that returns the same type as the enclosing function.
174//!
175//! Here's a variation that does so:
176//!
177//! ```rust
178//! # use std::io;
179//! fn do_fallible_work() -> io::Result<()> {
180//! fail::fail_point!("read-dir", |_| {
181//! Err(io::Error::new(io::ErrorKind::PermissionDenied, "error"))
182//! });
183//! let _dir: Vec<_> = std::fs::read_dir(".")?.collect();
184//! // ... do some work on the directory ...
185//! Ok(())
186//! }
187//! ```
188//!
189//! And now if the "read-dir" fail point is configured to "return" we get a
190//! different result:
191//!
192//! ```sh
193//! $ FAILPOINTS=read-dir=return cargo run --features fail/failpoints
194//! Compiling failpointtest v0.1.0
195//! Finished dev [unoptimized + debuginfo] target(s) in 2.38s
196//! Running `target/debug/failpointtest`
197//! Error: Custom { kind: PermissionDenied, error: StringError("error") }
198//! ```
199//!
200//! This time, `do_fallible_work` returned the error defined in our closure,
201//! which propagated all the way up and out of main.
202//!
203//! ## Advanced usage
204//!
205//! That's the basics of fail points: defining them with `fail_point!`,
206//! configuring them with `FAILPOINTS` and `fail::cfg`, and configuring them to
207//! panic and return early. But that's not all they can do. To learn more see
208//! the documentation for [`cfg`](fn.cfg.html),
209//! [`cfg_callback`](fn.cfg_callback.html) and
210//! [`fail_point!`](macro.fail_point.html).
211//!
212//!
213//! ## Usage considerations
214//!
215//! For most effective fail point usage, keep in mind the following:
216//!
217//! - Fail points are disabled by default and can be enabled via the `failpoints`
218//! feature. When failpoints are disabled, no code is generated by the macro.
219//! - Carefully consider complex, concurrent, non-deterministic combinations of
220//! fail points. Put test cases exercising fail points into their own test
221//! crate.
222//! - Fail points might have the same name, in which case they take the
223//! same actions. Be careful about duplicating fail point names, either within
224//! a single crate, or across multiple crates.
225
226#![deny(missing_docs, missing_debug_implementations)]
227
228use std::collections::HashMap;
229use std::env::VarError;
230use std::fmt::Debug;
231use std::str::FromStr;
232use std::sync::atomic::{AtomicUsize, Ordering};
233use std::sync::{Arc, Condvar, Mutex, MutexGuard, RwLock, TryLockError};
234use std::time::{Duration, Instant};
235use std::{env, thread};
236
237#[derive(Clone)]
238struct SyncCallback(Arc<dyn Fn() + Send + Sync>);
239
240impl Debug for SyncCallback {
241 fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
242 f.write_str("SyncCallback()")
243 }
244}
245
246impl PartialEq for SyncCallback {
247 #[allow(clippy::vtable_address_comparisons)]
248 fn eq(&self, other: &Self) -> bool {
249 Arc::ptr_eq(&self.0, &other.0)
250 }
251}
252
253impl SyncCallback {
254 fn new(f: impl Fn() + Send + Sync + 'static) -> SyncCallback {
255 SyncCallback(Arc::new(f))
256 }
257
258 fn run(&self) {
259 let callback = &self.0;
260 callback();
261 }
262}
263
264/// Supported tasks.
265#[derive(Clone, Debug, PartialEq)]
266enum Task {
267 /// Do nothing.
268 Off,
269 /// Return the value.
270 Return(Option<String>),
271 /// Sleep for some milliseconds.
272 Sleep(u64),
273 /// Panic with the message.
274 Panic(Option<String>),
275 /// Print the message.
276 Print(Option<String>),
    /// Sleep until another action is set.
278 Pause,
279 /// Yield the CPU.
280 Yield,
    /// Busy-wait for some milliseconds.
282 Delay(u64),
283 /// Call callback function.
284 Callback(SyncCallback),
285}
286
287#[derive(Debug)]
288struct Action {
289 task: Task,
290 freq: f32,
291 count: Option<AtomicUsize>,
292}
293
294impl PartialEq for Action {
295 fn eq(&self, hs: &Action) -> bool {
296 if self.task != hs.task || self.freq != hs.freq {
297 return false;
298 }
299 if let Some(ref lhs) = self.count {
300 if let Some(ref rhs) = hs.count {
301 return lhs.load(Ordering::Relaxed) == rhs.load(Ordering::Relaxed);
302 }
303 } else if hs.count.is_none() {
304 return true;
305 }
306 false
307 }
308}
309
310impl Action {
311 fn new(task: Task, freq: f32, max_cnt: Option<usize>) -> Action {
312 Action {
313 task,
314 freq,
315 count: max_cnt.map(AtomicUsize::new),
316 }
317 }
318
319 fn from_callback(f: impl Fn() + Send + Sync + 'static) -> Action {
320 let task = Task::Callback(SyncCallback::new(f));
321 Action {
322 task,
323 freq: 1.0,
324 count: None,
325 }
326 }
327
328 fn get_task(&self) -> Option<Task> {
329 use rand::Rng;
330
331 if let Some(ref cnt) = self.count {
332 let c = cnt.load(Ordering::Acquire);
333 if c == 0 {
334 return None;
335 }
336 }
337 if self.freq < 1f32 && !rand::thread_rng().gen_bool(f64::from(self.freq)) {
338 return None;
339 }
340 if let Some(ref ref_cnt) = self.count {
341 let mut cnt = ref_cnt.load(Ordering::Acquire);
342 loop {
343 if cnt == 0 {
344 return None;
345 }
346 let new_cnt = cnt - 1;
347 match ref_cnt.compare_exchange_weak(
348 cnt,
349 new_cnt,
350 Ordering::AcqRel,
351 Ordering::Acquire,
352 ) {
353 Ok(_) => break,
354 Err(c) => cnt = c,
355 }
356 }
357 }
358 Some(self.task.clone())
359 }
360}
361
362fn partition(s: &str, pattern: char) -> (&str, Option<&str>) {
363 let mut splits = s.splitn(2, pattern);
364 (splits.next().unwrap(), splits.next())
365}
366
367impl FromStr for Action {
368 type Err = String;
369
370 /// Parse an action.
371 ///
    /// `s` should be in the format `[p%][cnt*]task[(args)]`, where `p%` is the
    /// frequency and `cnt` is the maximum number of times the action can be triggered.
374 fn from_str(s: &str) -> Result<Action, String> {
375 let mut remain = s.trim();
376 let mut args = None;
377 // in case there is '%' in args, we need to parse it first.
378 let (first, second) = partition(remain, '(');
379 if let Some(second) = second {
380 remain = first;
381 if !second.ends_with(')') {
382 return Err("parentheses do not match".to_owned());
383 }
384 args = Some(&second[..second.len() - 1]);
385 }
386
387 let mut frequency = 1f32;
388 let (first, second) = partition(remain, '%');
389 if let Some(second) = second {
390 remain = second;
391 match first.parse::<f32>() {
392 Err(e) => return Err(format!("failed to parse frequency: {}", e)),
393 Ok(freq) => frequency = freq / 100.0,
394 }
395 }
396
397 let mut max_cnt = None;
398 let (first, second) = partition(remain, '*');
399 if let Some(second) = second {
400 remain = second;
401 match first.parse() {
402 Err(e) => return Err(format!("failed to parse count: {}", e)),
403 Ok(cnt) => max_cnt = Some(cnt),
404 }
405 }
406
407 let parse_timeout = || match args {
408 None => Err("sleep require timeout".to_owned()),
409 Some(timeout_str) => match timeout_str.parse() {
410 Err(e) => Err(format!("failed to parse timeout: {}", e)),
411 Ok(timeout) => Ok(timeout),
412 },
413 };
414
415 let task = match remain {
416 "off" => Task::Off,
417 "return" => Task::Return(args.map(str::to_owned)),
418 "sleep" => Task::Sleep(parse_timeout()?),
419 "panic" => Task::Panic(args.map(str::to_owned)),
420 "print" => Task::Print(args.map(str::to_owned)),
421 "pause" => Task::Pause,
422 "yield" => Task::Yield,
423 "delay" => Task::Delay(parse_timeout()?),
424 _ => return Err(format!("unrecognized command {:?}", remain)),
425 };
426
427 Ok(Action::new(task, frequency, max_cnt))
428 }
429}
430
431#[cfg_attr(feature = "cargo-clippy", allow(clippy::mutex_atomic))]
432#[derive(Debug)]
433struct FailPoint {
434 pause: Mutex<bool>,
435 pause_notifier: Condvar,
436 actions: RwLock<Vec<Action>>,
437 actions_str: RwLock<String>,
438}
439
440#[cfg_attr(feature = "cargo-clippy", allow(clippy::mutex_atomic))]
441impl FailPoint {
442 fn new() -> FailPoint {
443 FailPoint {
444 pause: Mutex::new(false),
445 pause_notifier: Condvar::new(),
446 actions: RwLock::default(),
447 actions_str: RwLock::default(),
448 }
449 }
450
451 fn set_actions(&self, actions_str: &str, actions: Vec<Action>) {
452 loop {
453 // TODO: maybe busy waiting here.
454 match self.actions.try_write() {
455 Err(TryLockError::WouldBlock) => {}
456 Ok(mut guard) => {
457 *guard = actions;
458 *self.actions_str.write().unwrap() = actions_str.to_string();
459 return;
460 }
461 Err(e) => panic!("unexpected poison: {:?}", e),
462 }
463 let mut guard = self.pause.lock().unwrap();
464 *guard = false;
465 self.pause_notifier.notify_all();
466 }
467 }
468
469 #[cfg_attr(feature = "cargo-clippy", allow(clippy::option_option))]
470 fn eval(&self, name: &str) -> Option<Option<String>> {
471 let task = {
472 let actions = self.actions.read().unwrap();
473 match actions.iter().filter_map(Action::get_task).next() {
474 Some(Task::Pause) => {
475 let mut guard = self.pause.lock().unwrap();
476 *guard = true;
477 loop {
478 guard = self.pause_notifier.wait(guard).unwrap();
479 if !*guard {
480 break;
481 }
482 }
483 return None;
484 }
485 Some(t) => t,
486 None => return None,
487 }
488 };
489
490 match task {
491 Task::Off => {}
492 Task::Return(s) => return Some(s),
493 Task::Sleep(t) => thread::sleep(Duration::from_millis(t)),
494 Task::Panic(msg) => match msg {
495 Some(ref msg) => panic!("{}", msg),
496 None => panic!("failpoint {} panic", name),
497 },
498 Task::Print(msg) => match msg {
499 Some(ref msg) => log::info!("{}", msg),
500 None => log::info!("failpoint {} executed.", name),
501 },
502 Task::Pause => unreachable!(),
503 Task::Yield => thread::yield_now(),
504 Task::Delay(t) => {
505 let timer = Instant::now();
506 let timeout = Duration::from_millis(t);
507 while timer.elapsed() < timeout {}
508 }
509 Task::Callback(f) => {
510 f.run();
511 }
512 }
513 None
514 }
515}
516
517/// Registry with failpoints configuration.
518type Registry = HashMap<String, Arc<FailPoint>>;
519
520#[derive(Debug, Default)]
521struct FailPointRegistry {
522 // TODO: remove rwlock or store *mut FailPoint
523 registry: RwLock<Registry>,
524}
525
526use once_cell::sync::Lazy;
527
528static REGISTRY: Lazy<FailPointRegistry> = Lazy::new(FailPointRegistry::default);
static SCENARIO: Lazy<Mutex<&'static FailPointRegistry>> = Lazy::new(|| Mutex::new(&REGISTRY));
530
531/// Test scenario with configured fail points.
532#[derive(Debug)]
533pub struct FailScenario<'a> {
534 scenario_guard: MutexGuard<'a, &'static FailPointRegistry>,
535}
536
537impl<'a> FailScenario<'a> {
538 /// Set up the system for a fail points scenario.
539 ///
540 /// Configures all fail points specified in the `FAILPOINTS` environment variable.
541 /// It does not otherwise change any existing fail point configuration.
542 ///
543 /// The format of `FAILPOINTS` is `failpoint=actions;...`, where
544 /// `failpoint` is the name of the fail point. For more information
545 /// about fail point actions see the [`cfg`](fn.cfg.html) function and
546 /// the [`fail_point`](macro.fail_point.html) macro.
547 ///
548 /// `FAILPOINTS` may configure fail points that are not actually defined. In
549 /// this case the configuration has no effect.
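    ///
    /// As a rough illustration of the `failpoint=actions;...` format, the
    /// standalone sketch below splits such a string into `(name, actions)`
    /// pairs. The helper `split_failpoints` is hypothetical and not part of
    /// this crate; unlike `setup`, it returns an error instead of panicking
    /// and does not parse the actions themselves.
    ///
    /// ```rust
    /// // Hypothetical helper: split a `FAILPOINTS`-style string into
    /// // `(name, actions)` pairs without interpreting the actions.
    /// fn split_failpoints(spec: &str) -> Result<Vec<(String, String)>, String> {
    ///     let mut pairs = Vec::new();
    ///     for entry in spec.trim().split(';') {
    ///         let entry = entry.trim();
    ///         if entry.is_empty() {
    ///             continue; // tolerate empty entries such as a trailing ';'
    ///         }
    ///         match entry.split_once('=') {
    ///             Some((name, actions)) => pairs.push((name.to_owned(), actions.to_owned())),
    ///             None => return Err(format!("invalid failpoint: {:?}", entry)),
    ///         }
    ///     }
    ///     Ok(pairs)
    /// }
    ///
    /// fn main() {
    ///     let pairs = split_failpoints("read-dir=panic;write=2*return(err)->off;").unwrap();
    ///     assert_eq!(
    ///         pairs,
    ///         vec![
    ///             ("read-dir".to_owned(), "panic".to_owned()),
    ///             ("write".to_owned(), "2*return(err)->off".to_owned()),
    ///         ]
    ///     );
    ///     // An entry without '=' is rejected.
    ///     assert!(split_failpoints("read-dir").is_err());
    /// }
    /// ```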
550 ///
551 /// This function should generally be called prior to running a test with fail
552 /// points, and afterward paired with [`teardown`](#method.teardown).
553 ///
554 /// # Panics
555 ///
556 /// Panics if an action is not formatted correctly.
557 pub fn setup() -> Self {
558 // Cleanup first, in case of previous failed/panic'ed test scenarios.
559 let scenario_guard = SCENARIO.lock().unwrap_or_else(|e| e.into_inner());
560 let mut registry = scenario_guard.registry.write().unwrap();
561 Self::cleanup(&mut registry);
562
563 let failpoints = match env::var("FAILPOINTS") {
564 Ok(s) => s,
565 Err(VarError::NotPresent) => return Self { scenario_guard },
566 Err(e) => panic!("invalid failpoints: {:?}", e),
567 };
568 for mut cfg in failpoints.trim().split(';') {
569 cfg = cfg.trim();
570 if cfg.is_empty() {
571 continue;
572 }
573 let (name, order) = partition(cfg, '=');
574 match order {
575 None => panic!("invalid failpoint: {:?}", cfg),
576 Some(order) => {
577 if let Err(e) = set(&mut registry, name.to_owned(), order) {
578 panic!("unable to configure failpoint \"{}\": {}", name, e);
579 }
580 }
581 }
582 }
583 Self { scenario_guard }
584 }
585
586 /// Tear down the fail point system.
587 ///
588 /// Clears the configuration of all fail points. Any paused fail
589 /// points will be notified before they are deactivated.
590 ///
591 /// This function should generally be called after running a test with fail points.
592 /// Calling `teardown` without previously calling `setup` results in a no-op.
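    ///
    /// To illustrate how a paused thread can be woken, here is a minimal,
    /// standalone sketch built on a `Mutex<bool>` and a `Condvar`. The
    /// `PausePoint` type and its methods are hypothetical; the sketch uses a
    /// one-shot "released" flag, which is simpler than this crate's own
    /// bookkeeping.
    ///
    /// ```rust
    /// use std::sync::mpsc;
    /// use std::sync::{Arc, Condvar, Mutex};
    /// use std::thread;
    /// use std::time::Duration;
    ///
    /// // Hypothetical stand-in for a paused fail point.
    /// struct PausePoint {
    ///     released: Mutex<bool>,
    ///     notifier: Condvar,
    /// }
    ///
    /// impl PausePoint {
    ///     // Block the calling thread until `release` has been called.
    ///     fn pause(&self) {
    ///         let mut released = self.released.lock().unwrap();
    ///         while !*released {
    ///             released = self.notifier.wait(released).unwrap();
    ///         }
    ///     }
    ///
    ///     // Set the flag and wake every waiting thread.
    ///     fn release(&self) {
    ///         *self.released.lock().unwrap() = true;
    ///         self.notifier.notify_all();
    ///     }
    /// }
    ///
    /// fn main() {
    ///     let point = Arc::new(PausePoint {
    ///         released: Mutex::new(false),
    ///         notifier: Condvar::new(),
    ///     });
    ///     let (tx, rx) = mpsc::channel();
    ///     let worker = {
    ///         let point = Arc::clone(&point);
    ///         thread::spawn(move || {
    ///             point.pause();
    ///             tx.send(()).unwrap();
    ///         })
    ///     };
    ///     // Nothing is sent while the worker is still paused.
    ///     assert!(rx.recv_timeout(Duration::from_millis(200)).is_err());
    ///     point.release();
    ///     rx.recv_timeout(Duration::from_secs(1)).unwrap();
    ///     worker.join().unwrap();
    /// }
    /// ```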
593 pub fn teardown(self) {
594 drop(self)
595 }
596
597 /// Clean all registered fail points.
598 fn cleanup(registry: &mut std::sync::RwLockWriteGuard<'a, Registry>) {
599 for p in registry.values() {
            // Wake up any threads paused at this fail point.
601 p.set_actions("", vec![]);
602 }
603 registry.clear();
604 }
605}
606
607impl<'a> Drop for FailScenario<'a> {
608 fn drop(&mut self) {
609 let mut registry = self.scenario_guard.registry.write().unwrap();
610 Self::cleanup(&mut registry)
611 }
612}
613
614/// Returns whether code generation for failpoints is enabled.
615///
616/// This function allows consumers to check (at runtime) whether the library
617/// was compiled with the (buildtime) `failpoints` feature, which enables
618/// code generation for failpoints.
619pub const fn has_failpoints() -> bool {
620 cfg!(feature = "failpoints")
621}
622
623/// Get all registered fail points.
624///
625/// Return a vector of `(name, actions)` pairs.
626pub fn list() -> Vec<(String, String)> {
627 let registry = REGISTRY.registry.read().unwrap();
628 registry
629 .iter()
630 .map(|(name, fp)| (name.to_string(), fp.actions_str.read().unwrap().clone()))
631 .collect()
632}
633
634#[doc(hidden)]
635pub fn eval<R, F: FnOnce(Option<String>) -> R>(name: &str, f: F) -> Option<R> {
636 let p = {
637 let registry = REGISTRY.registry.read().unwrap();
638 match registry.get(name) {
639 None => return None,
640 Some(p) => p.clone(),
641 }
642 };
643 p.eval(name).map(f)
644}
645
646/// Configure the actions for a fail point at runtime.
647///
648/// Each fail point can be configured with a series of actions, specified by the
649/// `actions` argument. The format of `actions` is `action[->action...]`. When
/// multiple actions are specified, an action is checked only if the action
/// before it was not triggered.
652///
653/// The format of a single action is `[p%][cnt*]task[(arg)]`. `p%` is the
654/// expected probability that the action is triggered, and `cnt*` is the max
655/// times the action can be triggered. The supported values of `task` are:
656///
657/// - `off`, the fail point will do nothing.
658/// - `return(arg)`, return early when the fail point is triggered. `arg` is passed to `$e` (
659/// defined via the `fail_point!` macro) as a string.
660/// - `sleep(milliseconds)`, sleep for the specified time.
661/// - `panic(msg)`, panic with the message.
662/// - `print(msg)`, log the message, using the `log` crate, at the `info` level.
/// - `pause`, sleep until another action is set for the fail point.
664/// - `yield`, yield the CPU.
/// - `delay(milliseconds)`, busy-wait for the specified time.
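///
/// To make the single-action syntax concrete, the following standalone sketch
/// splits an action string into its parts. The `ParsedAction` type and
/// `parse_action` function are hypothetical and simplified: they do not check
/// that `task` is one of the supported values listed above.
///
/// ```rust
/// // Hypothetical, simplified view of one parsed action.
/// #[derive(Debug, PartialEq)]
/// struct ParsedAction {
///     probability: f32,         // 1.0 when no `p%` prefix is given
///     max_count: Option<usize>, // None when no `cnt*` prefix is given
///     task: String,
///     arg: Option<String>,
/// }
///
/// fn parse_action(s: &str) -> Result<ParsedAction, String> {
///     let mut rest = s.trim();
///
///     // Split off `(arg)` first, so that a '%' or '*' inside the argument
///     // is not mistaken for a prefix separator.
///     let mut arg = None;
///     if let Some((head, tail)) = rest.split_once('(') {
///         let inner = tail
///             .strip_suffix(')')
///             .ok_or_else(|| "parentheses do not match".to_owned())?;
///         arg = Some(inner.to_owned());
///         rest = head;
///     }
///
///     let mut probability = 1.0;
///     if let Some((p, tail)) = rest.split_once('%') {
///         let percent: f32 = p.parse().map_err(|e| format!("bad frequency: {}", e))?;
///         probability = percent / 100.0;
///         rest = tail;
///     }
///
///     let mut max_count = None;
///     if let Some((c, tail)) = rest.split_once('*') {
///         max_count = Some(c.parse::<usize>().map_err(|e| format!("bad count: {}", e))?);
///         rest = tail;
///     }
///
///     Ok(ParsedAction { probability, max_count, task: rest.to_owned(), arg })
/// }
///
/// fn main() {
///     let action = parse_action("50%2*return(42)").unwrap();
///     assert_eq!(action.probability, 0.5);
///     assert_eq!(action.max_count, Some(2));
///     assert_eq!(action.task, "return");
///     assert_eq!(action.arg.as_deref(), Some("42"));
///     // A '%' inside the argument is left untouched.
///     assert_eq!(parse_action("return(2%5)").unwrap().arg.as_deref(), Some("2%5"));
///     assert!(parse_action("return(msg").is_err());
/// }
/// ```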
666///
/// For example, `20%3*print(still alive!)->panic` means the fail point has a 20% chance of
/// printing the message "still alive!" and an 80% chance of panicking. The message is printed
/// at most 3 times; once that count is used up, the fail point always panics.
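///
/// The fall-through from an exhausted action to the next one in the chain can
/// be sketched as follows. This is a standalone illustration with hypothetical
/// names (`CountedAction`, `eval_chain`); it leaves out probabilities so that
/// the result is deterministic, and it uses `fetch_update` for the atomic
/// decrement as one possible approach.
///
/// ```rust
/// use std::sync::atomic::{AtomicUsize, Ordering};
///
/// // Hypothetical, simplified action: a label plus an optional remaining count.
/// struct CountedAction {
///     label: &'static str,
///     remaining: Option<AtomicUsize>,
/// }
///
/// impl CountedAction {
///     // Returns the label if the action may still fire, consuming one use.
///     fn try_fire(&self) -> Option<&'static str> {
///         if let Some(remaining) = &self.remaining {
///             // Atomically decrement, but never below zero.
///             remaining
///                 .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
///                 .ok()?;
///         }
///         Some(self.label)
///     }
/// }
///
/// // Walk the chain in order and stop at the first action that fires.
/// fn eval_chain(chain: &[CountedAction]) -> Option<&'static str> {
///     chain.iter().find_map(CountedAction::try_fire)
/// }
///
/// fn main() {
///     // Models `2*print->panic`, with labels standing in for the real tasks.
///     let chain = [
///         CountedAction { label: "print", remaining: Some(AtomicUsize::new(2)) },
///         CountedAction { label: "panic", remaining: None },
///     ];
///     assert_eq!(eval_chain(&chain), Some("print"));
///     assert_eq!(eval_chain(&chain), Some("print"));
///     // The count is exhausted, so evaluation falls through to the next action.
///     assert_eq!(eval_chain(&chain), Some("panic"));
/// }
/// ```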
670///
671/// The `FAILPOINTS` environment variable accepts this same syntax for its fail
672/// point actions.
673///
674/// A call to `cfg` with a particular fail point name overwrites any existing actions for
675/// that fail point, including those set via the `FAILPOINTS` environment variable.
676pub fn cfg<S: Into<String>>(name: S, actions: &str) -> Result<(), String> {
677 let mut registry = REGISTRY.registry.write().unwrap();
678 set(&mut registry, name.into(), actions)
679}
680
681/// Configure the actions for a fail point at runtime.
682///
/// Each fail point can be configured with a callback. The callback is invoked
/// each time execution reaches the fail point.
685pub fn cfg_callback<S, F>(name: S, f: F) -> Result<(), String>
686where
687 S: Into<String>,
688 F: Fn() + Send + Sync + 'static,
689{
690 let mut registry = REGISTRY.registry.write().unwrap();
691 let p = registry
692 .entry(name.into())
693 .or_insert_with(|| Arc::new(FailPoint::new()));
694 let action = Action::from_callback(f);
695 let actions = vec![action];
696 p.set_actions("callback", actions);
697 Ok(())
698}
699
700/// Remove a fail point.
701///
702/// If the fail point doesn't exist, nothing will happen.
703pub fn remove<S: AsRef<str>>(name: S) {
704 let mut registry = REGISTRY.registry.write().unwrap();
705 if let Some(p) = registry.remove(name.as_ref()) {
        // Wake up any threads paused at this fail point.
707 p.set_actions("", vec![]);
708 }
709}
710
711/// Configure fail point in RAII style.
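///
/// The RAII idea is that the configuration lives exactly as long as the guard
/// value. The standalone sketch below shows the pattern with a hypothetical
/// `Guard` type and a hypothetical global `REGISTRY`; it is not this crate's
/// implementation, and it assumes a toolchain where `Mutex::new` is usable in
/// a `static` (Rust 1.63 or later).
///
/// ```rust
/// use std::sync::Mutex;
///
/// // Hypothetical global registry of `(name, actions)` pairs.
/// static REGISTRY: Mutex<Vec<(String, String)>> = Mutex::new(Vec::new());
///
/// // Hypothetical guard: registers on creation, unregisters on drop.
/// struct Guard(String);
///
/// impl Guard {
///     fn new(name: &str, actions: &str) -> Guard {
///         REGISTRY
///             .lock()
///             .unwrap()
///             .push((name.to_owned(), actions.to_owned()));
///         Guard(name.to_owned())
///     }
/// }
///
/// impl Drop for Guard {
///     fn drop(&mut self) {
///         REGISTRY.lock().unwrap().retain(|(name, _)| name != &self.0);
///     }
/// }
///
/// fn is_configured(name: &str) -> bool {
///     REGISTRY.lock().unwrap().iter().any(|(n, _)| n == name)
/// }
///
/// fn main() {
///     {
///         let _guard = Guard::new("read-dir", "panic");
///         assert!(is_configured("read-dir"));
///     } // `_guard` is dropped here, which removes the entry.
///     assert!(!is_configured("read-dir"));
/// }
/// ```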
712#[derive(Debug)]
713pub struct FailGuard(String);
714
715impl Drop for FailGuard {
716 fn drop(&mut self) {
717 remove(&self.0);
718 }
719}
720
721impl FailGuard {
722 /// Configure the actions for a fail point during the lifetime of the returning `FailGuard`.
723 ///
724 /// Read documentation of [`cfg`] for more details.
725 pub fn new<S: Into<String>>(name: S, actions: &str) -> Result<FailGuard, String> {
726 let name = name.into();
727 cfg(&name, actions)?;
728 Ok(FailGuard(name))
729 }
730
731 /// Configure the actions for a fail point during the lifetime of the returning `FailGuard`.
732 ///
733 /// Read documentation of [`cfg_callback`] for more details.
734 pub fn with_callback<S, F>(name: S, f: F) -> Result<FailGuard, String>
735 where
736 S: Into<String>,
737 F: Fn() + Send + Sync + 'static,
738 {
739 let name = name.into();
740 cfg_callback(&name, f)?;
741 Ok(FailGuard(name))
742 }
743}
744
745fn set(
746 registry: &mut HashMap<String, Arc<FailPoint>>,
747 name: String,
748 actions: &str,
749) -> Result<(), String> {
750 let actions_str = actions;
    // `actions` is in the format `action[->action...]`.
752 let actions = actions
753 .split("->")
754 .map(Action::from_str)
755 .collect::<Result<_, _>>()?;
756 // Please note that we can't figure out whether there is a failpoint named `name`,
757 // so we may insert a failpoint that doesn't exist at all.
758 let p = registry
759 .entry(name)
760 .or_insert_with(|| Arc::new(FailPoint::new()));
761 p.set_actions(actions_str, actions);
762 Ok(())
763}
764
765/// Define a fail point (requires `failpoints` feature).
766///
767/// The `fail_point!` macro has three forms, and they all take a name as the
768/// first argument. The simplest form takes only a name and is suitable for
769/// executing most fail point behavior, including panicking, but not for early
770/// return or conditional execution based on a local flag.
771///
772/// The three forms of fail points look as follows.
773///
774/// 1. A basic fail point:
775///
776/// ```rust
777/// # #[macro_use] extern crate fail;
778/// fn function_return_unit() {
779/// fail_point!("fail-point-1");
780/// }
781/// ```
782///
783/// This form of fail point can be configured to panic, print, sleep, pause, etc., but
784/// not to return from the function early.
785///
786/// 2. A fail point that may return early:
787///
788/// ```rust
789/// # #[macro_use] extern crate fail;
790/// fn function_return_value() -> u64 {
791/// fail_point!("fail-point-2", |r| r.map_or(2, |e| e.parse().unwrap()));
792/// 0
793/// }
794/// ```
795///
796/// This form of fail point can additionally be configured to return early from
797/// the enclosing function. It accepts a closure, which itself accepts an
798/// `Option<String>`, and is expected to transform that argument into the early
799/// return value. The argument string is sourced from the fail point
800/// configuration string. For example configuring this "fail-point-2" as
801/// "return(100)" will execute the fail point closure, passing it a `Some` value
802/// containing a `String` equal to "100"; the closure then parses it into the
803/// return value.
804///
805/// 3. A fail point with conditional execution:
806///
807/// ```rust
808/// # #[macro_use] extern crate fail;
809/// fn function_conditional(enable: bool) {
810/// fail_point!("fail-point-3", enable, |_| {});
811/// }
812/// ```
813///
814/// In this final form, the second argument is a local boolean expression that
815/// must evaluate to `true` before the fail point is evaluated. The third
816/// argument is again an early-return closure.
817///
818/// The three macro arguments (or "designators") are called `$name`, `$cond`,
819/// and `$e`. `$name` must be `&str`, `$cond` must be a boolean expression,
/// and `$e` must be a function or closure that accepts an `Option<String>` and
821/// returns the same type as the enclosing function.
822///
823/// For more examples see the [crate documentation](index.html). For more
824/// information about controlling fail points see the [`cfg`](fn.cfg.html)
825/// function.
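///
/// To show how the early-return form turns a configured argument into a return
/// value, here is a standalone sketch that does not depend on this crate. The
/// `mini_fail_point!` macro and the `lookup` function are hypothetical
/// stand-ins: the configuration is a plain map passed in by the caller rather
/// than a global registry.
///
/// ```rust
/// use std::collections::HashMap;
///
/// // Hypothetical stand-in for a registry lookup: yields `Some(arg)` when the
/// // named point is configured to return early, and `None` otherwise.
/// fn lookup(config: &HashMap<&str, Option<String>>, name: &str) -> Option<Option<String>> {
///     config.get(name).cloned()
/// }
///
/// // Hypothetical miniature of the early-return form: if the point is
/// // configured, apply the closure and return its result from the caller.
/// macro_rules! mini_fail_point {
///     ($config:expr, $name:expr, $e:expr) => {{
///         if let Some(res) = lookup($config, $name).map($e) {
///             return res;
///         }
///     }};
/// }
///
/// fn function_return_value(config: &HashMap<&str, Option<String>>) -> u64 {
///     mini_fail_point!(config, "fail-point-2", |r: Option<String>| r
///         .map_or(2, |e| e.parse().unwrap()));
///     0
/// }
///
/// fn main() {
///     let mut config = HashMap::new();
///     // Not configured: the function runs to completion.
///     assert_eq!(function_return_value(&config), 0);
///     // Configured without an argument: the closure receives `None`.
///     config.insert("fail-point-2", None);
///     assert_eq!(function_return_value(&config), 2);
///     // Configured with an argument: the closure parses it.
///     config.insert("fail-point-2", Some("100".to_owned()));
///     assert_eq!(function_return_value(&config), 100);
/// }
/// ```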
826#[macro_export]
827#[cfg(feature = "failpoints")]
828macro_rules! fail_point {
829 ($name:expr) => {{
830 $crate::eval($name, |_| {
831 panic!("Return is not supported for the fail point \"{}\"", $name);
832 });
833 }};
834 ($name:expr, $e:expr) => {{
835 if let Some(res) = $crate::eval($name, $e) {
836 return res;
837 }
838 }};
839 ($name:expr, $cond:expr, $e:expr) => {{
840 if $cond {
841 $crate::fail_point!($name, $e);
842 }
843 }};
844}
845
846/// Define a fail point (disabled, see `failpoints` feature).
847#[macro_export]
848#[cfg(not(feature = "failpoints"))]
849macro_rules! fail_point {
850 ($name:expr, $e:expr) => {{}};
851 ($name:expr) => {{}};
852 ($name:expr, $cond:expr, $e:expr) => {{}};
853}
854
855#[cfg(test)]
856mod tests {
857 use super::*;
858
859 use std::sync::*;
860
861 #[test]
862 fn test_has_failpoints() {
863 assert_eq!(cfg!(feature = "failpoints"), has_failpoints());
864 }
865
866 #[test]
867 fn test_off() {
868 let point = FailPoint::new();
869 point.set_actions("", vec![Action::new(Task::Off, 1.0, None)]);
870 assert!(point.eval("test_fail_point_off").is_none());
871 }
872
873 #[test]
874 fn test_return() {
875 let point = FailPoint::new();
876 point.set_actions("", vec![Action::new(Task::Return(None), 1.0, None)]);
877 let res = point.eval("test_fail_point_return");
878 assert_eq!(res, Some(None));
879
880 let ret = Some("test".to_owned());
881 point.set_actions("", vec![Action::new(Task::Return(ret.clone()), 1.0, None)]);
882 let res = point.eval("test_fail_point_return");
883 assert_eq!(res, Some(ret));
884 }
885
886 #[test]
887 fn test_sleep() {
888 let point = FailPoint::new();
889 let timer = Instant::now();
890 point.set_actions("", vec![Action::new(Task::Sleep(1000), 1.0, None)]);
891 assert!(point.eval("test_fail_point_sleep").is_none());
892 assert!(timer.elapsed() > Duration::from_millis(1000));
893 }
894
895 #[should_panic]
896 #[test]
897 fn test_panic() {
898 let point = FailPoint::new();
899 point.set_actions("", vec![Action::new(Task::Panic(None), 1.0, None)]);
900 point.eval("test_fail_point_panic");
901 }
902
903 #[test]
904 fn test_print() {
905 struct LogCollector(Arc<Mutex<Vec<String>>>);
906 impl log::Log for LogCollector {
907 fn enabled(&self, _: &log::Metadata) -> bool {
908 true
909 }
910 fn log(&self, record: &log::Record) {
911 let mut buf = self.0.lock().unwrap();
912 buf.push(format!("{}", record.args()));
913 }
914 fn flush(&self) {}
915 }
916
917 let buffer = Arc::new(Mutex::new(vec![]));
918 let collector = LogCollector(buffer.clone());
919 log::set_max_level(log::LevelFilter::Info);
920 log::set_boxed_logger(Box::new(collector)).unwrap();
921
922 let point = FailPoint::new();
923 point.set_actions("", vec![Action::new(Task::Print(None), 1.0, None)]);
924 assert!(point.eval("test_fail_point_print").is_none());
925 let msg = buffer.lock().unwrap().pop().unwrap();
926 assert_eq!(msg, "failpoint test_fail_point_print executed.");
927 }
928
929 #[test]
930 fn test_pause() {
931 let point = Arc::new(FailPoint::new());
932 point.set_actions("", vec![Action::new(Task::Pause, 1.0, None)]);
933 let p = point.clone();
934 let (tx, rx) = mpsc::channel();
935 thread::spawn(move || {
936 assert_eq!(p.eval("test_fail_point_pause"), None);
937 tx.send(()).unwrap();
938 });
939 assert!(rx.recv_timeout(Duration::from_secs(1)).is_err());
940 point.set_actions("", vec![Action::new(Task::Off, 1.0, None)]);
941 rx.recv_timeout(Duration::from_secs(1)).unwrap();
942 }
943
944 #[test]
945 fn test_yield() {
946 let point = FailPoint::new();
947 point.set_actions("", vec![Action::new(Task::Yield, 1.0, None)]);
948 assert!(point.eval("test_fail_point_yield").is_none());
949 }
950
951 #[test]
952 fn test_delay() {
953 let point = FailPoint::new();
954 let timer = Instant::now();
955 point.set_actions("", vec![Action::new(Task::Delay(1000), 1.0, None)]);
956 assert!(point.eval("test_fail_point_delay").is_none());
957 assert!(timer.elapsed() > Duration::from_millis(1000));
958 }
959
960 #[test]
961 fn test_frequency_and_count() {
962 let point = FailPoint::new();
963 point.set_actions("", vec![Action::new(Task::Return(None), 0.8, Some(100))]);
964 let mut count = 0;
965 let mut times = 0f64;
966 while count < 100 {
967 if point.eval("test_fail_point_frequency").is_some() {
968 count += 1;
969 }
970 times += 1f64;
971 }
972 assert!(100.0 / 0.9 < times && times < 100.0 / 0.7, "{}", times);
973 for _ in 0..times as u64 {
974 assert!(point.eval("test_fail_point_frequency").is_none());
975 }
976 }
977
978 #[test]
979 fn test_parse() {
980 let cases = vec![
981 ("return", Action::new(Task::Return(None), 1.0, None)),
982 (
983 "return(64)",
984 Action::new(Task::Return(Some("64".to_owned())), 1.0, None),
985 ),
986 ("5*return", Action::new(Task::Return(None), 1.0, Some(5))),
987 ("25%return", Action::new(Task::Return(None), 0.25, None)),
988 (
989 "125%2*return",
990 Action::new(Task::Return(None), 1.25, Some(2)),
991 ),
992 (
993 "return(2%5)",
994 Action::new(Task::Return(Some("2%5".to_owned())), 1.0, None),
995 ),
996 ("125%2*off", Action::new(Task::Off, 1.25, Some(2))),
997 (
998 "125%2*sleep(100)",
999 Action::new(Task::Sleep(100), 1.25, Some(2)),
1000 ),
1001 (" 125%2*off ", Action::new(Task::Off, 1.25, Some(2))),
1002 ("125%2*panic", Action::new(Task::Panic(None), 1.25, Some(2))),
1003 (
1004 "125%2*panic(msg)",
1005 Action::new(Task::Panic(Some("msg".to_owned())), 1.25, Some(2)),
1006 ),
1007 ("125%2*print", Action::new(Task::Print(None), 1.25, Some(2))),
1008 (
1009 "125%2*print(msg)",
1010 Action::new(Task::Print(Some("msg".to_owned())), 1.25, Some(2)),
1011 ),
1012 ("125%2*pause", Action::new(Task::Pause, 1.25, Some(2))),
1013 ("125%2*yield", Action::new(Task::Yield, 1.25, Some(2))),
1014 ("125%2*delay(2)", Action::new(Task::Delay(2), 1.25, Some(2))),
1015 ];
1016 for (expr, exp) in cases {
1017 let res: Action = expr.parse().unwrap();
1018 assert_eq!(res, exp);
1019 }
1020
1021 let fail_cases = vec![
1022 "delay",
1023 "sleep",
1024 "Return",
1025 "ab%return",
1026 "ab*return",
1027 "return(msg",
1028 "unknown",
1029 ];
1030 for case in fail_cases {
1031 assert!(case.parse::<Action>().is_err());
1032 }
1033 }
1034
    // This case belongs in an integration test, but calling `teardown` may also
    // affect other cases such as `test_pause`, so it is better to keep it here.
1037 #[test]
1038 #[cfg_attr(not(feature = "failpoints"), ignore)]
1039 fn test_setup_and_teardown() {
1040 let f1 = || {
1041 fail_point!("setup_and_teardown1", |_| 1);
1042 0
1043 };
1044 let f2 = || {
1045 fail_point!("setup_and_teardown2", |_| 2);
1046 0
1047 };
1048 env::set_var(
1049 "FAILPOINTS",
1050 "setup_and_teardown1=return;setup_and_teardown2=pause;",
1051 );
1052 let scenario = FailScenario::setup();
1053 assert_eq!(f1(), 1);
1054
1055 let (tx, rx) = mpsc::channel();
1056 thread::spawn(move || {
1057 tx.send(f2()).unwrap();
1058 });
1059 assert!(rx.recv_timeout(Duration::from_millis(500)).is_err());
1060
1061 scenario.teardown();
1062 assert_eq!(rx.recv_timeout(Duration::from_millis(500)).unwrap(), 0);
1063 assert_eq!(f1(), 0);
1064 }
1065}