regex_automata::dfa::regex

Struct Builder

pub struct Builder {}

A builder for a regex based on deterministic finite automata.

This builder permits configuring options for the syntax of a pattern, the NFA construction, the DFA construction and finally the regex searching itself. It differs from a general-purpose regex builder in that it permits fine-grained configuration of the construction process. The trade-off is complexity, and the possibility of setting a configuration that does not make sense. For example, there are two different UTF-8 modes:

  • syntax::Config::utf8 controls whether the pattern itself can contain sub-expressions that match invalid UTF-8.
  • thompson::Config::utf8 controls how the regex iterators themselves advance the starting position of the next search when a match with zero length is found.

Generally speaking, callers will want to either enable both of these or disable both of these.

Internally, building a regex requires building two DFAs, where one is responsible for finding the end of a match and the other is responsible for finding the start of a match. If you only need to detect whether something matched, or only the end of a match, then you should use a dense::Builder to construct a single DFA, which is cheaper than building two DFAs.

§Build methods

This builder has a few “build” methods. In general, they are the result of combining the following parameters:

  • Building one or many regexes.
  • Building a regex with dense or sparse DFAs.

The simplest “build” method is Builder::build. It accepts a single pattern and builds a dense DFA using usize for the state identifier representation.

The most general “build” method is Builder::build_many, which permits building a regex that searches for multiple patterns simultaneously while using a specific state identifier representation.

The most flexible “build” method, but hardest to use, is Builder::build_from_dfas. This exposes the fact that a Regex is just a pair of DFAs, and this method allows you to specify those DFAs exactly.

§Example

This example shows how to disable UTF-8 mode in the syntax and the regex itself. This is generally what you want for matching on arbitrary bytes.

use regex_automata::{
    dfa::regex::Regex, nfa::thompson, util::syntax, Match,
};

let re = Regex::builder()
    .syntax(syntax::Config::new().utf8(false))
    .thompson(thompson::Config::new().utf8(false))
    .build(r"foo(?-u:[^b])ar.*")?;
let haystack = b"\xFEfoo\xFFarzz\xE2\x98\xFF\n";
let expected = Some(Match::must(0, 1..9));
let got = re.find(haystack);
assert_eq!(expected, got);
// Notice that `(?-u:[^b])` matches invalid UTF-8,
// but the subsequent `.*` does not! Disabling UTF-8
// on the syntax permits this.
assert_eq!(b"foo\xFFarzz", &haystack[got.unwrap().range()]);

Implementations§

impl Builder

pub fn new() -> Builder

Create a new regex builder with the default configuration.

pub fn build_from_dfas<A: Automaton>(&self, forward: A, reverse: A) -> Regex<A>

Build a regex from its component forward and reverse DFAs.

This is useful when deserializing a regex from some arbitrary memory region. This is also useful for building regexes from other types of DFAs.

If you’re building the DFAs from scratch instead of building new DFAs from other DFAs, then you’ll need to make sure that the reverse DFA is configured correctly to match the intended semantics. Namely:

  • It should be anchored.
  • It should use MatchKind::All semantics.
  • It should match in reverse.
  • Otherwise, its configuration should match the forward DFA.

If these conditions aren’t satisfied, then the behavior of searches is unspecified.

Note that when using this constructor, no configuration is applied. Since this routine provides the DFAs to the builder, there is no opportunity to apply other configuration options.

§Example

This example is a bit contrived. The usual use of these methods would involve serializing initial_re somewhere and then deserializing it later to build a regex. But in this case, we do everything in memory.

use regex_automata::dfa::regex::Regex;

let initial_re = Regex::new("foo[0-9]+")?;
assert_eq!(true, initial_re.is_match(b"foo123"));

let (fwd, rev) = (initial_re.forward(), initial_re.reverse());
let re = Regex::builder().build_from_dfas(fwd, rev);
assert_eq!(true, re.is_match(b"foo123"));

This example shows how to build a Regex that uses sparse DFAs instead of dense DFAs without using one of the convenience build_sparse routines:

use regex_automata::dfa::regex::Regex;

let initial_re = Regex::new("foo[0-9]+")?;
assert_eq!(true, initial_re.is_match(b"foo123"));

let fwd = initial_re.forward().to_sparse()?;
let rev = initial_re.reverse().to_sparse()?;
let re = Regex::builder().build_from_dfas(fwd, rev);
assert_eq!(true, re.is_match(b"foo123"));

Trait Implementations§

impl Clone for Builder

fn clone(&self) -> Builder

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Builder

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for Builder

fn default() -> Builder

Returns the “default value” for a type.
