Struct mz_repr::adt::regex::Regex

pub struct Regex {
    pub pattern: String,
    pub case_insensitive: bool,
    pub dot_matches_new_line: bool,
    pub regex: Regex,
}

A hashable, comparable, and serializable regular expression type.

The regex::Regex type, the de facto standard regex type in Rust, does not implement PartialOrd, Ord, PartialEq, Eq, or Hash. The omissions are reasonable. There is no natural definition of ordering for regexes. There is a natural definition of equality (whether two regexes describe the same regular language), but that is an expensive property to compute, and PartialEq is generally expected to be fast to compute.

This type wraps regex::Regex and imbues it with implementations of the above traits. Two regexes are considered equal iff their string representations are identical and their flags, such as case_insensitive, are identical. The PartialOrd, Ord, and Hash implementations are similarly based upon the string representation plus flags. As mentioned above, this is not the natural equivalence relation for regexes: for example, the regexes aa* and a+ define the same language, but would not compare as equal with this implementation of PartialEq. Still, it is often useful to have some equivalence relation available (e.g., to store types containing regexes in a hashmap) even if the equivalence relation is imperfect.
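The idea can be sketched with the standard library alone. KeyedRegex and its compiled field are hypothetical stand-ins (the real type stores a compiled regex::Regex, and its actual trait implementations may differ in detail). The sketch only shows that equality, ordering, and hashing look at the pattern string and flags and never at the compiled form.

```rust
use std::cmp::Ordering;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the wrapper described above.
#[derive(Debug, Clone)]
struct KeyedRegex {
    pattern: String,
    case_insensitive: bool,
    dot_matches_new_line: bool,
    /// Placeholder for the compiled regex; ignored by the trait impls below.
    #[allow(dead_code)]
    compiled: Vec<u8>,
}

impl KeyedRegex {
    fn new(pattern: &str, case_insensitive: bool, dot_matches_new_line: bool) -> Self {
        KeyedRegex {
            pattern: pattern.to_string(),
            case_insensitive,
            dot_matches_new_line,
            // A real implementation would compile the pattern here.
            compiled: pattern.as_bytes().to_vec(),
        }
    }

    /// The only data that participates in comparisons and hashing.
    fn key(&self) -> (&str, bool, bool) {
        (&self.pattern, self.case_insensitive, self.dot_matches_new_line)
    }
}

impl PartialEq for KeyedRegex {
    fn eq(&self, other: &Self) -> bool {
        self.key() == other.key()
    }
}
impl Eq for KeyedRegex {}
impl PartialOrd for KeyedRegex {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for KeyedRegex {
    fn cmp(&self, other: &Self) -> Ordering {
        self.key().cmp(&other.key())
    }
}
impl Hash for KeyedRegex {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.key().hash(state)
    }
}

/// Counts distinct values under the string-plus-flags equivalence.
fn distinct_count(regexes: &[KeyedRegex]) -> usize {
    regexes.iter().cloned().collect::<HashSet<_>>().len()
}

fn main() {
    let a = KeyedRegex::new("aa*", false, true);
    let b = KeyedRegex::new("a+", false, true);
    // Same language, different strings: not equal under this relation.
    assert!(a != b);
    // Same string, different flags: also not equal.
    assert!(a != KeyedRegex::new("aa*", true, true));
    assert_eq!(distinct_count(&[a.clone(), a, b]), 2);
}
```

Because Eq, Ord, and Hash all derive from the same key, they stay mutually consistent, which is what hash maps and ordered collections require.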

regex::Regex is hard to serialize (because of the compiled code), so our approach is to instead serialize this wrapper struct, skipping the actual regex field and reconstructing it from the other fields upon deserialization. (Earlier, serialization was buggy due to https://github.com/tailhook/serde-regex/issues/14, and because we made the same mistake in our own protobuf serialization code.)
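A rough, std-only sketch of the skip-and-rebuild approach follows. Everything in it is an assumption made for illustration: Wrapper, Wire, Compiled, and the toy compile function are hypothetical and are not the serde or protobuf code used by this crate. It only shows the shape of the round trip: the wire form carries the pattern and both flags, and the compiled field is rebuilt from them.

```rust
/// Hypothetical stand-in for a compiled regex, treated as not serializable.
#[derive(Debug, PartialEq)]
struct Compiled {
    program: String,
}

/// Toy "compiler" (an assumption for illustration): rejects unbalanced
/// parentheses and otherwise records the pattern and flags it was given.
fn compile(pattern: &str, case_insensitive: bool, dot_matches_new_line: bool) -> Result<Compiled, String> {
    let mut depth = 0i32;
    for c in pattern.chars() {
        match c {
            '(' => depth += 1,
            ')' => depth -= 1,
            _ => {}
        }
        if depth < 0 {
            return Err(format!("unbalanced ')' in {:?}", pattern));
        }
    }
    if depth != 0 {
        return Err(format!("unclosed '(' in {:?}", pattern));
    }
    Ok(Compiled {
        program: format!("{}|i={}|s={}", pattern, case_insensitive, dot_matches_new_line),
    })
}

/// Hypothetical wrapper: plain fields plus the compiled form.
#[derive(Debug, PartialEq)]
struct Wrapper {
    pattern: String,
    case_insensitive: bool,
    dot_matches_new_line: bool,
    compiled: Compiled,
}

/// Hypothetical wire form: everything except the compiled field.
#[derive(Debug, Clone, PartialEq)]
struct Wire {
    pattern: String,
    case_insensitive: bool,
    dot_matches_new_line: bool,
}

impl Wrapper {
    fn new(pattern: &str, case_insensitive: bool, dot_matches_new_line: bool) -> Result<Self, String> {
        let compiled = compile(pattern, case_insensitive, dot_matches_new_line)?;
        Ok(Wrapper {
            pattern: pattern.to_string(),
            case_insensitive,
            dot_matches_new_line,
            compiled,
        })
    }

    /// "Serialize": copy the plain fields and skip the compiled one.
    fn to_wire(&self) -> Wire {
        Wire {
            pattern: self.pattern.clone(),
            case_insensitive: self.case_insensitive,
            dot_matches_new_line: self.dot_matches_new_line,
        }
    }

    /// "Deserialize": recompile from the pattern and flags, which can fail.
    fn from_wire(wire: Wire) -> Result<Self, String> {
        Wrapper::new(&wire.pattern, wire.case_insensitive, wire.dot_matches_new_line)
    }
}

fn main() {
    let original = Wrapper::new("(a|b)+", true, false).unwrap();
    let restored = Wrapper::from_wire(original.to_wire()).unwrap();
    // The flags survive the round trip because they are part of the wire form.
    assert_eq!(original, restored);
}
```

Deserialization returns a Result in this sketch because recompiling can fail, which mirrors the fallible from_proto signature listed further down this page.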

Fields§

pattern: String
case_insensitive: bool
dot_matches_new_line: bool
regex: Regex

Implementations§

impl Regex

pub fn new(pattern: String, case_insensitive: bool) -> Result<Regex, Error>

A simple constructor for the default setting of dot_matches_new_line: true. For background, see “newline-sensitive matching” in the PostgreSQL documentation: https://www.postgresql.org/docs/current/functions-matching.html#POSIX-MATCHING-RULES

pub fn new_dot_matches_new_line( pattern: String, case_insensitive: bool, dot_matches_new_line: bool ) -> Result<Regex, Error>

Allows explicitly setting dot_matches_new_line.

Methods from Deref<Target = Regex>§

pub fn is_match(&self, text: &str) -> bool

Returns true if and only if there is a match for the regex in the string given.

It is recommended to use this method if all you need to do is test a match, since the underlying matching engine may be able to do less work.

§Example

Test if some text contains at least one word with exactly 13 Unicode word characters:

let text = "I categorically deny having triskaidekaphobia.";
assert!(Regex::new(r"\b\w{13}\b").unwrap().is_match(text));

pub fn find<'t>(&self, text: &'t str) -> Option<Match<'t>>

Returns the start and end byte range of the leftmost-first match in text. If no match exists, then None is returned.

Note that this should only be used if you want to discover the position of the match. Testing the existence of a match is faster if you use is_match.

§Example

Find the start and end location of the first word with exactly 13 Unicode word characters:

let text = "I categorically deny having triskaidekaphobia.";
let mat = Regex::new(r"\b\w{13}\b").unwrap().find(text).unwrap();
assert_eq!(mat.start(), 2);
assert_eq!(mat.end(), 15);

pub fn find_iter<'r, 't>(&'r self, text: &'t str) -> Matches<'r, 't>

Returns an iterator for each successive non-overlapping match in text, returning the start and end byte indices with respect to text.

§Example

Find the start and end location of every word with exactly 13 Unicode word characters:

let text = "Retroactively relinquishing remunerations is reprehensible.";
for mat in Regex::new(r"\b\w{13}\b").unwrap().find_iter(text) {
    println!("{:?}", mat);
}

pub fn captures<'t>(&self, text: &'t str) -> Option<Captures<'t>>

Returns the capture groups corresponding to the leftmost-first match in text. Capture group 0 always corresponds to the entire match. If no match is found, then None is returned.

You should only use captures if you need access to the location of capturing group matches. Otherwise, find is faster for discovering the location of the overall match.

§Examples

Say you have some text with movie names and their release years, like “‘Citizen Kane’ (1941)”. It’d be nice if we could search for text looking like that, while also extracting the movie name and its release year separately.

let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)").unwrap();
let text = "Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(text).unwrap();
assert_eq!(caps.get(1).unwrap().as_str(), "Citizen Kane");
assert_eq!(caps.get(2).unwrap().as_str(), "1941");
assert_eq!(caps.get(0).unwrap().as_str(), "'Citizen Kane' (1941)");
// You can also access the groups by index using the Index notation.
// Note that this will panic on an invalid index.
assert_eq!(&caps[1], "Citizen Kane");
assert_eq!(&caps[2], "1941");
assert_eq!(&caps[0], "'Citizen Kane' (1941)");

Note that the full match is at capture group 0. Each subsequent capture group is indexed by the order of its opening (.

We can make this example a bit clearer by using named capture groups:

let re = Regex::new(r"'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)")
               .unwrap();
let text = "Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(text).unwrap();
assert_eq!(caps.name("title").unwrap().as_str(), "Citizen Kane");
assert_eq!(caps.name("year").unwrap().as_str(), "1941");
assert_eq!(caps.get(0).unwrap().as_str(), "'Citizen Kane' (1941)");
// You can also access the groups by name using the Index notation.
// Note that this will panic on an invalid group name.
assert_eq!(&caps["title"], "Citizen Kane");
assert_eq!(&caps["year"], "1941");
assert_eq!(&caps[0], "'Citizen Kane' (1941)");

Here we name the capture groups, which we can access with the name method or the Index notation with a &str. Note that the named capture groups are still accessible with get or the Index notation with a usize.

The 0th capture group is always unnamed, so it must always be accessed with get(0) or [0].

pub fn captures_iter<'r, 't>(&'r self, text: &'t str) -> CaptureMatches<'r, 't>

Returns an iterator over all the non-overlapping capture groups matched in text. This is operationally the same as find_iter, except it yields information about capturing group matches.

§Example

We can use this to find all movie titles and their release years in some text, where the movie is formatted like “‘Title’ (xxxx)”:

let re = Regex::new(r"'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)")
               .unwrap();
let text = "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931).";
for caps in re.captures_iter(text) {
    println!("Movie: {}, Released: {}",
             &caps["title"], &caps["year"]);
}
// Output:
// Movie: Citizen Kane, Released: 1941
// Movie: The Wizard of Oz, Released: 1939
// Movie: M, Released: 1931

pub fn split<'r, 't>(&'r self, text: &'t str) -> Split<'r, 't>

Returns an iterator of substrings of text delimited by a match of the regular expression. Namely, each element of the iterator corresponds to text that isn’t matched by the regular expression.

This method will not copy the text given.

§Example

To split a string delimited by arbitrary amounts of spaces or tabs:

let re = Regex::new(r"[ \t]+").unwrap();
let fields: Vec<&str> = re.split("a b \t  c\td    e").collect();
assert_eq!(fields, vec!["a", "b", "c", "d", "e"]);

pub fn splitn<'r, 't>(&'r self, text: &'t str, limit: usize) -> SplitN<'r, 't>

Returns an iterator of at most limit substrings of text delimited by a match of the regular expression. (A limit of 0 will return no substrings.) Namely, each element of the iterator corresponds to text that isn’t matched by the regular expression. The remainder of the string that is not split will be the last element in the iterator.

This method will not copy the text given.

§Example

Get the first two words in some text:

let re = Regex::new(r"\W+").unwrap();
let fields: Vec<&str> = re.splitn("Hey! How are you?", 3).collect();
assert_eq!(fields, vec!("Hey", "How", "are you?"));

pub fn replace<R, 't>(&self, text: &'t str, rep: R) -> Cow<'t, str>
where R: Replacer,

Replaces the leftmost-first match with the replacement provided. The replacement can be a regular string (where $N and $name are expanded to match capture groups) or a function that takes the matches’ Captures and returns the replaced string.

If no match is found, then the text is returned unchanged, borrowed in the returned Cow rather than copied.

§Replacement string syntax

All instances of $name in the replacement text are replaced with the text matched by the corresponding capture group.

name may be an integer corresponding to the index of the capture group (counted by order of opening parenthesis where 0 is the entire match) or it can be a name (consisting of letters, digits or underscores) corresponding to a named capture group.

If name isn’t a valid capture group (whether the name doesn’t exist or isn’t a valid index), then it is replaced with the empty string.

The longest possible name is used. For example, $1a looks up the capture group named 1a and not the capture group at index 1. To exert more precise control over the name, use braces, e.g., ${1}a.

To write a literal $ use $$.
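The rules above can be sketched as a small std-only expander. This is not the regex crate's implementation: expand is a hypothetical helper, and capture groups are modeled as a plain map whose keys are either indices written as strings ("0", "1", ...) or group names. How a lone $ or an unclosed ${ is handled is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Characters allowed in a capture group name: letters, digits, underscores.
fn is_name_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Expands `$name`, `${name}`, and `$$` in `template`.
/// Unknown groups expand to the empty string.
fn expand(template: &str, groups: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        rest = &rest[pos + 1..];
        if let Some(after) = rest.strip_prefix('$') {
            // `$$` is a literal dollar sign.
            out.push('$');
            rest = after;
        } else if let Some(after) = rest.strip_prefix('{') {
            match after.find('}') {
                Some(end) => {
                    // Braces delimit the name exactly.
                    out.push_str(groups.get(&after[..end]).copied().unwrap_or(""));
                    rest = &after[end + 1..];
                }
                // Assumption: an unclosed `${` is kept as literal text.
                None => out.push('$'),
            }
        } else {
            // Without braces, the longest run of name characters is the name.
            let len = rest.find(|c: char| !is_name_char(c)).unwrap_or(rest.len());
            if len == 0 {
                // Assumption: a `$` not followed by a name is kept literally.
                out.push('$');
            } else {
                out.push_str(groups.get(&rest[..len]).copied().unwrap_or(""));
                rest = &rest[len..];
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    // Groups as they might look after matching (\w+)\s+(\w+) on "deep fried",
    // with the two groups also named `first` and `second`.
    let groups: HashMap<&str, &str> = [
        ("0", "deep fried"),
        ("1", "deep"),
        ("2", "fried"),
        ("first", "deep"),
        ("second", "fried"),
    ]
    .into_iter()
    .collect();

    assert_eq!(expand("${first}_$second", &groups), "deep_fried");
    // `first_` is read as the name, does not exist, and expands to nothing.
    assert_eq!(expand("$first_$second", &groups), "fried");
    // `1a` is read as a name, while `${1}a` is group 1 followed by a literal `a`.
    assert_eq!(expand("$1a", &groups), "");
    assert_eq!(expand("${1}a", &groups), "deepa");
    assert_eq!(expand("$$5", &groups), "$5");
}
```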

§Examples

Note that this function is polymorphic with respect to the replacement. In typical usage, this can just be a normal string:

let re = Regex::new("[^01]+").unwrap();
assert_eq!(re.replace("1078910", ""), "1010");

But anything satisfying the Replacer trait will work. For example, a closure of type |&Captures| -> String provides direct access to the captures corresponding to a match. This allows one to access capturing group matches easily:

let re = Regex::new(r"([^,\s]+),\s+(\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", |caps: &Captures| {
    format!("{} {}", &caps[2], &caps[1])
});
assert_eq!(result, "Bruce Springsteen");

But this is a bit cumbersome to use all the time. Instead, a simple syntax is supported that expands $name into the corresponding capture group. Here’s the last example, but using this expansion technique with named capture groups:

let re = Regex::new(r"(?P<last>[^,\s]+),\s+(?P<first>\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", "$first $last");
assert_eq!(result, "Bruce Springsteen");

Note that using $2 instead of $first or $1 instead of $last would produce the same result. To write a literal $ use $$.

Sometimes the replacement string requires use of curly braces to delineate a capture group replacement and surrounding literal text. For example, if we wanted to join two words together with an underscore:

let re = Regex::new(r"(?P<first>\w+)\s+(?P<second>\w+)").unwrap();
let result = re.replace("deep fried", "${first}_$second");
assert_eq!(result, "deep_fried");

Without the curly braces, the capture group name first_ would be used, and since it doesn’t exist, it would be replaced with the empty string.

Finally, sometimes you just want to replace a literal string with no regard for capturing group expansion. This can be done by wrapping a string with NoExpand:

use regex::NoExpand;

let re = Regex::new(r"(?P<last>[^,\s]+),\s+(\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", NoExpand("$2 $last"));
assert_eq!(result, "$2 $last");

pub fn replace_all<R, 't>(&self, text: &'t str, rep: R) -> Cow<'t, str>
where R: Replacer,

Replaces all non-overlapping matches in text with the replacement provided. This is the same as calling replacen with limit set to 0.

See the documentation for replace for details on how to access capturing group matches in the replacement string.

pub fn replacen<R, 't>( &self, text: &'t str, limit: usize, rep: R ) -> Cow<'t, str>
where R: Replacer,

Replaces at most limit non-overlapping matches in text with the replacement provided. If limit is 0, then all non-overlapping matches are replaced.

See the documentation for replace for details on how to access capturing group matches in the replacement string.

pub fn shortest_match(&self, text: &str) -> Option<usize>

Returns the end location of a match in the text given.

This method may have the same performance characteristics as is_match, except it provides an end location for a match. In particular, the location returned may be shorter than the proper end of the leftmost-first match.

§Example

Typically, a+ would match the entire first sequence of a in some text, but shortest_match can give up as soon as it sees the first a.

let text = "aaaaa";
let pos = Regex::new(r"a+").unwrap().shortest_match(text);
assert_eq!(pos, Some(1));

pub fn shortest_match_at(&self, text: &str, start: usize) -> Option<usize>

Returns the same as shortest_match, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

pub fn is_match_at(&self, text: &str, start: usize) -> bool

Returns the same as is_match, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

pub fn find_at<'t>(&self, text: &'t str, start: usize) -> Option<Match<'t>>

Returns the same as find, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

pub fn captures_read<'t>( &self, locs: &mut CaptureLocations, text: &'t str ) -> Option<Match<'t>>

This is like captures, but uses CaptureLocations instead of Captures in order to amortize allocations.

To create a CaptureLocations value, use the Regex::capture_locations method.

This returns the overall match if the search was successful, which is always equivalent to the 0th capture group.

pub fn captures_read_at<'t>( &self, locs: &mut CaptureLocations, text: &'t str, start: usize ) -> Option<Match<'t>>

Returns the same as captures, but starts the search at the given offset and populates the capture locations given.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

pub fn as_str(&self) -> &str

Returns the original string of this regex.

pub fn capture_names(&self) -> CaptureNames<'_>

Returns an iterator over the capture names.

pub fn captures_len(&self) -> usize

Returns the number of captures.

pub fn capture_locations(&self) -> CaptureLocations

Returns an empty set of capture locations that can be reused in multiple calls to captures_read or captures_read_at.

Trait Implementations§

source§

impl Clone for Regex

source§

fn clone(&self) -> Regex

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for Regex

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl Deref for Regex

§

type Target = Regex

The resulting type after dereferencing.
source§

fn deref(&self) -> &Regex

Dereferences the value.
source§

impl<'de> Deserialize<'de> for Regex

source§

fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer. Read more
source§

impl Hash for Regex

source§

fn hash<H: Hasher>(&self, hasher: &mut H)

Feeds this value into the given Hasher. Read more
1.3.0 · source§

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher. Read more
source§

impl MzReflect for Regex

source§

fn add_to_reflected_type_info(rti: &mut ReflectedTypeInfo)

Adds names and types of the fields of the struct or enum to rti. Read more
source§

impl Ord for Regex

source§

fn cmp(&self, other: &Regex) -> Ordering

This method returns an Ordering between self and other. Read more
1.21.0 · source§

fn max(self, other: Self) -> Self
where Self: Sized,

Compares and returns the maximum of two values. Read more
1.21.0 · source§

fn min(self, other: Self) -> Self
where Self: Sized,

Compares and returns the minimum of two values. Read more
1.50.0 · source§

fn clamp(self, min: Self, max: Self) -> Self
where Self: Sized + PartialOrd,

Restrict a value to a certain interval. Read more
source§

impl PartialEq for Regex

source§

fn eq(&self, other: &Regex) -> bool

This method tests for self and other values to be equal, and is used by ==.
1.0.0 · source§

fn ne(&self, other: &Rhs) -> bool

This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
source§

impl PartialOrd for Regex

source§

fn partial_cmp(&self, other: &Regex) -> Option<Ordering>

This method returns an ordering between self and other values if one exists. Read more
1.0.0 · source§

fn lt(&self, other: &Rhs) -> bool

This method tests less than (for self and other) and is used by the < operator. Read more
1.0.0 · source§

fn le(&self, other: &Rhs) -> bool

This method tests less than or equal to (for self and other) and is used by the <= operator. Read more
1.0.0 · source§

fn gt(&self, other: &Rhs) -> bool

This method tests greater than (for self and other) and is used by the > operator. Read more
1.0.0 · source§

fn ge(&self, other: &Rhs) -> bool

This method tests greater than or equal to (for self and other) and is used by the >= operator. Read more
source§

impl RustType<ProtoRegex> for Regex

source§

fn into_proto(&self) -> ProtoRegex

Convert a Self into a Proto value.
source§

fn from_proto(proto: ProtoRegex) -> Result<Self, TryFromProtoError>

Consume and convert a Proto back into a Self value. Read more
source§

impl Serialize for Regex

source§

fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where S: Serializer,

Serialize this value into the given Serde serializer. Read more
source§

impl Eq for Regex

Auto Trait Implementations§

§

impl RefUnwindSafe for Regex

§

impl Send for Regex

§

impl Sync for Regex

§

impl Unpin for Regex

§

impl UnwindSafe for Regex

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T, U> CastInto<U> for T
where U: CastFrom<T>,

source§

fn cast_into(self) -> U

Performs the cast.
source§

impl<Q, K> Comparable<K> for Q
where Q: Ord + ?Sized, K: Borrow<Q> + ?Sized,

source§

fn compare(&self, key: &K) -> Ordering

Compare self to key and return their ordering.
source§

impl<T> DynClone for T
where T: Clone,

source§

fn __clone_box(&self, _: Private) -> *mut ()

source§

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

source§

fn equivalent(&self, key: &K) -> bool

Compare self to key and return true if they are equal.
source§

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

source§

fn equivalent(&self, key: &K) -> bool

Checks if this value is equivalent to the given key. Read more
source§

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

source§

fn equivalent(&self, key: &K) -> bool

Compare self to key and return true if they are equal.
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T> FromRef<T> for T
where T: Clone,

source§

fn from_ref(input: &T) -> T

Converts to this type from a reference to the input type.
source§

impl<T> FutureExt for T

source§

fn with_context(self, otel_cx: Context) -> WithContext<Self>

Attaches the provided Context to this type, returning a WithContext wrapper. Read more
source§

fn with_current_context(self) -> WithContext<Self>

Attaches the current Context to this type, returning a WithContext wrapper. Read more
source§

impl<T> Hashable for T
where T: Hash,

§

type Output = u64

The type of the output value.
source§

fn hashed(&self) -> u64

A well-distributed integer derived from the data.
source§

impl<T> Instrument for T

source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> IntoRequest<T> for T

source§

fn into_request(self) -> Request<T>

Wrap the input message T in a tonic::Request
source§

impl<T, U> OverrideFrom<Option<&T>> for U
where U: OverrideFrom<T>,

source§

fn override_from(self, layer: &Option<&T>) -> U

Override the configuration represented by Self with values from the given layer.
source§

impl<T> Pointable for T

source§

const ALIGN: usize = _

The alignment of pointer.
§

type Init = T

The type for initializers.
source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
source§

impl<T> PreferredContainer for T
where T: Ord + Clone + 'static,

§

type Container = Vec<T>

The preferred container for the type.
source§

impl<T> ProgressEventTimestamp for T
where T: Data + Debug + Any,

source§

fn as_any(&self) -> &(dyn Any + 'static)

Upcasts this ProgressEventTimestamp to Any. Read more
source§

fn type_name(&self) -> &'static str

Returns the name of the concrete type of this object. Read more
source§

impl<P, R> ProtoType<R> for P
where R: RustType<P>,

source§

impl<T> PushInto<Vec<T>> for T

source§

fn push_into(self, target: &mut Vec<T>)

Push self into the target container.
source§

impl<T> Same for T

§

type Output = T

Should always be Self
source§

impl<T> ToOwned for T
where T: Clone,

§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

source§

fn vzip(self) -> V

source§

impl<T> WithSubscriber for T

source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more
source§

impl<T> Data for T
where T: Clone + 'static,

source§

impl<T> Data for T
where T: Send + Sync + Any + Serialize + for<'a> Deserialize<'a> + 'static,

source§

impl<T> Data for T
where T: Data + Ord + Debug,

source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,

source§

impl<T> ExchangeData for T
where T: Data + Data,

source§

impl<T> ExchangeData for T
where T: ExchangeData + Ord + Debug,

source§

impl<T> Sequence for T
where T: Eq + Hash,