pub struct Regex {
    pub pattern: String,
    pub case_insensitive: bool,
    pub dot_matches_new_line: bool,
    pub regex: Regex,
}
A hashable, comparable, and serializable regular expression type.
The regex::Regex type, the de facto standard regex type in Rust, does not implement PartialOrd, Ord, PartialEq, Eq, or Hash.
The omissions are reasonable. There is no natural definition of ordering for
regexes. There is a natural definition of equality—whether two regexes
describe the same regular language—but that is an expensive property to
compute, and PartialEq is generally expected to be fast to compute.
This type wraps regex::Regex and provides implementations of the above traits. Two regexes are considered equal if and only if their string representations are identical and their flags, such as case_insensitive, are identical. The PartialOrd, Ord, and Hash implementations are similarly based on the string representation plus flags. As mentioned above, this is not the natural equivalence relation for regexes: for example, the regexes aa* and a+ define the same language, but would not compare as equal with this implementation of PartialEq. Still, it is often useful to have some equivalence relation available (e.g., to store types containing regexes in a hashmap) even if the equivalence relation is imperfect.
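The comparison scheme described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not this type's actual source: KeyedRegex, Compiled, and distinct_count are hypothetical names, and Compiled is only a placeholder for a compiled regex that has no comparison traits of its own.

```rust
use std::cmp::Ordering;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a compiled regex. It deliberately implements
/// no comparison or hashing traits.
#[derive(Debug, Clone)]
pub struct Compiled {
    _program: Vec<u8>,
}

/// Illustrative wrapper: compared, ordered, and hashed by pattern text
/// plus flags, never by the compiled value.
#[derive(Debug, Clone)]
pub struct KeyedRegex {
    pub pattern: String,
    pub case_insensitive: bool,
    pub dot_matches_new_line: bool,
    pub compiled: Compiled,
}

impl KeyedRegex {
    pub fn new(pattern: &str, case_insensitive: bool, dot_matches_new_line: bool) -> Self {
        KeyedRegex {
            pattern: pattern.to_string(),
            case_insensitive,
            dot_matches_new_line,
            // Placeholder "compilation": just keep the pattern bytes.
            compiled: Compiled { _program: pattern.as_bytes().to_vec() },
        }
    }

    /// The only data that takes part in Eq, Ord, and Hash.
    fn key(&self) -> (&str, bool, bool) {
        (self.pattern.as_str(), self.case_insensitive, self.dot_matches_new_line)
    }
}

impl PartialEq for KeyedRegex {
    fn eq(&self, other: &Self) -> bool {
        self.key() == other.key()
    }
}

impl Eq for KeyedRegex {}

impl PartialOrd for KeyedRegex {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for KeyedRegex {
    fn cmp(&self, other: &Self) -> Ordering {
        self.key().cmp(&other.key())
    }
}

impl Hash for KeyedRegex {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.key().hash(state);
    }
}

/// Counts how many regexes are distinct under this equivalence relation.
pub fn distinct_count(regexes: Vec<KeyedRegex>) -> usize {
    regexes.into_iter().collect::<HashSet<_>>().len()
}
```

Because all four trait implementations go through the same key, equality and hashing stay consistent with each other, which HashSet and HashMap require. Under this sketch, a+ and aa* are distinct values even though they describe the same language.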
regex::Regex is hard to serialize because it contains compiled code. Our approach is instead to serialize this wrapper struct: we skip the regex field itself and reconstruct it from the other fields upon deserialization. (Earlier, serialization was buggy due to https://github.com/tailhook/serde-regex/issues/14, and because we made the same mistake in our own protobuf serialization code.)
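A minimal sketch of the skip-and-reconstruct idea, using only the standard library. All names here (Wrapper, Stored, Compiled, compile, to_stored, from_stored) are hypothetical, and compile is a placeholder rather than a call into the regex crate; the actual type relies on its Deserialize and RustType implementations instead.

```rust
/// Hypothetical stand-in for a compiled regex. In this sketch it only
/// records what it was built from.
#[derive(Debug)]
pub struct Compiled {
    pub built_from: String,
}

/// Placeholder for compilation. It always succeeds here, but it returns a
/// Result because recompiling a real pattern can fail.
pub fn compile(
    pattern: &str,
    case_insensitive: bool,
    dot_matches_new_line: bool,
) -> Result<Compiled, String> {
    Ok(Compiled {
        built_from: format!(
            "{} (i={}, s={})",
            pattern, case_insensitive, dot_matches_new_line
        ),
    })
}

/// In-memory form: source fields plus the derived compiled value.
#[derive(Debug)]
pub struct Wrapper {
    pub pattern: String,
    pub case_insensitive: bool,
    pub dot_matches_new_line: bool,
    pub compiled: Compiled,
}

/// Stored form: everything needed to rebuild, and nothing compiled.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Stored {
    pub pattern: String,
    pub case_insensitive: bool,
    pub dot_matches_new_line: bool,
}

impl Wrapper {
    pub fn new(
        pattern: &str,
        case_insensitive: bool,
        dot_matches_new_line: bool,
    ) -> Result<Self, String> {
        let compiled = compile(pattern, case_insensitive, dot_matches_new_line)?;
        Ok(Wrapper {
            pattern: pattern.to_string(),
            case_insensitive,
            dot_matches_new_line,
            compiled,
        })
    }

    /// Writes out the pattern and the flags; the compiled field is skipped.
    pub fn to_stored(&self) -> Stored {
        Stored {
            pattern: self.pattern.clone(),
            case_insensitive: self.case_insensitive,
            dot_matches_new_line: self.dot_matches_new_line,
        }
    }

    /// Rebuilds the compiled field from the stored pattern and flags.
    pub fn from_stored(stored: Stored) -> Result<Self, String> {
        Wrapper::new(
            &stored.pattern,
            stored.case_insensitive,
            stored.dot_matches_new_line,
        )
    }
}
```

The stored form carries the flags as well as the pattern, so a round trip rebuilds a value configured the same way as the original.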
Fields§
§pattern: String
§case_insensitive: bool
§dot_matches_new_line: bool
§regex: Regex
Implementations§
Methods from Deref<Target = Regex>§
pub fn is_match(&self, text: &str) -> bool
Returns true if and only if there is a match for the regex in the string given.
It is recommended to use this method if all you need to do is test a match, since the underlying matching engine may be able to do less work.
§Example
Test if some text contains at least one word with exactly 13 Unicode word characters:
let text = "I categorically deny having triskaidekaphobia.";
assert!(Regex::new(r"\b\w{13}\b").unwrap().is_match(text));
pub fn find<'t>(&self, text: &'t str) -> Option<Match<'t>>
Returns the start and end byte range of the leftmost-first match in text. If no match exists, then None is returned.
Note that this should only be used if you want to discover the position of the match. Testing the existence of a match is faster if you use is_match.
§Example
Find the start and end location of the first word with exactly 13 Unicode word characters:
let text = "I categorically deny having triskaidekaphobia.";
let mat = Regex::new(r"\b\w{13}\b").unwrap().find(text).unwrap();
assert_eq!(mat.start(), 2);
assert_eq!(mat.end(), 15);
pub fn find_iter<'r, 't>(&'r self, text: &'t str) -> Matches<'r, 't>
Returns an iterator for each successive non-overlapping match in text, returning the start and end byte indices with respect to text.
§Example
Find the start and end location of every word with exactly 13 Unicode word characters:
let text = "Retroactively relinquishing remunerations is reprehensible.";
for mat in Regex::new(r"\b\w{13}\b").unwrap().find_iter(text) {
    println!("{:?}", mat);
}
pub fn captures<'t>(&self, text: &'t str) -> Option<Captures<'t>>
Returns the capture groups corresponding to the leftmost-first match in text. Capture group 0 always corresponds to the entire match. If no match is found, then None is returned.
You should only use captures if you need access to the location of capturing group matches. Otherwise, find is faster for discovering the location of the overall match.
§Examples
Say you have some text with movie names and their release years, like “‘Citizen Kane’ (1941)”. It’d be nice if we could search for text looking like that, while also extracting the movie name and its release year separately.
let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)").unwrap();
let text = "Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(text).unwrap();
assert_eq!(caps.get(1).unwrap().as_str(), "Citizen Kane");
assert_eq!(caps.get(2).unwrap().as_str(), "1941");
assert_eq!(caps.get(0).unwrap().as_str(), "'Citizen Kane' (1941)");
// You can also access the groups by index using the Index notation.
// Note that this will panic on an invalid index.
assert_eq!(&caps[1], "Citizen Kane");
assert_eq!(&caps[2], "1941");
assert_eq!(&caps[0], "'Citizen Kane' (1941)");
Note that the full match is at capture group 0. Each subsequent capture group is indexed by the order of its opening (.
We can make this example a bit clearer by using named capture groups:
let re = Regex::new(r"'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)")
    .unwrap();
let text = "Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(text).unwrap();
assert_eq!(caps.name("title").unwrap().as_str(), "Citizen Kane");
assert_eq!(caps.name("year").unwrap().as_str(), "1941");
assert_eq!(caps.get(0).unwrap().as_str(), "'Citizen Kane' (1941)");
// You can also access the groups by name using the Index notation.
// Note that this will panic on an invalid group name.
assert_eq!(&caps["title"], "Citizen Kane");
assert_eq!(&caps["year"], "1941");
assert_eq!(&caps[0], "'Citizen Kane' (1941)");
Here we name the capture groups, which we can access with the name method or the Index notation with a &str. Note that the named capture groups are still accessible with get or the Index notation with a usize.
The 0th capture group is always unnamed, so it must always be accessed with get(0) or [0].
pub fn captures_iter<'r, 't>(&'r self, text: &'t str) -> CaptureMatches<'r, 't>
Returns an iterator over all the non-overlapping capture groups matched in text. This is operationally the same as find_iter, except it yields information about capturing group matches.
§Example
We can use this to find all movie titles and their release years in some text, where the movie is formatted like “‘Title’ (xxxx)”:
let re = Regex::new(r"'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)")
    .unwrap();
let text = "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931).";
for caps in re.captures_iter(text) {
    // Display formatting ({}) prints the captured text without quotes.
    println!("Movie: {}, Released: {}",
             &caps["title"], &caps["year"]);
}
// Output:
// Movie: Citizen Kane, Released: 1941
// Movie: The Wizard of Oz, Released: 1939
// Movie: M, Released: 1931
pub fn split<'r, 't>(&'r self, text: &'t str) -> Split<'r, 't>
Returns an iterator of substrings of text delimited by a match of the regular expression. Namely, each element of the iterator corresponds to text that isn’t matched by the regular expression.
This method will not copy the text given.
§Example
To split a string delimited by arbitrary amounts of spaces or tabs:
let re = Regex::new(r"[ \t]+").unwrap();
let fields: Vec<&str> = re.split("a b \t c\td e").collect();
assert_eq!(fields, vec!["a", "b", "c", "d", "e"]);
pub fn splitn<'r, 't>(&'r self, text: &'t str, limit: usize) -> SplitN<'r, 't>
Returns an iterator of at most limit substrings of text delimited by a match of the regular expression. (A limit of 0 will return no substrings.) Namely, each element of the iterator corresponds to text that isn’t matched by the regular expression. The remainder of the string that is not split will be the last element in the iterator.
This method will not copy the text given.
§Example
Get the first two words in some text:
let re = Regex::new(r"\W+").unwrap();
let fields: Vec<&str> = re.splitn("Hey! How are you?", 3).collect();
assert_eq!(fields, vec!["Hey", "How", "are you?"]);
pub fn replace<R, 't>(&self, text: &'t str, rep: R) -> Cow<'t, str>
where
    R: Replacer,
Replaces the leftmost-first match with the replacement provided.
The replacement can be a regular string (where $N and $name are expanded to match capture groups) or a function that takes the matches’ Captures and returns the replaced string.
If no match is found, then a copy of the string is returned unchanged.
§Replacement string syntax
All instances of $name in the replacement text are replaced with the corresponding capture group name.
name may be an integer corresponding to the index of the capture group (counted by order of opening parenthesis where 0 is the entire match) or it can be a name (consisting of letters, digits or underscores) corresponding to a named capture group.
If name isn’t a valid capture group (whether the name doesn’t exist or isn’t a valid index), then it is replaced with the empty string.
The longest possible name is used. For example, $1a looks up the capture group named 1a and not the capture group at index 1. To exert more precise control over the name, use braces, e.g., ${1}a.
To write a literal $ use $$.
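The expansion rules above can be illustrated with a small standalone expander. This is a simplified, std-only sketch and not the regex crate's implementation: expand_template and demo_lookup are hypothetical names, the lookup function stands in for a set of captures, and keeping a lone $ that starts no reference as literal text is an assumption of this sketch.

```rust
/// Expands `$name`, `${name}`, and `$$` in `template`.
/// `lookup` maps a group name or index (given as text) to captured text.
pub fn expand_template<'a, F>(template: &str, lookup: F) -> String
where
    F: Fn(&str) -> Option<&'a str>,
{
    let chars: Vec<char> = template.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] != '$' {
            out.push(chars[i]);
            i += 1;
            continue;
        }
        // `$$` is a literal dollar sign.
        if chars.get(i + 1) == Some(&'$') {
            out.push('$');
            i += 2;
            continue;
        }
        // `${name}`: the braces delimit the name explicitly.
        if chars.get(i + 1) == Some(&'{') {
            if let Some(len) = chars[i + 2..].iter().position(|&c| c == '}') {
                let name: String = chars[i + 2..i + 2 + len].iter().collect();
                // An unknown group expands to the empty string.
                out.push_str(lookup(&name).unwrap_or(""));
                i += 2 + len + 1;
                continue;
            }
        }
        // `$name`: take the longest run of letters, digits, and underscores.
        let len = chars[i + 1..]
            .iter()
            .take_while(|c| c.is_ascii_alphanumeric() || **c == '_')
            .count();
        if len == 0 {
            // Assumption of this sketch: keep the `$` as literal text.
            out.push('$');
            i += 1;
        } else {
            let name: String = chars[i + 1..i + 1 + len].iter().collect();
            out.push_str(lookup(&name).unwrap_or(""));
            i += 1 + len;
        }
    }
    out
}

/// Hypothetical captures: group 1 is named `first`, group 2 `second`.
pub fn demo_lookup(name: &str) -> Option<&'static str> {
    match name {
        "1" | "first" => Some("deep"),
        "2" | "second" => Some("fried"),
        _ => None,
    }
}
```

With these hypothetical captures, ${first}_$second expands to deep_fried, while $first_$second looks up the nonexistent group first_ and so loses the first word.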
§Examples
Note that this function is polymorphic with respect to the replacement. In typical usage, this can just be a normal string:
let re = Regex::new("[^01]+").unwrap();
assert_eq!(re.replace("1078910", ""), "1010");
But anything satisfying the Replacer trait will work. For example, a closure of type |&Captures| -> String provides direct access to the captures corresponding to a match. This allows one to access capturing group matches easily:
let re = Regex::new(r"([^,\s]+),\s+(\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", |caps: &Captures| {
    format!("{} {}", &caps[2], &caps[1])
});
assert_eq!(result, "Bruce Springsteen");
But this is a bit cumbersome to use all the time. Instead, a simple syntax is supported that expands $name into the corresponding capture group. Here’s the last example, but using this expansion technique with named capture groups:
let re = Regex::new(r"(?P<last>[^,\s]+),\s+(?P<first>\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", "$first $last");
assert_eq!(result, "Bruce Springsteen");
Note that using $2 instead of $first or $1 instead of $last would produce the same result. To write a literal $ use $$.
Sometimes the replacement string requires use of curly braces to delineate a capture group replacement and surrounding literal text. For example, if we wanted to join two words together with an underscore:
let re = Regex::new(r"(?P<first>\w+)\s+(?P<second>\w+)").unwrap();
let result = re.replace("deep fried", "${first}_$second");
assert_eq!(result, "deep_fried");
Without the curly braces, the capture group name first_ would be used, and since it doesn’t exist, it would be replaced with the empty string.
Finally, sometimes you just want to replace a literal string with no regard for capturing group expansion. This can be done by wrapping a string with NoExpand:
use regex::NoExpand;
let re = Regex::new(r"(?P<last>[^,\s]+),\s+(\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", NoExpand("$2 $last"));
assert_eq!(result, "$2 $last");
pub fn replace_all<R, 't>(&self, text: &'t str, rep: R) -> Cow<'t, str>
where
    R: Replacer,
Replaces all non-overlapping matches in text with the replacement provided. This is the same as calling replacen with limit set to 0.
See the documentation for replace for details on how to access capturing group matches in the replacement string.
pub fn replacen<R, 't>(
    &self,
    text: &'t str,
    limit: usize,
    rep: R
) -> Cow<'t, str>
where
    R: Replacer,
Replaces at most limit non-overlapping matches in text with the replacement provided. If limit is 0, then all non-overlapping matches are replaced.
See the documentation for replace for details on how to access capturing group matches in the replacement string.
pub fn shortest_match(&self, text: &str) -> Option<usize>
Returns the end location of a match in the text given.
This method may have the same performance characteristics as is_match, except it provides an end location for a match. In particular, the location returned may be shorter than the proper end of the leftmost-first match.
§Example
Typically, a+ would match the entire first sequence of a in some text, but shortest_match can give up as soon as it sees the first a.
let text = "aaaaa";
let pos = Regex::new(r"a+").unwrap().shortest_match(text);
assert_eq!(pos, Some(1));
pub fn shortest_match_at(&self, text: &str, start: usize) -> Option<usize>
Returns the same as shortest_match, but starts the search at the given offset.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.
pub fn is_match_at(&self, text: &str, start: usize) -> bool
Returns the same as is_match, but starts the search at the given offset.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.
pub fn find_at<'t>(&self, text: &'t str, start: usize) -> Option<Match<'t>>
Returns the same as find, but starts the search at the given offset.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.
pub fn captures_read<'t>(
    &self,
    locs: &mut CaptureLocations,
    text: &'t str
) -> Option<Match<'t>>
This is like captures, but uses CaptureLocations instead of Captures in order to amortize allocations.
To create a CaptureLocations value, use the Regex::capture_locations method.
This returns the overall match if this was successful, which is always equivalent to the 0th capture group.
pub fn captures_read_at<'t>(
    &self,
    locs: &mut CaptureLocations,
    text: &'t str,
    start: usize
) -> Option<Match<'t>>
Returns the same as captures, but starts the search at the given offset and populates the capture locations given.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.
pub fn capture_names(&self) -> CaptureNames<'_>
Returns an iterator over the capture names.
pub fn captures_len(&self) -> usize
Returns the number of captures.
pub fn capture_locations(&self) -> CaptureLocations
Returns an empty set of capture locations that can be reused in multiple calls to captures_read or captures_read_at.
Trait Implementations§
impl<'de> Deserialize<'de> for Regex
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
    D: Deserializer<'de>,
impl MzReflect for Regex
fn add_to_reflected_type_info(rti: &mut ReflectedTypeInfo)
impl Ord for Regex
impl PartialEq for Regex
impl PartialOrd for Regex
fn le(&self, other: &Rhs) -> bool
Tests less than or equal to (for self and other) and is used by the <= operator.
impl RustType<ProtoRegex> for Regex
fn into_proto(&self) -> ProtoRegex
Convert a Self into a Proto value.
fn from_proto(proto: ProtoRegex) -> Result<Self, TryFromProtoError>
impl Eq for Regex
Auto Trait Implementations§
impl RefUnwindSafe for Regex
impl Send for Regex
impl Sync for Regex
impl Unpin for Regex
impl UnwindSafe for Regex
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<Q, K> Comparable<K> for Q
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.
impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.
impl<T> FutureExt for T
fn with_context(self, otel_cx: Context) -> WithContext<Self>
fn with_current_context(self) -> WithContext<Self>
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoRequest<T> for T
fn into_request(self) -> Request<T>
Wrap the input message T in a tonic::Request
impl<T, U> OverrideFrom<Option<&T>> for U
where
    U: OverrideFrom<T>,
impl<T> Pointable for T
impl<T> PreferredContainer for T
impl<T> ProgressEventTimestamp for T
impl<P, R> ProtoType<R> for P
where
    R: RustType<P>,
fn into_rust(self) -> Result<R, TryFromProtoError>
See RustType::from_proto.
fn from_rust(rust: &R) -> P
See RustType::into_proto.