pub struct Regex {
    pub case_insensitive: bool,
    pub dot_matches_new_line: bool,
    pub regex: Regex,
}
A hashable, comparable, and serializable regular expression type.
The regex::Regex type, the de facto standard regex type in Rust, does not implement PartialOrd, Ord, PartialEq, Eq, or Hash.
The omissions are reasonable. There is no natural definition of ordering for regexes. There is a natural definition of equality (whether two regexes describe the same regular language), but that is an expensive property to compute, and PartialEq is generally expected to be fast to compute.
This type wraps regex::Regex and imbues it with implementations of the above traits. Two regexes are considered equal iff their string representations are identical and their flags, such as case_insensitive, are identical. The PartialOrd, Ord, and Hash implementations are similarly based upon the string representation plus flags. As mentioned above, this is not the natural equivalence relation for regexes: for example, the regexes aa* and a+ define the same language, but would not compare as equal with this implementation of PartialEq. Still, it is often useful to have some equivalence relation available (e.g., to store types containing regexes in a hashmap) even if the equivalence relation is imperfect.
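As a rough illustration of the comparison rule described above, the sketch below models only the fields that take part in comparisons. PatternKey, its constructor, and distinct_keys are hypothetical names introduced for this example and are not part of this type's API; the compiled regex is deliberately left out so that the sketch needs only the standard library.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the wrapper: equality, ordering, and hashing
/// are derived from the pattern string plus the flags, nothing else.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct PatternKey {
    pattern: String,
    case_insensitive: bool,
    dot_matches_new_line: bool,
}

impl PatternKey {
    /// Mirrors the shape of the simple constructor: the second flag
    /// defaults to `true`.
    fn new(pattern: &str, case_insensitive: bool) -> Self {
        PatternKey {
            pattern: pattern.to_string(),
            case_insensitive,
            dot_matches_new_line: true,
        }
    }
}

/// Counts how many keys are distinct under the derived equivalence.
fn distinct_keys(keys: &[PatternKey]) -> usize {
    keys.iter().cloned().collect::<HashSet<_>>().len()
}

fn main() {
    let a = PatternKey::new("aa*", false);
    let b = PatternKey::new("a+", false);
    let c = PatternKey::new("a+", true);

    // `aa*` and `a+` describe the same language, but the strings differ.
    assert_ne!(a, b);
    // Same string, different flag: still not equal.
    assert_ne!(b, c);
    // Same string and same flags: equal.
    assert_eq!(b, PatternKey::new("a+", false));
    assert_eq!(distinct_keys(&[a, b.clone(), b, c]), 3);
}
```

The derived implementations compare fields in declaration order, which is one simple way to get a total order that is consistent with equality.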
regex::Regex is hard to serialize (because of the compiled code), so our approach is to serialize this wrapper struct instead: we skip serializing the actual regex field and reconstruct it from the other fields upon deserialization. (Earlier, serialization was buggy due to https://github.com/tailhook/serde-regex/issues/14, and because we made the same mistake in our own protobuf serialization code.)
Fields§
§case_insensitive: bool
§dot_matches_new_line: bool
§regex: Regex
Implementations§
impl Regex
pub fn new(pattern: &str, case_insensitive: bool) -> Result<Regex, Error>
A simple constructor that uses the default setting of dot_matches_new_line: true.
See “newline-sensitive matching” in https://www.postgresql.org/docs/current/functions-matching.html#POSIX-MATCHING-RULES
Methods from Deref<Target = Regex>§
pub fn is_match(&self, haystack: &str) -> bool
Returns true if and only if there is a match for the regex anywhere in the haystack given.
It is recommended to use this method if all you need to do is test whether a match exists, since the underlying matching engine may be able to do less work.
§Example
Test if some haystack contains at least one word with exactly 13 Unicode word characters:
use regex::Regex;
let re = Regex::new(r"\b\w{13}\b").unwrap();
let hay = "I categorically deny having triskaidekaphobia.";
assert!(re.is_match(hay));
pub fn find<'h>(&self, haystack: &'h str) -> Option<Match<'h>>
This routine searches for the first match of this regex in the haystack given, and if found, returns a Match. The Match provides access to both the byte offsets of the match and the actual substring that matched.
Note that this should only be used if you want to find the entire match. If instead you just want to test the existence of a match, it’s potentially faster to use Regex::is_match(hay) instead of Regex::find(hay).is_some().
§Example
Find the first word with exactly 13 Unicode word characters:
use regex::Regex;
let re = Regex::new(r"\b\w{13}\b").unwrap();
let hay = "I categorically deny having triskaidekaphobia.";
let mat = re.find(hay).unwrap();
assert_eq!(2..15, mat.range());
assert_eq!("categorically", mat.as_str());
pub fn find_iter<'r, 'h>(&'r self, haystack: &'h str) -> Matches<'r, 'h>
Returns an iterator that yields successive non-overlapping matches in the given haystack. The iterator yields values of type Match.
§Time complexity
Note that since find_iter runs potentially many searches on the haystack and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for iteration is O(m * n^2).
§Example
Find every word with exactly 13 Unicode word characters:
use regex::Regex;
let re = Regex::new(r"\b\w{13}\b").unwrap();
let hay = "Retroactively relinquishing remunerations is reprehensible.";
let matches: Vec<_> = re.find_iter(hay).map(|m| m.as_str()).collect();
assert_eq!(matches, vec![
    "Retroactively",
    "relinquishing",
    "remunerations",
    "reprehensible",
]);
pub fn captures<'h>(&self, haystack: &'h str) -> Option<Captures<'h>>
This routine searches for the first match of this regex in the haystack given, and if found, returns not only the overall match but also the matches of each capture group in the regex. If no match is found, then None is returned.
Capture group 0 always corresponds to an implicit unnamed group that includes the entire match. If a match is found, this group is always present. Subsequent groups may be named and are numbered, starting at 1, by the order in which the opening parenthesis appears in the pattern. For example, in the pattern (?<a>.(?<b>.))(?<c>.), a, b and c correspond to capture group indices 1, 2 and 3, respectively.
You should only use captures if you need access to the capture group matches. Otherwise, Regex::find is generally faster for discovering just the overall match.
§Example
Say you have some haystack with movie names and their release years, like “‘Citizen Kane’ (1941)”. It’d be nice if we could search for substrings looking like that, while also extracting the movie name and its release year separately. The example below shows how to do that.
use regex::Regex;
let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)").unwrap();
let hay = "Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(hay).unwrap();
assert_eq!(caps.get(0).unwrap().as_str(), "'Citizen Kane' (1941)");
assert_eq!(caps.get(1).unwrap().as_str(), "Citizen Kane");
assert_eq!(caps.get(2).unwrap().as_str(), "1941");
// You can also access the groups by index using the Index notation.
// Note that this will panic on an invalid index. In this case, these
// accesses are always correct because the overall regex will only
// match when these capture groups match.
assert_eq!(&caps[0], "'Citizen Kane' (1941)");
assert_eq!(&caps[1], "Citizen Kane");
assert_eq!(&caps[2], "1941");
Note that the full match is at capture group 0. Each subsequent capture group is indexed by the order of its opening (.
We can make this example a bit clearer by using named capture groups:
use regex::Regex;
let re = Regex::new(r"'(?<title>[^']+)'\s+\((?<year>\d{4})\)").unwrap();
let hay = "Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(hay).unwrap();
assert_eq!(caps.get(0).unwrap().as_str(), "'Citizen Kane' (1941)");
assert_eq!(caps.name("title").unwrap().as_str(), "Citizen Kane");
assert_eq!(caps.name("year").unwrap().as_str(), "1941");
// You can also access the groups by name using the Index notation.
// Note that this will panic on an invalid group name. In this case,
// these accesses are always correct because the overall regex will
// only match when these capture groups match.
assert_eq!(&caps[0], "'Citizen Kane' (1941)");
assert_eq!(&caps["title"], "Citizen Kane");
assert_eq!(&caps["year"], "1941");
Here we name the capture groups, which we can access with the name method or the Index notation with a &str. Note that the named capture groups are still accessible with get or the Index notation with a usize.
The 0th capture group is always unnamed, so it must always be accessed with get(0) or [0].
Finally, one other way to get the matched substrings is with the Captures::extract API:
use regex::Regex;
let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)").unwrap();
let hay = "Not my favorite movie: 'Citizen Kane' (1941).";
let (full, [title, year]) = re.captures(hay).unwrap().extract();
assert_eq!(full, "'Citizen Kane' (1941)");
assert_eq!(title, "Citizen Kane");
assert_eq!(year, "1941");
pub fn captures_iter<'r, 'h>(&'r self, haystack: &'h str) -> CaptureMatches<'r, 'h>
Returns an iterator that yields successive non-overlapping matches in the given haystack. The iterator yields values of type Captures.
This is the same as Regex::find_iter, but instead of only providing access to the overall match, each value yielded includes access to the matches of all capture groups in the regex. Reporting this extra match data is potentially costly, so callers should only use captures_iter over find_iter when they actually need access to the capture group matches.
§Time complexity
Note that since captures_iter runs potentially many searches on the haystack and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for iteration is O(m * n^2).
§Example
We can use this to find all movie titles and their release years in some haystack, where the movie is formatted like “‘Title’ (xxxx)”:
use regex::Regex;
let re = Regex::new(r"'([^']+)'\s+\(([0-9]{4})\)").unwrap();
let hay = "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931).";
let mut movies = vec![];
for (_, [title, year]) in re.captures_iter(hay).map(|c| c.extract()) {
    // The pattern only matches four ASCII digits, so parsing cannot fail here.
    movies.push((title, year.parse::<i64>().unwrap()));
}
assert_eq!(movies, vec![
    ("Citizen Kane", 1941),
    ("The Wizard of Oz", 1939),
    ("M", 1931),
]);
Or with named groups:
use regex::Regex;
let re = Regex::new(r"'(?<title>[^']+)'\s+\((?<year>[0-9]{4})\)").unwrap();
let hay = "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931).";
let mut it = re.captures_iter(hay);
let caps = it.next().unwrap();
assert_eq!(&caps["title"], "Citizen Kane");
assert_eq!(&caps["year"], "1941");
let caps = it.next().unwrap();
assert_eq!(&caps["title"], "The Wizard of Oz");
assert_eq!(&caps["year"], "1939");
let caps = it.next().unwrap();
assert_eq!(&caps["title"], "M");
assert_eq!(&caps["year"], "1931");
pub fn split<'r, 'h>(&'r self, haystack: &'h str) -> Split<'r, 'h>
Returns an iterator of substrings of the haystack given, delimited by a match of the regex. Namely, each element of the iterator corresponds to a part of the haystack that isn’t matched by the regular expression.
§Time complexity
Since iterating over all matches requires running potentially many searches on the haystack, and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for this routine is O(m * n^2).
§Example
To split a string delimited by arbitrary amounts of spaces or tabs:
use regex::Regex;
let re = Regex::new(r"[ \t]+").unwrap();
let hay = "a b \t c\td e";
let fields: Vec<&str> = re.split(hay).collect();
assert_eq!(fields, vec!["a", "b", "c", "d", "e"]);
§Example: more cases
Basic usage:
use regex::Regex;
let re = Regex::new(r" ").unwrap();
let hay = "Mary had a little lamb";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["Mary", "had", "a", "little", "lamb"]);
let re = Regex::new(r"X").unwrap();
let hay = "";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec![""]);
let re = Regex::new(r"X").unwrap();
let hay = "lionXXtigerXleopard";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["lion", "", "tiger", "leopard"]);
let re = Regex::new(r"::").unwrap();
let hay = "lion::tiger::leopard";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["lion", "tiger", "leopard"]);
If a haystack contains multiple contiguous matches, you will end up with empty spans yielded by the iterator:
use regex::Regex;
let re = Regex::new(r"X").unwrap();
let hay = "XXXXaXXbXc";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["", "", "", "", "a", "", "b", "c"]);
let re = Regex::new(r"/").unwrap();
let hay = "(///)";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["(", "", "", ")"]);
Separators at the start or end of a haystack are neighbored by an empty substring:
use regex::Regex;
let re = Regex::new(r"0").unwrap();
let hay = "010";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["", "1", ""]);
When the empty string is used as a regex, it splits at every valid UTF-8 boundary by default (which includes the beginning and end of the haystack):
use regex::Regex;
let re = Regex::new(r"").unwrap();
let hay = "rust";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["", "r", "u", "s", "t", ""]);
// Splitting by an empty string is UTF-8 aware by default!
let re = Regex::new(r"").unwrap();
let hay = "☃";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["", "☃", ""]);
Contiguous separators (which commonly show up with whitespace) can lead to possibly surprising behavior. For example, this code is correct:
use regex::Regex;
let re = Regex::new(r" ").unwrap();
let hay = "    a  b c";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["", "", "", "", "a", "", "b", "c"]);
It does not give you ["a", "b", "c"]. For that behavior, you’d want to match contiguous space characters:
use regex::Regex;
let re = Regex::new(r" +").unwrap();
let hay = "    a  b c";
let got: Vec<&str> = re.split(hay).collect();
// N.B. This does still include a leading empty span because ' +'
// matches at the beginning of the haystack.
assert_eq!(got, vec!["", "a", "b", "c"]);
pub fn splitn<'r, 'h>(&'r self, haystack: &'h str, limit: usize) -> SplitN<'r, 'h>
Returns an iterator of at most limit substrings of the haystack given, delimited by a match of the regex. (A limit of 0 will return no substrings.) Namely, each element of the iterator corresponds to a part of the haystack that isn’t matched by the regular expression. The remainder of the haystack that is not split will be the last element in the iterator.
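For literal separators, the standard library's str::splitn follows the same limit rules, which makes it a convenient way to get a feel for them without a regex. The sketch below is only an analogy: split_limited is a hypothetical helper, it splits on a literal string rather than a pattern, and std takes the limit as the first argument rather than the second.

```rust
/// Collects at most `limit` pieces of `hay`, split on the literal `sep`.
/// Hypothetical helper built on `str::splitn` from the standard library.
fn split_limited<'h>(hay: &'h str, sep: &str, limit: usize) -> Vec<&'h str> {
    // Note the argument order: std takes the limit first.
    hay.splitn(limit, sep).collect()
}

fn main() {
    // The unsplit remainder becomes the last element.
    assert_eq!(
        split_limited("Mary had a little lamb", " ", 3),
        vec!["Mary", "had", "a little lamb"],
    );
    // Adjacent separators still produce an empty piece.
    assert_eq!(
        split_limited("lionXXtigerXleopard", "X", 3),
        vec!["lion", "", "tigerXleopard"],
    );
    // A limit of 1 returns the haystack untouched.
    assert_eq!(split_limited("abcXdef", "X", 1), vec!["abcXdef"]);
    // A limit of 0 returns no substrings at all.
    assert!(split_limited("abcXdef", "X", 0).is_empty());
}
```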
§Time complexity
Since iterating over all matches requires running potentially many searches on the haystack, and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for this routine is O(m * n^2).
Note, however, that the worst case time here has an upper bound given by the limit parameter.
§Example
Get the first two words in some haystack:
use regex::Regex;
let re = Regex::new(r"\W+").unwrap();
let hay = "Hey! How are you?";
let fields: Vec<&str> = re.splitn(hay, 3).collect();
assert_eq!(fields, vec!["Hey", "How", "are you?"]);
§Examples: more cases
use regex::Regex;
let re = Regex::new(r" ").unwrap();
let hay = "Mary had a little lamb";
let got: Vec<&str> = re.splitn(hay, 3).collect();
assert_eq!(got, vec!["Mary", "had", "a little lamb"]);
let re = Regex::new(r"X").unwrap();
let hay = "";
let got: Vec<&str> = re.splitn(hay, 3).collect();
assert_eq!(got, vec![""]);
let re = Regex::new(r"X").unwrap();
let hay = "lionXXtigerXleopard";
let got: Vec<&str> = re.splitn(hay, 3).collect();
assert_eq!(got, vec!["lion", "", "tigerXleopard"]);
let re = Regex::new(r"::").unwrap();
let hay = "lion::tiger::leopard";
let got: Vec<&str> = re.splitn(hay, 2).collect();
assert_eq!(got, vec!["lion", "tiger::leopard"]);
let re = Regex::new(r"X").unwrap();
let hay = "abcXdef";
let got: Vec<&str> = re.splitn(hay, 1).collect();
assert_eq!(got, vec!["abcXdef"]);
let re = Regex::new(r"X").unwrap();
let hay = "abcdef";
let got: Vec<&str> = re.splitn(hay, 2).collect();
assert_eq!(got, vec!["abcdef"]);
let re = Regex::new(r"X").unwrap();
let hay = "abcXdef";
let got: Vec<&str> = re.splitn(hay, 0).collect();
assert!(got.is_empty());
pub fn replace<'h, R>(&self, haystack: &'h str, rep: R) -> Cow<'h, str>
where
    R: Replacer,
Replaces the leftmost-first match in the given haystack with the replacement provided. The replacement can be a regular string (where $N and $name are expanded to match capture groups) or a function that takes a Captures and returns the replaced string.
If no match is found, then the haystack is returned unchanged. In that case, this implementation will likely return a Cow::Borrowed value such that no allocation is performed.
When a Cow::Borrowed is returned, the value returned is guaranteed to be equivalent to the haystack given.
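The borrowed-versus-owned behavior described above can be sketched with the standard library alone. The function below is a hypothetical literal-string analogue, not the regex implementation: it borrows the haystack when nothing matches and allocates only when a replacement actually happens.

```rust
use std::borrow::Cow;

/// Replaces the first occurrence of the literal `needle` in `haystack`.
/// Hypothetical helper used only to illustrate the `Cow` return type.
fn replace_first_literal<'h>(haystack: &'h str, needle: &str, rep: &str) -> Cow<'h, str> {
    match haystack.find(needle) {
        // No match: hand back the input without allocating.
        None => Cow::Borrowed(haystack),
        // Match: build a new string around the replacement.
        Some(start) => {
            let end = start + needle.len();
            let mut out = String::with_capacity(haystack.len() - needle.len() + rep.len());
            out.push_str(&haystack[..start]);
            out.push_str(rep);
            out.push_str(&haystack[end..]);
            Cow::Owned(out)
        }
    }
}

fn main() {
    let unchanged = replace_first_literal("no digits here", "7", "seven");
    assert!(matches!(unchanged, Cow::Borrowed(_)));
    assert_eq!(unchanged, "no digits here");

    let changed = replace_first_literal("route 7", "7", "seven");
    assert!(matches!(changed, Cow::Owned(_)));
    assert_eq!(changed, "route seven");
}
```

Callers that need an owned String in either case can call into_owned() on the result, which allocates only for the borrowed variant.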
§Replacement string syntax
All instances of $ref in the replacement string are replaced with the substring corresponding to the capture group identified by ref.
ref may be an integer corresponding to the index of the capture group (counted by order of opening parenthesis where 0 is the entire match) or it can be a name (consisting of letters, digits or underscores) corresponding to a named capture group.
If ref isn’t a valid capture group (whether the name doesn’t exist or isn’t a valid index), then it is replaced with the empty string.
The longest possible name is used. For example, $1a looks up the capture group named 1a and not the capture group at index 1. To exert more precise control over the name, use braces, e.g., ${1}a.
To write a literal $ use $$.
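The rules above can be made concrete with a small standard-library sketch. It is a simplified model rather than the regex crate's expansion code: expand and lookup are hypothetical helpers, captures are modeled as a slice of optional substrings plus a name-to-index map, and the handling of a lone $ or an unclosed ${ (kept as a literal $) is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Resolves `reference` (an index or a name) to a captured substring,
/// or to the empty string if it names no valid group.
fn lookup<'a>(reference: &str, groups: &[Option<&'a str>], names: &HashMap<&str, usize>) -> &'a str {
    let is_index = !reference.is_empty() && reference.bytes().all(|b| b.is_ascii_digit());
    let index: Option<usize> = if is_index {
        reference.parse().ok()
    } else {
        names.get(reference).copied()
    };
    index
        .and_then(|i| groups.get(i).copied().flatten())
        .unwrap_or("")
}

/// Expands `$ref`, `${ref}` and `$$` in `template`.
fn expand(template: &str, groups: &[Option<&str>], names: &HashMap<&str, usize>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        if let Some(tail) = after.strip_prefix('$') {
            // `$$` is a literal dollar sign.
            out.push('$');
            rest = tail;
        } else if let Some(braced) = after.strip_prefix('{') {
            match braced.find('}') {
                Some(close) => {
                    out.push_str(lookup(&braced[..close], groups, names));
                    rest = &braced[close + 1..];
                }
                None => {
                    // Assumption of this sketch: keep the `$` literally.
                    out.push('$');
                    rest = after;
                }
            }
        } else {
            // Take the longest run of letters, digits and underscores.
            let len = after
                .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
                .unwrap_or(after.len());
            if len == 0 {
                // Assumption of this sketch: a lone `$` stays literal.
                out.push('$');
            } else {
                out.push_str(lookup(&after[..len], groups, names));
            }
            rest = &after[len..];
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    // Groups as they would look for "deep fried" matched by two named groups.
    let groups = [Some("deep fried"), Some("deep"), Some("fried")];
    let names = HashMap::from([("first", 1), ("second", 2)]);

    assert_eq!(expand("${first}_$second", &groups, &names), "deep_fried");
    // Without braces the name `first_` is looked up and does not exist.
    assert_eq!(expand("$first_$second", &groups, &names), "fried");
    // `$1a` names the group `1a`, not index 1.
    assert_eq!(expand("$1a", &groups, &names), "");
    assert_eq!(expand("${1}a", &groups, &names), "deepa");
    assert_eq!(expand("$$2", &groups, &names), "$2");
}
```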
§Example
Note that this function is polymorphic with respect to the replacement. In typical usage, this can just be a normal string:
use regex::Regex;
let re = Regex::new(r"[^01]+").unwrap();
assert_eq!(re.replace("1078910", ""), "1010");
But anything satisfying the Replacer trait will work. For example, a closure of type |&Captures| -> String provides direct access to the captures corresponding to a match. This allows one to access capturing group matches easily:
use regex::{Captures, Regex};
let re = Regex::new(r"([^,\s]+),\s+(\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", |caps: &Captures| {
    format!("{} {}", &caps[2], &caps[1])
});
assert_eq!(result, "Bruce Springsteen");
But this is a bit cumbersome to use all the time. Instead, a simple syntax is supported (as described above) that expands $name into the corresponding capture group. Here’s the last example, but using this expansion technique with named capture groups:
use regex::Regex;
let re = Regex::new(r"(?<last>[^,\s]+),\s+(?<first>\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", "$first $last");
assert_eq!(result, "Bruce Springsteen");
Note that using $2 instead of $first or $1 instead of $last would produce the same result. To write a literal $ use $$.
Sometimes the replacement string requires use of curly braces to delineate a capture group replacement when it is adjacent to some other literal text. For example, if we wanted to join two words together with an underscore:
use regex::Regex;
let re = Regex::new(r"(?<first>\w+)\s+(?<second>\w+)").unwrap();
let result = re.replace("deep fried", "${first}_$second");
assert_eq!(result, "deep_fried");
Without the curly braces, the capture group name first_ would be used, and since it doesn’t exist, it would be replaced with the empty string.
Finally, sometimes you just want to replace a literal string with no regard for capturing group expansion. This can be done by wrapping a string with NoExpand:
use regex::{NoExpand, Regex};
let re = Regex::new(r"(?<last>[^,\s]+),\s+(\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", NoExpand("$2 $last"));
assert_eq!(result, "$2 $last");
Using NoExpand may also be faster, since the replacement string won’t need to be parsed for the $ syntax.
pub fn replace_all<'h, R>(&self, haystack: &'h str, rep: R) -> Cow<'h, str>
where
    R: Replacer,
Replaces all non-overlapping matches in the haystack with the replacement provided. This is the same as calling replacen with limit set to 0.
If no match is found, then the haystack is returned unchanged. In that case, this implementation will likely return a Cow::Borrowed value such that no allocation is performed.
When a Cow::Borrowed is returned, the value returned is guaranteed to be equivalent to the haystack given.
The documentation for Regex::replace goes into more detail about what kinds of replacement strings are supported.
§Time complexity
Since iterating over all matches requires running potentially many searches on the haystack, and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for this routine is O(m * n^2).
§Fallibility
If you need to write a replacement routine where any individual replacement might “fail,” doing so with this API isn’t really feasible because there’s no way to stop the search process if a replacement fails. Instead, if you need this functionality, you should consider implementing your own replacement routine:
use regex::{Captures, Regex};
fn replace_all<E>(
    re: &Regex,
    haystack: &str,
    replacement: impl Fn(&Captures) -> Result<String, E>,
) -> Result<String, E> {
    let mut new = String::with_capacity(haystack.len());
    let mut last_match = 0;
    for caps in re.captures_iter(haystack) {
        let m = caps.get(0).unwrap();
        new.push_str(&haystack[last_match..m.start()]);
        new.push_str(&replacement(&caps)?);
        last_match = m.end();
    }
    new.push_str(&haystack[last_match..]);
    Ok(new)
}
// Let's replace each word with the number of bytes in that word.
// But if we see a word that is "too long," we'll give up.
let re = Regex::new(r"\w+").unwrap();
let replacement = |caps: &Captures| -> Result<String, &'static str> {
    if caps[0].len() >= 5 {
        return Err("word too long");
    }
    Ok(caps[0].len().to_string())
};
assert_eq!(
    Ok("2 3 3 3?".to_string()),
    replace_all(&re, "hi how are you?", &replacement),
);
assert!(replace_all(&re, "hi there", &replacement).is_err());
§Example
This example shows how to flip the order of whitespace (excluding line terminators) delimited fields, and normalizes the whitespace that delimits the fields:
use regex::Regex;
let re = Regex::new(r"(?m)^(\S+)[\s--\r\n]+(\S+)$").unwrap();
let hay = "
Greetings 1973
Wild\t1973
BornToRun\t\t\t\t1975
Darkness 1978
TheRiver 1980
";
let new = re.replace_all(hay, "$2 $1");
assert_eq!(new, "
1973 Greetings
1973 Wild
1975 BornToRun
1978 Darkness
1980 TheRiver
");
pub fn replacen<'h, R>(&self, haystack: &'h str, limit: usize, rep: R) -> Cow<'h, str>
where
    R: Replacer,
Replaces at most limit non-overlapping matches in the haystack with the replacement provided. If limit is 0, then all non-overlapping matches are replaced. That is, Regex::replace_all(hay, rep) is equivalent to Regex::replacen(hay, 0, rep).
If no match is found, then the haystack is returned unchanged. In that case, this implementation will likely return a Cow::Borrowed value such that no allocation is performed.
When a Cow::Borrowed is returned, the value returned is guaranteed to be equivalent to the haystack given.
The documentation for Regex::replace goes into more detail about what kinds of replacement strings are supported.
§Time complexity
Since iterating over all matches requires running potentially many searches on the haystack, and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for this routine is O(m * n^2).
Note, however, that the worst case time here has an upper bound given by the limit parameter.
§Fallibility
See the corresponding section in the docs for Regex::replace_all for tips on how to deal with a replacement routine that can fail.
§Example
This example shows how to flip the order of whitespace (excluding line terminators) delimited fields, and normalizes the whitespace that delimits the fields. But we only do it for the first two matches.
use regex::Regex;
let re = Regex::new(r"(?m)^(\S+)[\s--\r\n]+(\S+)$").unwrap();
let hay = "
Greetings 1973
Wild\t1973
BornToRun\t\t\t\t1975
Darkness 1978
TheRiver 1980
";
let new = re.replacen(hay, 2, "$2 $1");
assert_eq!(new, "
1973 Greetings
1973 Wild
BornToRun\t\t\t\t1975
Darkness 1978
TheRiver 1980
");
pub fn shortest_match(&self, haystack: &str) -> Option<usize>
Returns the end byte offset of the first match in the haystack given.
This method may have the same performance characteristics as is_match. Behaviorally, it doesn’t just report whether a match occurs, but also the end offset for a match. In particular, the offset returned may be shorter than the proper end of the leftmost-first match that you would find via Regex::find.
Note that it is not guaranteed that this routine finds the shortest or “earliest” possible match. Instead, the main idea of this API is that it returns the offset at the point at which the internal regex engine has determined that a match has occurred. This may vary depending on which internal regex engine is used, and thus, the offset itself may change based on internal heuristics.
§Example
Typically, a+ would match the entire first sequence of a in some haystack, but shortest_match may give up as soon as it sees the first a.
use regex::Regex;
let re = Regex::new(r"a+").unwrap();
let offset = re.shortest_match("aaaaa").unwrap();
assert_eq!(offset, 1);
pub fn shortest_match_at(&self, haystack: &str, start: usize) -> Option<usize>
Returns the same as Regex::shortest_match, but starts the search at the given offset.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.
If a match is found, the offset returned is relative to the beginning of the haystack, not the beginning of the search.
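To see why searching from an offset differs from searching a sliced haystack, the following standard-library sketch hand-rolls a whole-word search for a literal. It is only an ASCII-only analogy for the \b assertion: is_word_byte, is_boundary, and find_word_at are hypothetical helpers, not part of the regex API.

```rust
/// ASCII-only notion of a word byte, standing in for `\w`.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// True if position `at` sits between a word byte and a non-word byte
/// (the edges of the haystack count as non-word).
fn is_boundary(hay: &str, at: usize) -> bool {
    let bytes = hay.as_bytes();
    let before = at > 0 && is_word_byte(bytes[at - 1]);
    let after = at < bytes.len() && is_word_byte(bytes[at]);
    before != after
}

/// Finds `word` as a whole word in `hay`, searching from byte offset `start`
/// but consulting the full haystack when checking boundaries.
fn find_word_at(hay: &str, word: &str, start: usize) -> Option<usize> {
    assert!(hay.is_ascii() && word.is_ascii(), "this sketch handles ASCII only");
    assert!(start <= hay.len(), "start must not exceed haystack.len()");
    if word.is_empty() {
        return None;
    }
    let mut from = start;
    while let Some(rel) = hay[from..].find(word) {
        let begin = from + rel;
        let end = begin + word.len();
        if is_boundary(hay, begin) && is_boundary(hay, end) {
            // The offset is relative to the whole haystack, not to `start`.
            return Some(begin);
        }
        from = begin + 1;
    }
    None
}

fn main() {
    let hay = "eschew";
    // Slicing throws away the preceding "es", so "chew" looks like a word.
    assert_eq!(find_word_at(&hay[2..], "chew", 0), Some(0));
    // Starting at offset 2 keeps the context, so there is no match.
    assert_eq!(find_word_at(hay, "chew", 2), None);
    assert_eq!(find_word_at("to chew or eschew", "chew", 1), Some(3));
}
```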
§Panics
This panics when start >= haystack.len() + 1.
§Example
This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.
use regex::Regex;
let re = Regex::new(r"\bchew\b").unwrap();
let hay = "eschew";
// We get a match here, but it's probably not intended.
assert_eq!(re.shortest_match(&hay[2..]), Some(4));
// No match because the assertions take the context into account.
assert_eq!(re.shortest_match_at(hay, 2), None);
pub fn is_match_at(&self, haystack: &str, start: usize) -> bool
Returns the same as Regex::is_match, but starts the search at the given offset.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.
§Panics
This panics when start >= haystack.len() + 1.
§Example
This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.
use regex::Regex;
let re = Regex::new(r"\bchew\b").unwrap();
let hay = "eschew";
// We get a match here, but it's probably not intended.
assert!(re.is_match(&hay[2..]));
// No match because the assertions take the context into account.
assert!(!re.is_match_at(hay, 2));
pub fn find_at<'h>(&self, haystack: &'h str, start: usize) -> Option<Match<'h>>
Returns the same as Regex::find, but starts the search at the given offset.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.
§Panics
This panics when start >= haystack.len() + 1.
§Example
This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.
use regex::Regex;
let re = Regex::new(r"\bchew\b").unwrap();
let hay = "eschew";
// We get a match here, but it's probably not intended.
assert_eq!(re.find(&hay[2..]).map(|m| m.range()), Some(0..4));
// No match because the assertions take the context into account.
assert_eq!(re.find_at(hay, 2), None);
pub fn captures_at<'h>(&self, haystack: &'h str, start: usize) -> Option<Captures<'h>>
Returns the same as Regex::captures, but starts the search at the given offset.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.
§Panics
This panics when start >= haystack.len() + 1.
§Example
This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.
use regex::Regex;
let re = Regex::new(r"\bchew\b").unwrap();
let hay = "eschew";
// We get a match here, but it's probably not intended.
assert_eq!(&re.captures(&hay[2..]).unwrap()[0], "chew");
// No match because the assertions take the context into account.
assert!(re.captures_at(hay, 2).is_none());
pub fn captures_read<'h>(&self, locs: &mut CaptureLocations, haystack: &'h str) -> Option<Match<'h>>
This is like Regex::captures, but writes the byte offsets of each capture group match into the locations given.
A CaptureLocations stores the same byte offsets as a Captures, but does not store a reference to the haystack. This makes its API a bit lower level and less convenient. But in exchange, callers may allocate their own CaptureLocations and reuse it for multiple searches. This may be helpful if allocating a Captures shows up in a profile as too costly.
To create a CaptureLocations value, use the Regex::capture_locations method.
This also returns the overall match if one was found. When a match is found, its offsets are also always stored in locs at index 0.
§Panics
This routine may panic if the given CaptureLocations was not created by this regex.
§Example
use regex::Regex;
let re = Regex::new(r"^([a-z]+)=(\S*)$").unwrap();
let mut locs = re.capture_locations();
assert!(re.captures_read(&mut locs, "id=foo123").is_some());
assert_eq!(Some((0, 9)), locs.get(0));
assert_eq!(Some((0, 2)), locs.get(1));
assert_eq!(Some((3, 9)), locs.get(2));
pub fn captures_read_at<'h>(&self, locs: &mut CaptureLocations, haystack: &'h str, start: usize) -> Option<Match<'h>>
Returns the same as Regex::captures_read, but starts the search at the given offset.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.
§Panics
This panics when start >= haystack.len() + 1.
This routine may also panic if the given CaptureLocations was not created by this regex.
§Example
This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.
use regex::Regex;
let re = Regex::new(r"\bchew\b").unwrap();
let hay = "eschew";
let mut locs = re.capture_locations();
// We get a match here, but it's probably not intended.
assert!(re.captures_read(&mut locs, &hay[2..]).is_some());
// No match because the assertions take the context into account.
assert!(re.captures_read_at(&mut locs, hay, 2).is_none());
pub fn as_str(&self) -> &str
Returns the original string of this regex.
§Example
use regex::Regex;
let re = Regex::new(r"foo\w+bar").unwrap();
assert_eq!(re.as_str(), r"foo\w+bar");
pub fn capture_names(&self) -> CaptureNames<'_>
Returns an iterator over the capture names in this regex.
The iterator returned yields elements of type Option<&str>. That is, the iterator yields values for all capture groups, even ones that are unnamed. The order of the groups corresponds to the order of the group’s corresponding opening parenthesis.
The first element of the iterator always yields the group corresponding to the overall match, and this group is always unnamed. Therefore, the iterator always yields at least one group.
§Example
This shows basic usage with a mix of named and unnamed capture groups:
use regex::Regex;
let re = Regex::new(r"(?<a>.(?<b>.))(.)(?:.)(?<c>.)").unwrap();
let mut names = re.capture_names();
assert_eq!(names.next(), Some(None));
assert_eq!(names.next(), Some(Some("a")));
assert_eq!(names.next(), Some(Some("b")));
assert_eq!(names.next(), Some(None));
// the '(?:.)' group is non-capturing and so doesn't appear here!
assert_eq!(names.next(), Some(Some("c")));
assert_eq!(names.next(), None);
The iterator always yields at least one element, even for regexes with no capture groups and even for regexes that can never match:
use regex::Regex;
let re = Regex::new(r"").unwrap();
let mut names = re.capture_names();
assert_eq!(names.next(), Some(None));
assert_eq!(names.next(), None);
let re = Regex::new(r"[a&&b]").unwrap();
let mut names = re.capture_names();
assert_eq!(names.next(), Some(None));
assert_eq!(names.next(), None);
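The ordering rule (groups are numbered by their opening parenthesis, with an implicit unnamed group first) can be sketched with a small scanner. This is a deliberately simplified illustration and not how the regex crate parses patterns: sketch_capture_names is a hypothetical helper that understands only backslash escapes, (?<name>...), and (?P<name>...), treats every other (?... form as non-capturing, and does not handle parentheses inside character classes such as [(].

```rust
/// Lists capture-group names in order of each group's opening parenthesis.
/// Index 0 is the implicit, unnamed whole-match group.
/// Hypothetical, simplified scanner: character classes are NOT handled.
fn sketch_capture_names(pattern: &str) -> Vec<Option<String>> {
    let chars: Vec<char> = pattern.chars().collect();
    let mut names = vec![None]; // implicit group 0
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            // Skip the character following a backslash, e.g. `\(`.
            '\\' => i += 1,
            '(' => {
                if chars.get(i + 1) != Some(&'?') {
                    // Plain `(`: an unnamed capturing group.
                    names.push(None);
                } else {
                    // Step over the optional `P` in `(?P<name>`.
                    let mut j = i + 2;
                    if chars.get(j) == Some(&'P') {
                        j += 1;
                    }
                    if chars.get(j) == Some(&'<') {
                        // The name runs up to the closing `>`.
                        let name: String = chars[j + 1..]
                            .iter()
                            .take_while(|c| **c != '>')
                            .collect();
                        names.push(Some(name));
                    }
                    // Anything else, such as `(?:`, is treated as non-capturing.
                }
            }
            _ => {}
        }
        i += 1;
    }
    names
}
```

For patterns within these restrictions, the length of the returned vector corresponds to the count reported by captures_len, since both include the implicit group.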
pub fn captures_len(&self) -> usize
Returns the number of capture groups in this regex.
This includes all named and unnamed groups, including the implicit unnamed group that is always present and corresponds to the entire match.
Since the implicit unnamed group is always included in this length, the length returned is guaranteed to be greater than zero.
§Example
use regex::Regex;
let re = Regex::new(r"foo").unwrap();
assert_eq!(1, re.captures_len());
let re = Regex::new(r"(foo)").unwrap();
assert_eq!(2, re.captures_len());
let re = Regex::new(r"(?<a>.(?<b>.))(.)(?:.)(?<c>.)").unwrap();
assert_eq!(5, re.captures_len());
let re = Regex::new(r"[a&&b]").unwrap();
assert_eq!(1, re.captures_len());
pub fn static_captures_len(&self) -> Option<usize>
Returns the total number of capturing groups that appear in every possible match.
If the number of capture groups can vary depending on the match, then this returns None. That is, a value is only returned when the number of matching groups is invariant or “static.”
Note that like Regex::captures_len, this does include the implicit capturing group corresponding to the entire match. Therefore, when a non-None value is returned, it is guaranteed to be at least 1. Stated differently, a return value of Some(0) is impossible.
§Example
This shows a few cases where a static number of capture groups is available and a few cases where it is not.
use regex::Regex;
let len = |pattern| {
Regex::new(pattern).map(|re| re.static_captures_len())
};
assert_eq!(Some(1), len("a")?);
assert_eq!(Some(2), len("(a)")?);
assert_eq!(Some(2), len("(a)|(b)")?);
assert_eq!(Some(3), len("(a)(b)|(c)(d)")?);
assert_eq!(None, len("(a)|b")?);
assert_eq!(None, len("a|(b)")?);
assert_eq!(None, len("(b)*")?);
assert_eq!(Some(2), len("(b)+")?);
pub fn capture_locations(&self) -> CaptureLocations
Returns a freshly allocated set of capture locations that can be reused in multiple calls to Regex::captures_read or Regex::captures_read_at.
The returned locations can be used for any subsequent search for this particular regex. There is no guarantee that it is correct to use for other regexes, even if they have the same number of capture groups.
§Example
use regex::Regex;
let re = Regex::new(r"(.)(.)(\w+)").unwrap();
let mut locs = re.capture_locations();
assert!(re.captures_read(&mut locs, "Padron").is_some());
assert_eq!(locs.get(0), Some((0, 6)));
assert_eq!(locs.get(1), Some((0, 1)));
assert_eq!(locs.get(2), Some((1, 2)));
assert_eq!(locs.get(3), Some((2, 6)));
Trait Implementations§
impl<'de> Deserialize<'de> for Regex
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
impl MzReflect for Regex
fn add_to_reflected_type_info(rti: &mut ReflectedTypeInfo)
impl Ord for Regex
impl PartialOrd for Regex
impl RustType<ProtoRegex> for Regex
fn into_proto(&self) -> ProtoRegex
Converts Self into a Proto value.
fn from_proto(proto: ProtoRegex) -> Result<Self, TryFromProtoError>
fn into_proto_owned(self) -> Proto
A by-value variant of Self::into_proto that types can optionally implement; otherwise, the default implementation delegates to Self::into_proto.
impl Eq for Regex
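The comparison traits listed above are described as being based on the string representation plus flags. The following is a minimal sketch of that idea, not the actual implementation: PatternKey is a hypothetical stand-in that holds only the pattern string and the two flags, and it assumes that derived field-by-field comparison (pattern first, then flags) is an acceptable approximation. Its constructor also assumes dot_matches_new_line defaults to false, which the source does not state.

```rust
use std::collections::{BTreeSet, HashSet};

/// Hypothetical stand-in for the wrapper's comparison key: the pattern
/// string plus its flags. A compiled regex takes no part in comparisons.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct PatternKey {
    pattern: String,
    case_insensitive: bool,
    dot_matches_new_line: bool,
}

impl PatternKey {
    /// Assumption for this sketch: `dot_matches_new_line` starts as false.
    fn new(pattern: &str, case_insensitive: bool) -> Self {
        PatternKey {
            pattern: pattern.to_string(),
            case_insensitive,
            dot_matches_new_line: false,
        }
    }
}

/// Counts distinct keys using the Hash and Eq implementations.
fn distinct_hashed(keys: &[PatternKey]) -> usize {
    keys.iter().collect::<HashSet<_>>().len()
}

/// Returns the patterns of the distinct keys in Ord order.
fn sorted_patterns(keys: &[PatternKey]) -> Vec<String> {
    keys.iter()
        .collect::<BTreeSet<_>>()
        .into_iter()
        .map(|k| k.pattern.clone())
        .collect()
}
```

With this key, aa* and a+ compare as unequal even though they describe the same language, and the same pattern with a different case_insensitive flag is also a distinct key. That is the imperfect but useful equivalence relation that allows such values to be stored in hash maps and ordered sets.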
Auto Trait Implementations§
impl Freeze for Regex
impl RefUnwindSafe for Regex
impl Send for Regex
impl Sync for Regex
impl Unpin for Regex
impl UnwindSafe for Regex
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
T: Clone,
default unsafe fn clone_to_uninit(&self, dst: *mut T)
This is a nightly-only experimental API. (clone_to_uninit)
impl<Q, K> Comparable<K> for Q
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.
impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.
impl<T> FutureExt for T
fn with_context(self, otel_cx: Context) -> WithContext<Self>
fn with_current_context(self) -> WithContext<Self>
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoRequest<T> for T
fn into_request(self) -> Request<T>
Wraps T in a tonic::Request.
impl<T, U> OverrideFrom<Option<&T>> for U
where
U: OverrideFrom<T>,
impl<T> Pointable for T
impl<T> PreferredContainer for T
impl<T> ProgressEventTimestamp for T
impl<P, R> ProtoType<R> for P
where
R: RustType<P>,
fn into_rust(self) -> Result<R, TryFromProtoError>
See RustType::from_proto.
fn from_rust(rust: &R) -> P
See RustType::into_proto.
impl<'a, S, T> Semigroup<&'a S> for T
where
T: Semigroup<S>,
fn plus_equals(&mut self, rhs: &&'a S)
The method of std::ops::AddAssign, for types that do not implement AddAssign.