Enum regex_automata::util::look::Look
pub enum Look {
Start,
End,
StartLF,
EndLF,
StartCRLF,
EndCRLF,
WordAscii,
WordAsciiNegate,
WordUnicode,
WordUnicodeNegate,
WordStartAscii,
WordEndAscii,
WordStartUnicode,
WordEndUnicode,
WordStartHalfAscii,
WordEndHalfAscii,
WordStartHalfUnicode,
WordEndHalfUnicode,
}
A look-around assertion.
An assertion matches at a position between characters in a haystack. That is, it does not actually “consume” any input, as most parts of a regular expression do. Assertions are a way of stating that some property must be true at a particular point during matching.
For example, (?m)^[a-z]+$ is a pattern that:
- Scans the haystack for a position at which (?m:^) is satisfied. That occurs at either the beginning of the haystack, or immediately following a \n character.
- Looks for one or more occurrences of [a-z].
- Once [a-z]+ has matched as much as it can, an overall match is only reported when [a-z]+ stops just before a \n or at the end of the haystack.
So in this case, abc and \nabc\n match, but \nabc1\n does not.
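The three steps above can be sketched by hand. This is a minimal illustration that assumes byte-oriented haystacks; the helper names is_line_start, is_line_end and find_lowercase_line are hypothetical and are not part of this crate.

```rust
/// Hypothetical helper: true when position `at` satisfies `(?m:^)`.
fn is_line_start(haystack: &[u8], at: usize) -> bool {
    at == 0 || haystack[at - 1] == b'\n'
}

/// Hypothetical helper: true when position `at` satisfies `(?m:$)`.
fn is_line_end(haystack: &[u8], at: usize) -> bool {
    at == haystack.len() || haystack[at] == b'\n'
}

/// Hand-rolled search for `(?m)^[a-z]+$`. Returns the span of the
/// first match as `(start, end)` byte offsets.
fn find_lowercase_line(haystack: &[u8]) -> Option<(usize, usize)> {
    for start in 0..=haystack.len() {
        // Step 1: the start assertion must hold. It consumes nothing.
        if !is_line_start(haystack, start) {
            continue;
        }
        // Step 2: consume as many `[a-z]` bytes as possible.
        let mut end = start;
        while end < haystack.len() && haystack[end].is_ascii_lowercase() {
            end += 1;
        }
        // Step 3: at least one byte must have been consumed, and the
        // end assertion must hold where `[a-z]+` stopped.
        if end > start && is_line_end(haystack, end) {
            return Some((start, end));
        }
    }
    None
}
```

Note that neither helper advances the position. Only the loop in step 2 consumes input, which is what distinguishes an assertion from the rest of the pattern.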
Assertions are also called “look-around,” “look-behind” and “look-ahead.” Specifically, some assertions are look-behind (like ^), other assertions are look-ahead (like $) and yet other assertions are both look-ahead and look-behind (like \b).
Assertions in an NFA
An assertion in a thompson::NFA can be thought of as a conditional epsilon transition. That is, a matching engine like the PikeVM only permits moving through conditional epsilon transitions when their condition is satisfied at whatever position the PikeVM is currently at in the haystack.
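To make the idea of a conditional epsilon transition concrete, here is a toy model. It is not the PikeVM and not this crate's NFA representation: ToyLook, ToyState, follow_epsilons and is_match are hypothetical names, and the automaton is assumed to be a single linear chain of states with no alternation.

```rust
/// Toy assertion kinds; a stand-in for the real `Look` enum.
#[derive(Clone, Copy)]
enum ToyLook {
    Start,
    End,
}

impl ToyLook {
    /// Whether this assertion holds at position `at`.
    fn is_satisfied(self, haystack: &[u8], at: usize) -> bool {
        match self {
            ToyLook::Start => at == 0,
            ToyLook::End => at == haystack.len(),
        }
    }
}

enum ToyState {
    /// Consume one specific byte, then move to `next`.
    Byte { byte: u8, next: usize },
    /// Conditional epsilon transition: move to `next` without consuming
    /// input, but only if `look` holds at the current position.
    Look { look: ToyLook, next: usize },
    Match,
}

/// Follow conditional epsilon transitions starting at state `sid`.
/// Returns the first non-`Look` state reached, or `None` if some
/// assertion on the way is not satisfied at `at`.
fn follow_epsilons(
    states: &[ToyState],
    mut sid: usize,
    haystack: &[u8],
    at: usize,
) -> Option<usize> {
    loop {
        match states[sid] {
            ToyState::Look { look, next } => {
                if !look.is_satisfied(haystack, at) {
                    return None;
                }
                sid = next;
            }
            _ => return Some(sid),
        }
    }
}

/// Anchored run of a linear toy NFA from state 0 at position 0.
fn is_match(states: &[ToyState], haystack: &[u8]) -> bool {
    let (mut sid, mut at) = (0, 0);
    loop {
        sid = match follow_epsilons(states, sid, haystack, at) {
            Some(sid) => sid,
            None => return false,
        };
        match states[sid] {
            ToyState::Byte { byte, next } => {
                if at >= haystack.len() || haystack[at] != byte {
                    return false;
                }
                at += 1;
                sid = next;
            }
            ToyState::Match => return true,
            ToyState::Look { .. } => unreachable!("resolved by follow_epsilons"),
        }
    }
}
```

With the states Look(Start), Byte(a), Look(End), Match in that order, this toy accepts the haystack a but rejects ab, because the End condition is checked at position 1 and fails there.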
How assertions are handled in a DFA is trickier, since a DFA does not have epsilon transitions at all. In this case, they are compiled into the automaton itself, at the expense of more states than what would be required without an assertion.
Variants
Start
Match the beginning of text. Specifically, this matches at the starting position of the input.
End
Match the end of text. Specifically, this matches at the ending position of the input.
StartLF
Match the beginning of a line or the beginning of text. Specifically, this matches at the starting position of the input, or at the position immediately following a \n character.
EndLF
Match the end of a line or the end of text. Specifically, this matches at the end position of the input, or at the position immediately preceding a \n character.
StartCRLF
Match the beginning of a line or the beginning of text. Specifically, this matches at the starting position of the input, or at the position immediately following either a \r or \n character, but never after a \r when a \n follows.
EndCRLF
Match the end of a line or the end of text. Specifically, this matches at the end position of the input, or at the position immediately preceding a \r or \n character, but never before a \n when a \r precedes it.
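The CRLF rules are easiest to see as predicates over a byte haystack. The following is a sketch written directly from the two descriptions above; is_start_crlf and is_end_crlf are hypothetical free functions, not this crate's API.

```rust
/// Sketch of `StartCRLF`: start of input, after `\n`, or after a `\r`
/// that is not immediately followed by `\n`.
fn is_start_crlf(haystack: &[u8], at: usize) -> bool {
    at == 0
        || haystack[at - 1] == b'\n'
        || (haystack[at - 1] == b'\r'
            && (at == haystack.len() || haystack[at] != b'\n'))
}

/// Sketch of `EndCRLF`: end of input, before `\r`, or before a `\n`
/// that is not immediately preceded by `\r`.
fn is_end_crlf(haystack: &[u8], at: usize) -> bool {
    at == haystack.len()
        || haystack[at] == b'\r'
        || (haystack[at] == b'\n' && (at == 0 || haystack[at - 1] != b'\r'))
}
```

In the haystack a\r\nb, neither predicate holds at position 2, which lies between the \r and the \n. The pair \r\n is therefore treated as a single line terminator, while a lone \r or a lone \n still terminates a line on its own.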
WordAscii
Match an ASCII-only word boundary. That is, this matches a position where the left adjacent character is a word character and the right adjacent character is not, or vice versa.
WordAsciiNegate
Match an ASCII-only negation of a word boundary.
WordUnicode
Match a Unicode-aware word boundary. That is, this matches a position where the left adjacent character is a word character and the right adjacent character is not, or vice versa.
WordUnicodeNegate
Match a Unicode-aware negation of a word boundary.
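A word boundary and its negation can be sketched as follows for the ASCII case. The function names are hypothetical. The sketch assumes that an ASCII word character is one of [0-9A-Za-z_] and that the edges of the haystack count as non-word characters. The Unicode-aware variants follow the same shape with a Unicode definition of word character, which is not modeled here.

```rust
/// ASCII word characters: `[0-9A-Za-z_]`.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Sketch of `WordAscii`: exactly one side of `at` is a word byte.
/// The edges of the haystack are treated as non-word.
fn is_word_ascii(haystack: &[u8], at: usize) -> bool {
    let word_before = at > 0 && is_word_byte(haystack[at - 1]);
    let word_after = at < haystack.len() && is_word_byte(haystack[at]);
    word_before != word_after
}

/// Sketch of `WordAsciiNegate`: both sides are word bytes, or neither is.
fn is_word_ascii_negate(haystack: &[u8], at: usize) -> bool {
    !is_word_ascii(haystack, at)
}
```

For the haystack ab cd, the boundary holds at positions 0, 2, 3 and 5, and the negation holds at the remaining positions 1 and 4.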
WordStartAscii
Match the start of an ASCII-only word boundary. That is, this matches a position where the previous character is not a word character (or the position is the beginning of the haystack) and the following character is a word character.
WordEndAscii
Match the end of an ASCII-only word boundary. That is, this matches a position where the previous character is a word character and the following character is not a word character (or the position is the end of the haystack).
WordStartUnicode
Match the start of a Unicode word boundary. That is, this matches a position where the previous character is not a word character (or the position is the beginning of the haystack) and the following character is a word character.
WordEndUnicode
Match the end of a Unicode word boundary. That is, this matches a position where the previous character is a word character and the following character is not a word character (or the position is the end of the haystack).
WordStartHalfAscii
Match the start half of an ASCII-only word boundary. That is, this matches a position at either the beginning of the haystack or where the previous character is not a word character.
WordEndHalfAscii
Match the end half of an ASCII-only word boundary. That is, this matches a position at either the end of the haystack or where the following character is not a word character.
WordStartHalfUnicode
Match the start half of a Unicode word boundary. That is, this matches a position at either the beginning of the haystack or where the previous character is not a word character.
WordEndHalfUnicode
Match the end half of a Unicode word boundary. That is, this matches a position at either the end of the haystack or where the following character is not a word character.
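The difference between the start/end assertions and their "half" counterparts is that a half assertion inspects only one side of the position. A sketch for the ASCII variants follows, using hypothetical function names and assuming ASCII word characters are [0-9A-Za-z_]:

```rust
/// ASCII word characters: `[0-9A-Za-z_]`.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// True when the byte before `at` exists and is a word byte.
fn word_before(haystack: &[u8], at: usize) -> bool {
    at > 0 && is_word_byte(haystack[at - 1])
}

/// True when the byte at `at` exists and is a word byte.
fn word_after(haystack: &[u8], at: usize) -> bool {
    at < haystack.len() && is_word_byte(haystack[at])
}

/// Sketch of `WordStartAscii`: non-word (or start) before, word after.
fn is_word_start_ascii(haystack: &[u8], at: usize) -> bool {
    !word_before(haystack, at) && word_after(haystack, at)
}

/// Sketch of `WordEndAscii`: word before, non-word (or end) after.
fn is_word_end_ascii(haystack: &[u8], at: usize) -> bool {
    word_before(haystack, at) && !word_after(haystack, at)
}

/// Sketch of `WordStartHalfAscii`: only the left side is inspected.
fn is_word_start_half_ascii(haystack: &[u8], at: usize) -> bool {
    !word_before(haystack, at)
}

/// Sketch of `WordEndHalfAscii`: only the right side is inspected.
fn is_word_end_half_ascii(haystack: &[u8], at: usize) -> bool {
    !word_after(haystack, at)
}
```

In the haystack consisting of ab followed by one space, position 3 (the end of the haystack, after the space) satisfies both half assertions but neither of the full start or end assertions, since there is no word character on either side.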
Implementations
impl Look
pub const fn reversed(self) -> Look
Flip the look-around assertion to its equivalent for reverse searches.
For example, StartLF gets translated to EndLF.
Some assertions, such as WordUnicode, remain the same since they match the same positions regardless of the direction of the search.
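The shape of this mapping can be modeled with a toy enum. ToyLook below is a hypothetical five-variant stand-in for Look, not the real type. The StartLF and WordUnicode arms follow the description above, while the Start and End arms are an assumption made by analogy.

```rust
/// A toy stand-in for `Look` with only five variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyLook {
    Start,
    End,
    StartLF,
    EndLF,
    WordUnicode,
}

impl ToyLook {
    /// Swap "start" and "end" flavors; direction-agnostic assertions
    /// map to themselves.
    const fn reversed(self) -> ToyLook {
        match self {
            // Assumed by analogy with the line-oriented pair below.
            ToyLook::Start => ToyLook::End,
            ToyLook::End => ToyLook::Start,
            ToyLook::StartLF => ToyLook::EndLF,
            ToyLook::EndLF => ToyLook::StartLF,
            ToyLook::WordUnicode => ToyLook::WordUnicode,
        }
    }
}
```

In this model, reversing twice returns the original variant, which is what one would expect when a reverse search is itself reversed.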
pub const fn as_repr(self) -> u32
Return the underlying representation of this look-around enumeration as an integer. Passing the return value to the Look::from_repr constructor is guaranteed to return the same look-around variant that one started with, provided both calls occur within a semver compatible release of this crate.
pub const fn from_repr(repr: u32) -> Option<Look>
Given the underlying representation of a Look value, return the corresponding Look value if the representation is valid. Otherwise None is returned.
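The round-trip contract between as_repr and from_repr can be sketched with a toy enum. ToyLook is hypothetical, and the concrete integer values chosen here (one bit per variant) are an assumption made for illustration. Callers should not rely on any particular value beyond the round-trip guarantee described above.

```rust
/// A toy stand-in for `Look` with three variants and assumed values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
enum ToyLook {
    Start = 1 << 0,
    End = 1 << 1,
    StartLF = 1 << 2,
}

impl ToyLook {
    /// Return the integer representation of this variant.
    const fn as_repr(self) -> u32 {
        self as u32
    }

    /// Return the variant for a valid representation, otherwise `None`.
    const fn from_repr(repr: u32) -> Option<ToyLook> {
        match repr {
            0b001 => Some(ToyLook::Start),
            0b010 => Some(ToyLook::End),
            0b100 => Some(ToyLook::StartLF),
            _ => None,
        }
    }
}
```

In this model, an integer that is not the representation of any variant, such as 0 or a combination of two bits, yields None rather than a panic.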
pub const fn as_char(self) -> char
Returns a convenient single codepoint representation of this look-around assertion. Each assertion is guaranteed to be represented by a distinct character.
This is useful for representing a look-around assertion in succinct, human friendly output intended for a programmer working on regex internals.