Struct regex_automata::nfa::thompson::pikevm::PikeVM

pub struct PikeVM { /* private fields */ }

A virtual machine for executing regex searches with capturing groups.

Infallible APIs

Unlike most other regex engines in this crate, a PikeVM never returns an error at search time. It supports all Anchored configurations, never quits and works on haystacks of arbitrary length.

There are two caveats to mention though:

If an invalid pattern ID is given to a search via Anchored::Pattern, then the PikeVM will report “no match.” This is consistent with all other regex engines in this crate.
When PikeVM::which_overlapping_matches is given a PatternSet with insufficient capacity to store all valid pattern IDs, a match for a PatternID that cannot be inserted is silently dropped, as if it did not match.

The PikeVM is generally the most “powerful” regex engine in this crate. “Powerful” in this context means that it can handle any regular expression that is parseable by regex-syntax and any size haystack. Regrettably, the PikeVM is also often the slowest regex engine in practice. This results in an annoying situation where one generally tries to pick any other regex engine (or perhaps none at all) before being forced to fall back to a PikeVM.

For example, a common strategy for dealing with capturing groups is to actually look for the overall match of the regex using a faster regex engine, like a lazy DFA. Once the overall match is found, one can then run the PikeVM on just the match span to find the spans of the capturing groups. In this way, the faster regex engine does the majority of the work, while the PikeVM only lends its power in a more limited role.

Unfortunately, this isn’t always possible because the faster regex engines don’t support all of the regex features in regex-syntax. This notably includes (and is currently limited to) Unicode word boundaries. So if your pattern has Unicode word boundaries, you typically can’t use a DFA-based regex engine at all (unless you enable heuristic support for it). (The one-pass DFA can handle Unicode word boundaries for anchored searches only, but in a cruel sort of joke, many Unicode features tend to result in making the regex not one-pass.)

Example

This example shows that the PikeVM implements Unicode word boundaries correctly by default.

use regex_automata::{nfa::thompson::pikevm::PikeVM, Match};

let re = PikeVM::new(r"\b\w+\b")?;
let mut cache = re.create_cache();

let mut it = re.find_iter(&mut cache, "Шерлок Холмс");
assert_eq!(Some(Match::must(0, 0..12)), it.next());
assert_eq!(Some(Match::must(0, 13..23)), it.next());
assert_eq!(None, it.next());

Implementations

impl PikeVM

pub fn new(pattern: &str) -> Result<PikeVM, BuildError>

pub fn new_many<P: AsRef<str>>(patterns: &[P]) -> Result<PikeVM, BuildError>

pub fn new_from_nfa(nfa: NFA) -> Result<PikeVM, BuildError>

pub fn always_match() -> Result<PikeVM, BuildError>

pub fn never_match() -> Result<PikeVM, BuildError>

pub fn config() -> Config

pub fn builder() -> Builder

pub fn create_captures(&self) -> Captures

pub fn create_cache(&self) -> Cache

pub fn reset_cache(&self, cache: &mut Cache)

pub fn pattern_len(&self) -> usize

pub fn get_config(&self) -> &Config

pub fn get_nfa(&self) -> &NFA

impl PikeVM

pub fn is_match<'h, I: Into<Input<'h>>>(&self, cache: &mut Cache, input: I) -> bool

pub fn find<'h, I: Into<Input<'h>>>(&self, cache: &mut Cache, input: I) -> Option<Match>

pub fn captures<'h, I: Into<Input<'h>>>(&self, cache: &mut Cache, input: I, caps: &mut Captures)

pub fn find_iter<'r, 'c, 'h, I: Into<Input<'h>>>(&'r self, cache: &'c mut Cache, input: I) -> FindMatches<'r, 'c, 'h>

pub fn captures_iter<'r, 'c, 'h, I: Into<Input<'h>>>(&'r self, cache: &'c mut Cache, input: I) -> CapturesMatches<'r, 'c, 'h>

impl PikeVM

pub fn search(&self, cache: &mut Cache, input: &Input<'_>, caps: &mut Captures)

pub fn search_slots(&self, cache: &mut Cache, input: &Input<'_>, slots: &mut [Option<NonMaxUsize>]) -> Option<PatternID>

pub fn which_overlapping_matches(&self, cache: &mut Cache, input: &Input<'_>, patset: &mut PatternSet)

Trait Implementations

impl Clone for PikeVM

fn clone(&self) -> PikeVM

fn clone_from(&mut self, source: &Self)

impl Debug for PikeVM

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations

impl RefUnwindSafe for PikeVM

impl Send for PikeVM

impl Sync for PikeVM

impl Unpin for PikeVM

impl UnwindSafe for PikeVM

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>