Back References - Cisco CS-MARS-20-K9 - Security MARS 20 User Manual

Appendix B
Regular Expression Reference
Atomic grouping is written with a special kind of parenthesis that starts with (?>, as in this example:
    (?>\d+)foo
This kind of parenthesis "locks up" the part of the pattern it contains once it has matched, and a failure
further into the pattern is prevented from backtracking into it. Backtracking past it to previous items,
however, works as normal.
An alternative description is that a subpattern of this type matches the string of characters that an
identical standalone pattern would match, if anchored at the current point in the subject string.
Atomic grouping subpatterns are not capturing subpatterns. Simple cases such as the above example can
be thought of as a maximizing repeat that must swallow everything it can. So, while both \d+ and \d+?
are prepared to adjust the number of digits they match in order to make the rest of the pattern match,
(?>\d+) can only match an entire sequence of digits.
Atomic groups in general can of course contain arbitrarily complicated subpatterns, and can be nested.
However, when the subpattern for an atomic group is just a single repeated item, as in the example above,
a simpler notation, called a "possessive quantifier" can be used. This consists of an additional + character
following a quantifier. Using this notation, the previous example can be rewritten as
    \d++foo
Possessive quantifiers are always greedy; the setting of the PCRE_UNGREEDY option is ignored. They
are a convenient notation for the simpler forms of atomic group. However, there is no difference in the
meaning or processing of a possessive quantifier and the equivalent atomic group.
The possessive quantifier syntax is an extension to the Perl syntax. It originates in Sun's Java package.
When a pattern contains an unlimited repeat inside a subpattern that can itself be repeated an unlimited
number of times, the use of an atomic group is the only way to avoid some failing matches taking a very
long time indeed. The pattern
    (\D+|<\d+>)*[!?]
matches an unlimited number of substrings that either consist of non-digits, or digits enclosed in <>,
followed by either ! or ?. When it matches, it runs quickly. However, if it is applied to
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
it takes a long time before reporting failure. This is because the string can be divided between the internal
\D+ repeat and the external * repeat in a large number of ways, and all have to be tried. (The example
uses [!?] rather than a single character at the end, because both PCRE and Perl have an optimization that
allows for fast failure when a single character is used. They remember the last single character that is
required for a match, and fail early if it is not present in the string.) If the pattern is changed so that it
uses an atomic group, like this:
    ((?>\D+)|<\d+>)*[!?]
sequences of non-digits cannot be broken, and failure happens quickly.

Back References

Outside a character class, a backslash followed by a digit greater than 0 (and possibly further digits) is
a back reference to a capturing subpattern earlier (that is, to its left) in the pattern, provided there have
been that many previous capturing left parentheses.
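A back reference matches whatever text the referenced group actually captured, not whatever the group's subpattern could match. The following sketch illustrates this in Python's re module, whose handling of invalid references is its own and may differ in detail from PCRE. The doubled-word pattern and the helper is_valid are hypothetical.

```python
import re

# \1 must repeat the exact text captured by group 1.
doubled = re.compile(r"\b(\w+) \1\b")

print(doubled.search("it was was late").group())
print(doubled.search("it was warm"))

def is_valid(pattern):
    """Return True if the pattern compiles, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# One capturing group precedes \1, so the reference is accepted.
print(is_valid(r"(a)\1"))
# There is no second capturing group, so Python rejects \2 here.
print(is_valid(r"(a)\2"))
```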
78-17020-01
User Guide for Cisco Security MARS Local Controller
