Unicode Character Properties - Cisco CS-MARS-20-K9 - Security MARS 20 User Manual

Appendix B
Regular Expression Reference
A "word" character is an underscore or any character less than 256 that is a letter or digit. The definition
of letters and digits is controlled by PCRE's low-valued character tables, and may vary if locale-specific
matching is taking place (see "Locale support" in the pcreapi page). For example, in the "fr_FR"
(French) locale, some character codes greater than 128 are used for accented letters, and these are
matched by \w.
In UTF-8 mode, characters with values greater than 128 never match \d, \s, or \w, and always match \D,
\S, and \W. This is true even when Unicode character property support is available.
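As a rough illustration of the UTF-8 mode rule above, the following sketch uses Python's standard re module rather than PCRE. It is only an analogy: the assumption is that Python's re.ASCII flag, which restricts \d, \s, and \w to ASCII characters, behaves closely enough to the rule described here to show the effect. The helper name classify is hypothetical.

```python
import re

# Assumption: re.ASCII in Python stands in for the PCRE UTF-8 mode rule
# described above. This is an analogy, not PCRE itself.
SAMPLE = "\u00e9"  # e with acute accent, code point 233 (greater than 128)

ESCAPES = [r"\d", r"\s", r"\w", r"\D", r"\S", r"\W"]


def classify(ch):
    # Report, for each shorthand escape, whether it matches the single
    # character ch when the classes are restricted to ASCII.
    return {esc: re.fullmatch(esc, ch, re.ASCII) is not None for esc in ESCAPES}


result = classify(SAMPLE)
for esc, matched in result.items():
    print(esc, matched)
```

Under this assumption the accented letter fails \d, \s, and \w and matches \D, \S, and \W, while a plain ASCII letter such as "a" still matches \w.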

Unicode Character Properties

When PCRE is built with Unicode character property support, three additional escape sequences to
match generic character types are available when UTF-8 mode is selected. They are:

\p{xx}    a character with the xx property
\P{xx}    a character without the xx property
\X        an extended Unicode sequence

The property names represented by xx above are limited to the Unicode general category properties. Each
character has exactly one such property, specified by a two-letter abbreviation. For compatibility with
Perl, negation can be specified by including a circumflex between the opening brace and the property
name. For example, \p{^Lu} is the same as \P{Lu}.
If only one letter is specified with \p or \P, it includes all the properties that start with that letter. In this
case, in the absence of negation, the curly brackets in the escape sequence are optional; these two
examples have the same effect:

\p{L}
\pL

The following property codes are supported:
C     Other
Cc    Control
Cf    Format
Cn    Unassigned
Co    Private use
Cs    Surrogate
L     Letter
Ll    Lower case letter
Lm    Modifier letter
Lo    Other letter
Lt    Title case letter
Lu    Upper case letter
M     Mark
Mc    Spacing mark
Me    Enclosing mark
Mn    Non-spacing mark
N     Number
Nd    Decimal number
Nl    Letter number
No    Other number
P     Punctuation
Pc    Connector punctuation
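
To make the relationship between one-letter and two-letter property codes concrete, here is a minimal Python sketch. It does not call PCRE. It assumes that Python's standard unicodedata module, which reports the same Unicode general category abbreviations, is an acceptable stand-in. The helper has_property is hypothetical and only imitates the matching rules described in this section.

```python
import unicodedata


def has_property(ch, code):
    # A leading circumflex negates the test, mirroring \p{^Lu} == \P{Lu}.
    negate = code.startswith("^")
    if negate:
        code = code[1:]
    # unicodedata.category returns the two-letter code, such as "Lu" or "Nd".
    category = unicodedata.category(ch)
    # A one-letter code such as "L" covers every category that starts with it;
    # a two-letter code must equal the category exactly.
    matched = category.startswith(code)
    return matched != negate


# Print the general category of a few sample characters.
for ch in ["A", "a", "5", "_", "\u0301"]:
    print(repr(ch), unicodedata.category(ch))
```

For example, has_property("A", "L") and has_property("A", "Lu") are both true, while has_property("a", "Lu") is false and has_property("a", "^Lu") is true.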
User Guide for Cisco Security MARS Local Controller

This manual is also suitable for:

Mars 20, Mars 50, Mars 100, Mars 200
