Regular Expressions Syntax Basics

Overview

Regular expressions in Abyss Web Server conform to the PCRE (Perl Compatible Regular Expressions) syntax. This article is a quick guide to the basics of regular expressions. For an extensive description of their syntax, refer to the PCREPATTERN section in http://pcre.org/pcre.txt.

Syntax basics

When matching a string (a sequence of characters) with a regular expression, the following rules apply:

  • . matches any character,
  • * repeats the previous match zero or more times,
  • + repeats the previous match one or more times,
  • ? repeats the previous match zero or one time,
  • {n,m} repeats the previous match n times at least and m times at most (n and m are non-negative integers, and n is not greater than m),
  • {n} repeats the previous match exactly n times,
  • {n,} repeats the previous match n times at least,
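  • {0,m} repeats the previous match m times at most (the lower bound must be written explicitly: the PCRE documentation referenced above treats {,m} as literal text, not as a quantifier),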
  • ^ is an anchor which matches the beginning of a string,
  • $ is an anchor which matches the end of a string,
  • [set] matches any character in the specified set,
  • [^set] matches any character not in the specified set,
  • \ suppresses the syntactic significance of a special character,
  • (expression) groups the characters between the parentheses into a single unit and captures a match for later use as a backreference ($1, ... , $9).
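The rules above can be tried out with a short script. The following is a minimal sketch that uses Python's re module, not Abyss Web Server itself; Python's syntax agrees with PCRE for the constructs shown here but differs in other details. The helper name found and the sample strings are made up for illustration.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

# (pattern, text) -> expected result; the sample strings are illustrative only.
quantifier_checks = {
    ("a.c", "a-c"): True,           # . matches any character
    ("ab*c", "ac"): True,           # * allows zero b characters
    ("ab+c", "ac"): False,          # + needs at least one b
    ("ab?c", "abbc"): False,        # ? allows at most one b
    ("ab{2,3}c", "abbc"): True,     # two b characters are within 2..3
    ("ab{2,3}c", "abbbbc"): False,  # four b characters are too many
    ("ab{2}c", "abbbc"): False,     # exactly two b characters required
    ("ab{2,}c", "abbbbc"): True,    # two or more b characters
    ("^abc", "xabc"): False,        # ^ anchors at the beginning
    ("abc$", "xabc"): True,         # $ anchors at the end
}

for (pattern, text), expected in quantifier_checks.items():
    assert found(pattern, text) == expected, (pattern, text)
```

Each entry pairs a pattern with a string and the result the corresponding rule predicts; the loop fails loudly if any prediction is wrong.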

A set is made of characters or ranges. A range is formed by two characters with a - in the middle (as in 0-9 or a-z).
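As an illustration of sets and ranges, here is a small sketch in Python's re module (an assumption for demonstration purposes; it behaves like PCRE for these simple sets). The names hex_only and non_digit and the sample strings are hypothetical.

```python
import re

# A set built from two ranges: digits and the letters a to f.
hex_only = re.compile(r"^[0-9a-f]+$")

# A negated set: any character that is not a digit.
non_digit = re.compile(r"[^0-9]")

is_hex_1 = hex_only.search("c0ffee") is not None   # every character is in the set
is_hex_2 = hex_only.search("coffee") is not None   # "o" is outside both ranges
first_non_digit = non_digit.search("20x4")         # finds the "x"
all_digits = non_digit.search("2024") is None      # no character outside 0-9
```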

Preceding a special character with \ makes it lose its syntactic significance and match that character exactly. Outside a set, the special characters are:

()[]{}.*+?^$\

Inside a set, the special characters are:

[]\-^
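The effect of escaping can be checked with a brief sketch, again using Python's re module as a stand-in for PCRE (an assumption; the two agree on these escapes). The sample strings and variable names are illustrative only.

```python
import re

# Outside a set: an unescaped dot matches any character,
# while an escaped dot matches only a literal ".".
unescaped = re.search(r"a.exe", "a_exe")    # matches, "." accepts "_"
escaped = re.search(r"a\.exe", "a_exe")     # no match, a literal "." is required

# Inside a set: ] - ^ and \ are escaped so that they are taken literally.
special_set = re.compile(r"[\]\-\^\\]")
literal_hits = special_set.findall(r"a-b]c^d\e")
```

Here literal_hits collects the four escaped characters in the order in which they occur in the sample string.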

Examples of regular expressions

abc
Any string containing the substring abc matches with this regular expression.
abcd*
Any string containing the substring abc followed by zero or more d characters matches with this regular expression.
abcd?
Any string containing the substring abc or abcd matches with this regular expression.
ab(cd)?
Any string containing the substring ab or abcd matches with this regular expression.
^/dir
Any string starting with the substring /dir matches with this regular expression.
\.exe$
Any string ending with the substring .exe matches with this regular expression. Note here that the dot character . has been escaped to remove its special meaning.
^/dir/.*\.exe$
Any string beginning with /dir/ and ending with .exe matches with this regular expression.
^/dir/[^./]+\.exe$
Any string starting with /dir/, followed by one or more characters other than . and /, and ending with .exe matches with this regular expression.
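
The examples above can be verified with a short script. This is a sketch using Python's re module rather than Abyss Web Server (an assumption; the patterns behave the same way in both for these cases), and the test strings are made up for illustration.

```python
import re

# (pattern, sample string, expected result)
examples = [
    (r"abc", "xxabcxx", True),                       # substring anywhere
    (r"abcd*", "xabcddd", True),                     # abc plus zero or more d
    (r"abcd?", "xabc", True),                        # the d is optional
    (r"ab(cd)?", "ab", True),                        # the group cd is optional
    (r"^/dir", "/dir/file", True),                   # starts with /dir
    (r"^/dir", "/other/dir", False),                 # /dir is not at the start
    (r"\.exe$", "/bin/tool.exe", True),              # ends with .exe
    (r"\.exe$", "/bin/toolexe", False),              # the literal dot is missing
    (r"^/dir/.*\.exe$", "/dir/sub/tool.exe", True),  # .* may cross / characters
    (r"^/dir/[^./]+\.exe$", "/dir/tool.exe", True),
    (r"^/dir/[^./]+\.exe$", "/dir/sub/tool.exe", False),  # the set excludes /
]

results = [re.search(pattern, text) is not None for pattern, text, _ in examples]
expected = [outcome for _, _, outcome in examples]
assert results == expected
```

The last two entries show the difference between .* and [^./]+: only the former accepts a path with an additional subdirectory.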

Examples of backreferences

  • /dir/test.exe matches with ^/dir/([^./]+)\.exe$. This regular expression has a single group ([^./]+) which defines the backreference $1. In this case, $1's value is test.
  • /dir/test.exec matches with /dir/([^./]+)\.(exe.*)$. This regular expression has two groups: ([^./]+) which defines the backreference $1, and (exe.*) which defines the backreference $2. In this case, $1's value is test and $2's value is exec.
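
The two cases above can be reproduced with the following sketch. It uses Python's re module as a stand-in (an assumption): captured groups are read there with group(1) and group(2) and written as \1 in templates, whereas Abyss Web Server uses the $1 notation. The target /scripts/\1.cgi is a hypothetical rewrite template chosen only for illustration.

```python
import re

# One group: the file name without its extension.
m1 = re.search(r"^/dir/([^./]+)\.exe$", "/dir/test.exe")
name = m1.group(1)                      # corresponds to $1

# Two groups: the file name and an extension starting with exe.
m2 = re.search(r"/dir/([^./]+)\.(exe.*)$", "/dir/test.exec")
first, second = m2.group(1), m2.group(2)   # correspond to $1 and $2

# Reusing a captured value in a hypothetical target string.
rewritten = m1.expand(r"/scripts/\1.cgi")
```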
