Character groups

A group of characters, enclosed in parentheses ( and ), forms one unit. This is especially useful when a repeater is applied to the group, or for backreferences.

Character group
A character group forms one unit. The group (abc) matches the exact substring abc and can be backreferenced by its group number.
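Because a group forms one unit, a repeater placed after it applies to the whole group instead of only the last character. The following is a minimal sketch of that difference; the variable names and sample strings are chosen for illustration only.

```python
import re

# Without a group, the repeater + applies only to the preceding character "c".
no_group = re.fullmatch(r"abc+", "abccc")

# With a group, the repeater + applies to the whole unit "abc".
with_group = re.fullmatch(r"(abc)+", "abcabcabc")

# The grouped pattern does not accept a run of "c" characters on its own.
rejected = re.fullmatch(r"(abc)+", "abccc")

print(no_group is not None)    # True
print(with_group is not None)  # True
print(rejected is None)        # True
```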

 
Please do not confuse character groups with classes! This overview might help:

  • abc     : matches the exact substring “abc”.
  • (abc) : matches the exact substring “abc”, and can be backreferenced.
  • [abc] : matches character a, b or c.
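The three forms in the overview can be compared side by side. This is a small sketch; the sample text "cab abc" and the variable names are assumptions made for the illustration.

```python
import re

text = "cab abc"

# abc: matches the exact substring, but captures nothing.
plain = re.search(r"abc", text)

# (abc): matches the same substring and stores it as group 1.
grouped = re.search(r"(abc)", text)

# [abc]: a class, so every single a, b or c in the text matches.
klass = re.findall(r"[abc]", text)

print(plain.group(0), plain.groups())      # abc ()
print(grouped.group(0), grouped.groups())  # abc ('abc',)
print(klass)                               # ['c', 'a', 'b', 'a', 'b', 'c']
```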

 

Group numbering

Groups are numbered from left to right, in the order in which their opening parentheses appear in the regex. Nested groups are numbered the same way:

    >>> import re
    >>> p = re.compile(r"(a(b)c)d")
    >>> m = p.match("abcdefghijklmn")
    >>> 
    >>> m.group(0)      # Group 0 is always the whole match.
    'abcd'
    >>> m.group(1)      # The group belonging to the first ( encountered.
    'abc'
    >>> m.group(2)      # The group belonging to the second ( encountered.
    'b'
    >>> m.groups()      # Don't forget the s.
    ('abc', 'b')
    >>>  

 

Backreferences

A backreference in a regex is simply \m with m being the group number. This is the definition:

Backreference
The backreference \1 in a regex succeeds if the exact contents of group 1 can be found at the current position.

 
Of course, this definition is the same for group 2, group 3, …
Backreferences can be used to detect doubled words in a string, as in the sample code below. It uses one symbol that we didn’t cover yet: \b, which matches at a “word boundary”.

    >>> import re
    >>> p = re.compile(r"(\b\w+)\s+\1")
    >>> m_all = p.finditer("This regex can detect detect double words.")
    >>> for m in m_all:
    ...     print(m)
    ...
    <re.Match object; span=(15, 28), match='detect detect'>
    >>>  
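One caveat about the pattern above: the backreference only has to match at the current position, so it also succeeds when the repeated text is merely the start of a longer word. A possible refinement, shown here as a sketch and not as the only way to do it, is to add a second \b after the backreference. The sample sentence and variable names are made up for the illustration.

```python
import re

text = "the theory is is sound"

# Original pattern: \1 may match the beginning of a longer word ("the" in "theory").
loose = re.compile(r"(\b\w+)\s+\1")

# Stricter variant (an assumption of this sketch): require a word boundary after \1 as well.
strict = re.compile(r"\b(\w+)\s+\1\b")

loose_hits = [m.group(0) for m in loose.finditer(text)]
strict_hits = [m.group(0) for m in strict.finditer(text)]

print(loose_hits)   # ['the the', 'is is']
print(strict_hits)  # ['is is']
```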

 

Non-capturing groups

The Perl language added some extra features to the regex language, and non-capturing groups are one of them. The Perl developers decided to start their extensions with (? because that sequence was a syntax error in earlier regexes. No existing pattern could be using it, so backwards compatibility is preserved.
But let’s get back to the subject. A non-capturing group behaves like a regular group in all aspects, except that it is not assigned a group number and therefore cannot be backreferenced. A non-capturing group is put between (?: and ).

Non-capturing groups
A non-capturing group is put between (?: and ). The non-capturing group behaves like other groups in all aspects, but it doesn’t get a group number, and cannot be backreferenced.
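The effect on group numbering can be seen by reusing the earlier pattern (a(b)c)d and making its outer group non-capturing. This is a minimal sketch; the variable names are illustrative.

```python
import re

# Capturing version: the outer group is number 1, the inner group is number 2.
capturing = re.match(r"(a(b)c)d", "abcd")

# Non-capturing version: (?: ... ) still groups, but it gets no number,
# so the inner group (b) becomes group 1.
non_capturing = re.match(r"(?:a(b)c)d", "abcd")

# A repeater still applies to the non-capturing group as one unit.
repeated = re.fullmatch(r"(?:abc)+", "abcabc")

print(capturing.groups())      # ('abc', 'b')
print(non_capturing.groups())  # ('b',)
print(repeated.groups())       # ()
```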

 

Named groups

The Python language added its own extensions to the Perl extensions. Python extensions start with the (?P prefix, and named groups are one example.
A named group can be backreferenced by its name or by its number. A named group is put between (?P<name> and ). The backreference to a named group is (?P=name), or its number as in \1, \2, …

Named groups
A named group is put between (?P<name> and ). The backreference to a named group is (?P=name), or its number as in \1, \2, …
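As an illustration, the doubled-word example can be rewritten with a named group. This is a sketch under a few assumptions: the group name "word" is arbitrary, and the trailing \b is an addition that is not in the earlier pattern.

```python
import re

sentence = "This regex can detect detect double words."

# (?P<word>...) names the group "word"; (?P=word) refers back to it by name.
by_name = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")

# The same group still has number 1, so \1 works as a backreference too.
by_number = re.compile(r"\b(?P<word>\w+)\s+\1\b")

m = by_name.search(sentence)
n = by_number.search(sentence)

print(m.group("word"))  # detect
print(m.group(1))       # detect
print(m.groupdict())    # {'word': 'detect'}
print(m.span())         # (15, 28)
print(n.group(0))       # detect detect
```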