Python matching


string matching

fnmatch

Shell-style wildcard and simple character-range matching.

import fnmatch

items = ['test_a', 'test_b', 'a', 'b']

fnmatch.filter(items, 'test_*')   # filter() takes a list; fnmatch() tests a single name
>>> ['test_a', 'test_b']

character-range matching

import fnmatch

items = ['test1', 'test2', 'test10', 'test15']

fnmatch.filter(items, 'test[0-9][0-9]')
>>> ['test10', 'test15']

fnmatch.filter(items, 't[a-z]st*')
>>> ['test1', 'test2', 'test10', 'test15']
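
fnmatch.fnmatch() checks one name and returns a bool, while fnmatch.filter() applies a pattern to a whole list. A small sketch of the difference, plus the '?' and '[!...]' wildcards (the item names are just sample data):

```python
import fnmatch

items = ['test_a', 'test_b', 'a', 'b']

# fnmatch.fnmatch() tests one name at a time and returns a bool
single = fnmatch.fnmatch('test_a', 'test_*')

# fnmatch.filter() applies the pattern to every name in a list
filtered = fnmatch.filter(items, 'test_*')

# fnmatch.fnmatchcase() never normalises case, whatever the platform
case_sensitive = fnmatch.fnmatchcase('TEST_A', 'test_*')

# '?' matches exactly one character, '[!...]' negates a character set
one_char = fnmatch.filter(items, '?')
not_a = fnmatch.filter(items, '[!a]')

print(single, filtered, case_sensitive, one_char, not_a)
```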

regex

Regex is kind of a big subject, so the methods and the syntax are covered separately.

methods

re.match('[a-z]+', string)       # match anchored to start of string (even without '^')
re.search('[a-z]+', string)      # match anywhere in the string
re.findall('[a-z]+', string)     # return a list of all matches. if the regex defines groups,
                                 # the groups are returned instead (tuples if more than one group)

re.finditer('[a-z]+', string)    # iterate over match objects for each discovered match
re.sub('[a-z]+', 'AAA', string)  # substitutions based on match
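
A runnable sketch of the methods side by side. The sample string is made up for illustration; the comments show what each call returns for it.

```python
import re

string = 'abc 123 def'

m = re.match('[a-z]+', string)          # matches 'abc' at position 0
no_match = re.match('[0-9]+', string)   # None: the digits are not at the start
s = re.search('[0-9]+', string)         # finds '123' anywhere in the string

words = re.findall('[a-z]+', string)            # ['abc', 'def']
pairs = re.findall('([a-z])([a-z]+)', string)   # [('a', 'bc'), ('d', 'ef')]

# finditer() yields match objects, so positions are available too
spans = [mo.span() for mo in re.finditer('[a-z]+', string)]  # [(0, 3), (8, 11)]

replaced = re.sub('[a-z]+', 'AAA', string)      # 'AAA 123 AAA'
```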

method args

See https://docs.python.org/3.3/library/re.html

re.search( '[a-z]+', string, re.IGNORECASE )   # ignore case
re.search( '[a-z]+', string, re.MULTILINE )    # '^' and '$' also match at the start/end of each line
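
A short sketch of what the two flags change, using a made-up two-line string. Flags can be combined with '|'.

```python
import re

text = 'First line\nsecond line'

# Without a flag, '[a-z]+' skips the capital 'F'
plain = re.search('[a-z]+', text).group()                   # 'irst'
nocase = re.search('[a-z]+', text, re.IGNORECASE).group()   # 'First'

# '^' normally matches only at the start of the string;
# with re.MULTILINE it also matches right after every newline
starts_default = re.findall(r'^\w+', text)                  # ['First']
starts_multiline = re.findall(r'^\w+', text, re.MULTILINE)  # ['First', 'second']

# Combine flags with '|'
both = re.findall('^[a-z]+', text, re.IGNORECASE | re.MULTILINE)
```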

syntax

Honestly, read the official docs. https://docs.python.org/3.3/library/re.html

characters

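*                 # zero or more of the preceding item
?                 # zero or one of the preceding item
+                 # one or more of the preceding item
.                 # any single character (except newline, unless re.DOTALL is set)
[ ... ]           # character set/range
( ... )           # patterns within parentheses are a group
|                 # the OR operator
(?P<quote>['"])   # named group
{12,}             # repeat preceding group/range 12 or more times ({12} is exactly 12)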

# ... and lots more ...
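
A quick sketch that exercises each of the characters above on made-up sample strings. The (?P=quote) backreference at the end is an addition for illustration: it matches whatever the named group 'quote' matched.

```python
import re

star = re.findall('ab*', 'a ab abbb')                # ['a', 'ab', 'abbb']
optional = re.findall('colou?r', 'color colour')     # ['color', 'colour']
plus = re.findall('ab+', 'a ab abbb')                # ['ab', 'abbb']
dot = re.findall('a.c', 'abc a-c ac')                # ['abc', 'a-c']
either = re.findall('cat|dog', 'cat dog bird')       # ['cat', 'dog']
at_least = re.findall('[0-9]{3,}', '12 123 12345')   # ['123', '12345']

# A named group can be read back by name, and (?P=quote) requires
# the same quote character to close the quoted text
m = re.search(r"""(?P<quote>['"])(.*?)(?P=quote)""", 'say "hi" now')
quote_char = m.group('quote')   # '"'
quoted = m.group(2)             # 'hi'
```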

range

# ranges are defined between  '[' and ']'
'[a-zA-Z]'                           # range can contain letters
'[0-9]'                              # range can contain numbers
'[_ \t]'                             # range can contain individual characters
r'[\\\-]'                            # escape special range characters (here: backslash and hyphen)
'[a-zA-Z{}]'.format(re.escape('-'))  # you can also do something like this
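
A small sketch of ranges in use, on made-up sample strings. The negated set ('^' as first character) and the unescaped trailing '-' are extra cases not listed above.

```python
import re

letters = re.findall('[a-zA-Z]+', 'abc DEF 123')     # ['abc', 'DEF']
digits = re.findall('[0-9]+', 'abc DEF 123')         # ['123']

# '^' as the first character negates the set
non_digits = re.findall('[^0-9 ]+', 'abc DEF 123')   # ['abc', 'DEF']

# a '-' placed first or last in the set is literal, no escape needed
with_hyphen = re.findall('[a-z-]+', 'well-known fact')   # ['well-known', 'fact']

# re.escape() gives an escaped form that is safe to drop into a set
pattern = '[a-zA-Z{}]+'.format(re.escape('-'))
with_escape = re.findall(pattern, 'well-known fact')     # ['well-known', 'fact']
```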

groups

'(apple|orange|banana)'              # group matches word apple, orange or banana
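
A sketch of reading groups back from a match, using a made-up sentence. The second group and the non-capturing (?:...) form are additions for illustration.

```python
import re

m = re.search('(apple|orange|banana) (pie|juice)', 'fresh orange juice')

whole = m.group(0)     # 'orange juice' (the entire match)
fruit = m.group(1)     # 'orange'
kind = m.group(2)      # 'juice'
both = m.groups()      # ('orange', 'juice')

# (?:...) groups alternatives without capturing them,
# so findall() only returns the one remaining capturing group
kinds = re.findall('(?:apple|orange) (pie|juice)', 'apple pie, orange juice')
```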

lookahead/lookbehind

(?!asimov)                           # (negative lookahead)  - match if the text that follows is not 'asimov'
(?<!isaac)                           # (negative lookbehind) - match if the text just before is not 'isaac'

(?=asimov)                           # (lookahead)           - match if the text that follows is 'asimov'
                                     #                         ('asimov' is NOT consumed by the match)

(?<=isaac)                           # (lookbehind)          - match if the text just before is 'isaac'
                                     #                         ('isaac' is NOT consumed by the match)
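
Lookarounds are zero-width: they test the surrounding text but add nothing to the match. A sketch on a made-up string showing all four forms:

```python
import re

text = 'isaac asimov, isaac newton, ada lovelace'

# lookahead: 'isaac' only where ' asimov' follows; the match is just 'isaac'
followed = re.findall(r'isaac(?= asimov)', text)          # ['isaac']

# negative lookahead: 'isaac <word>' where <word> is not 'asimov'
not_followed = re.findall(r'isaac (?!asimov)\w+', text)   # ['isaac newton']

# lookbehind: the word right after 'isaac '; 'isaac ' is not part of the match
preceded = re.findall(r'(?<=isaac )\w+', text)            # ['asimov', 'newton']

# negative lookbehind: a surname that is NOT preceded by 'isaac '
not_preceded = re.findall(r'(?<!isaac )\b(?:asimov|newton|lovelace)\b', text)
```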

reuse match-group in re.sub()

\1                                   # expands to 1st defined regex-group
\2                                   # expands to 2nd defined regex-group

# ... etc ...
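
A sketch of group references inside a re.sub() replacement, on a made-up name. The \g<name> and \g<number> forms are additions: the first refers to named groups, the second avoids ambiguity when a digit follows the reference.

```python
import re

name = 'asimov, isaac'

# \1 and \2 refer to the first and second group of the pattern
swapped = re.sub(r'(\w+), (\w+)', r'\2 \1', name)          # 'isaac asimov'

# named groups are referenced with \g<name>
named = re.sub(r'(?P<last>\w+), (?P<first>\w+)',
               r'\g<first> \g<last>', name)                # 'isaac asimov'

# \g<2>0 means "group 2, then a literal 0" (r'\20' would mean group 20)
numbered = re.sub(r'(\w+), (\w+)', r'\g<2>0', name)        # 'isaac0'
```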


file matching

glob

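glob uses the same match patterns as fnmatch, except that a leading '.' in a filename must be matched explicitly. Results come back in the same form as the pattern: an absolute pattern returns absolute paths, a relative pattern returns paths relative to the current directory.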

import glob

glob.glob('/path/*/file.txt')
>>> ['/path/a/file.txt', '/path/z/file.txt']
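
A self-contained sketch that builds a throwaway directory tree and globs it twice, once with an absolute pattern and once with a relative one. The directory names 'a' and 'z' and the temporary root are made up for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # build root/a/file.txt and root/z/file.txt
    for sub in ('a', 'z'):
        os.makedirs(os.path.join(root, sub))
        open(os.path.join(root, sub, 'file.txt'), 'w').close()

    # absolute pattern -> absolute results (sorted, glob order is arbitrary)
    absolute = sorted(glob.glob(os.path.join(root, '*', 'file.txt')))

    # relative pattern -> results relative to the current directory
    old_cwd = os.getcwd()
    os.chdir(root)
    try:
        relative = sorted(glob.glob(os.path.join('*', 'file.txt')))
    finally:
        os.chdir(old_cwd)

print(absolute)
print(relative)
```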