Python matching
From wikinotes
Latest revision as of 15:52, 26 October 2020
string matching
fnmatch
Shell-style wildcard and simple character-range matching.
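fnmatch has separate functions for testing one name and for filtering a list, which is easy to mix up. This is a minimal sketch of the three functions. The sample names are made up, and the case-sensitivity note reflects how fnmatch.fnmatch() and fnmatch.filter() normalise case through os.path.normcase().

```python
import fnmatch

names = ['test_a', 'test_b', 'a', 'b']

# fnmatch.fnmatch() tests ONE name against a pattern and returns a bool.
# It normalises case with os.path.normcase(), so it is case-insensitive on Windows.
single = fnmatch.fnmatch('test_a', 'test_*')

# fnmatch.fnmatchcase() does the same test but is always case-sensitive.
strict = fnmatch.fnmatchcase('TEST_C', 'test_*')

# fnmatch.filter() applies the pattern to a whole list and returns the matching names.
matches = fnmatch.filter(names, 'test_*')

print(single, strict, matches)  # True False ['test_a', 'test_b']
```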
import fnmatch
items = ['test_a', 'test_b', 'a', 'b']
fnmatch.filter(items, 'test_*')        # returns ['test_a', 'test_b']
fnmatch.fnmatch('test_a', 'test_*')    # True (tests a single name, not a list)
character-range matching
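A character range in fnmatch always stands for exactly one character. `[!...]` negates the range, and wrapping a wildcard in brackets matches it literally. This is a small sketch with made-up item names.

```python
import fnmatch

items = ['test1', 'test2', 'test10', 'test15', 'testA']

# [0-9] matches exactly one character from the range.
one_digit = fnmatch.filter(items, 'test[0-9]')       # ['test1', 'test2']

# [!0-9] matches exactly one character that is NOT in the range.
non_digit = fnmatch.filter(items, 'test[!0-9]')      # ['testA']

# Wrapping a wildcard in brackets matches it literally.
literal = fnmatch.fnmatchcase('what?', 'what[?]')    # True
```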
import fnmatch
items = ['test1', 'test2', 'test10', 'test15']
fnmatch.filter(items, 'test[0-9][0-9]')    # returns ['test10', 'test15']
fnmatch.filter(items, 't[a-z]st*')         # returns ['test1', 'test2', 'test10', 'test15']
regex
Regex is a big subject, so the methods and the syntax are covered separately.
methods
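The methods in this section are easiest to tell apart on a single input. This is a minimal sketch with a made-up string. It shows that re.match() is anchored to the start, and that the shape of re.findall()'s result changes with the number of groups in the pattern.

```python
import re

text = 'cat 12 dog 7'

# re.match() only succeeds at the start of the string; re.search() scans all of it.
at_start = re.match('[0-9]+', text)                 # None: text starts with 'cat'
anywhere = re.search('[0-9]+', text)                # match object for '12'

# The shape of re.findall()'s result depends on how many groups the pattern has.
no_groups = re.findall('[a-z]+ [0-9]+', text)       # ['cat 12', 'dog 7']
one_group = re.findall('([a-z]+) [0-9]+', text)     # ['cat', 'dog']
two_groups = re.findall('([a-z]+) ([0-9]+)', text)  # [('cat', '12'), ('dog', '7')]

# re.finditer() yields match objects, which also carry positions.
spans = [(m.group(), m.span()) for m in re.finditer('[0-9]+', text)]
# [('12', (4, 6)), ('7', (11, 12))]

# re.sub() replaces every match.
replaced = re.sub('[0-9]+', 'N', text)              # 'cat N dog N'
```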
import re
string = 'abc DEF ghi'             # sample input
re.match('[a-z]+', string)         # match anchored to start of string (even without '^')
re.search('[a-z]+', string)        # match anywhere in the string
re.findall('[a-z]+', string)       # return a list of all matches. with one regex-group, a list
                                   # of that group's matches; with several, a list of tuples
re.finditer('[a-z]+', string)      # iterate over match objects for each discovered match
re.sub('[a-z]+', 'AAA', string)    # replace every match with 'AAA'
method args
See https://docs.python.org/3.3/library/re.html
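re.MULTILINE only matters when the pattern uses `^` or `$`, and flags can be combined with `|`. This is a small sketch on a made-up two-line string showing both flags.

```python
import re

text = 'first line\nSecond line'

# Default: '^' anchors only at the start of the whole string.
default = re.findall('^[a-z]+', text)                              # ['first']

# re.MULTILINE: '^' (and '$') also anchor at every line boundary,
# but [a-z] still cannot match the capital 'S'.
multiline = re.findall('^[a-z]+', text, re.MULTILINE)              # ['first']

# Flags combine with '|'; re.IGNORECASE lets [a-z] match 'S' too.
both = re.findall('^[a-z]+', text, re.MULTILINE | re.IGNORECASE)   # ['first', 'Second']
```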
re.search('[a-z]+', string, re.IGNORECASE)    # ignore case
re.search('^[a-z]+', string, re.MULTILINE)    # '^' and '$' also match at the start/end of every line
syntax
Honestly, read the official docs. https://docs.python.org/3.3/library/re.html
characters
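This is a quick check of the metacharacters in this list: `.`, the three basic quantifiers, and the brace forms. The sample strings are made up. re.fullmatch() is used so that each result reflects the whole string.

```python
import re

# '.' matches any character except a newline, unless re.DOTALL is given.
dot_default = re.findall('a.c', 'abc a\nc')              # ['abc']
dot_all = re.findall('a.c', 'abc a\nc', re.DOTALL)       # ['abc', 'a\nc']

# '*' = zero or more, '+' = one or more, '?' = zero or one (of the preceding item).
star = re.fullmatch('ab*', 'a') is not None               # True: zero 'b's is fine
plus = re.fullmatch('ab+', 'a') is not None               # False: needs at least one 'b'
opt = re.fullmatch('colou?r', 'color') is not None        # True

# '{n}' = exactly n, '{n,}' = n or more, '{n,m}' = between n and m.
exact = re.fullmatch('[0-9]{3}', '1234') is not None      # False
at_least = re.fullmatch('[0-9]{3,}', '1234') is not None  # True
```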
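*                # zero or more of the preceding item
?                # zero or one of the preceding item
+                # one or more of the preceding item
.                # any single character except a newline (any character with re.DOTALL)
[ ... ]          # character range (set)
( ... )          # patterns within parentheses form a group
|                # the OR operator
(?P<quote>['"])  # named group called 'quote'
{12}             # repeat the preceding item/range exactly 12 times
{12,}            # repeat the preceding item/range 12 or more times
# ... and lots more ...
range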
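A range matches one character, and a trailing `+` repeats it. The hyphen is the character that usually needs escaping inside a range. This sketch uses made-up input strings and shows a raw-string escape, the re.escape() approach, and `^` negation.

```python
import re

# A range matches ONE character; the trailing '+' repeats it.
words = re.findall('[a-zA-Z]+', 'abc-DEF 123')            # ['abc', 'DEF']

# Escaping '-' and '\' inside a range; the raw string keeps the backslashes.
dash_or_backslash = re.findall(r'[\\\-]', 'a-b\\c')      # ['-', '\\']

# Building the range with re.escape() avoids escaping by hand.
pattern = '[a-zA-Z{}]+'.format(re.escape('-'))
hyphenated = re.findall(pattern, 'well-known 42')         # ['well-known']

# '^' as the first character inside the brackets negates the range.
non_digits = re.findall('[^0-9 ]+', 'ab 12 cd')           # ['ab', 'cd']
```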
# ranges are defined between '[' and ']'
'[a-zA-Z]'                             # range can contain letters
'[0-9]'                                # range can contain digits
'[_ \t]'                               # range can contain individual characters
r'[\\\-]'                              # escape special range characters (use a raw string)
'[a-zA-Z{}]'.format(re.escape('-'))    # you can also do something like this
groups
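Groups are read back from the match object by number or by name. This sketch uses a made-up shopping string and a hypothetical group name `count`. It also shows that plain alternation is not limited to whole words unless `\b` is added.

```python
import re

m = re.search(r'(apple|orange|banana) x(?P<count>[0-9]+)', 'buy banana x3 today')
whole = m.group(0)          # 'banana x3'
fruit = m.group(1)          # 'banana'
count = m.group('count')    # '3' (named groups are also numbered, here as group 2)
groups = m.groups()         # ('banana', '3')
named = m.groupdict()       # {'count': '3'}

# Alternation is not word-bounded: it also matches inside longer words.
inside = re.search('(apple|orange|banana)', 'pineapple').group(1)   # 'apple'

# Adding \b (word boundary) restricts it to whole words.
whole_word = re.search(r'\b(apple|orange|banana)\b', 'pineapple')   # None
```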
'(apple|orange|banana)'    # group matches apple, orange or banana (also inside longer words, e.g. 'pineapple')
lookahead/lookbehind
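Lookarounds check the surrounding text without consuming it. This sketch uses a made-up string of names to demonstrate all four forms. It reads m.end() to show that the lookahead text is not part of the match.

```python
import re

text = 'isaac asimov, isaac newton'

# Lookahead: 'isaac ' only when followed by 'asimov'.
m = re.search('isaac (?=asimov)', text)
ahead = m.group()            # 'isaac '
end = m.end()                # 6: the match stops before 'asimov', nothing was consumed

# Negative lookahead: 'isaac ' NOT followed by 'asimov' (the second occurrence).
not_ahead = re.search('isaac (?!asimov)', text).start()   # 14

# Lookbehind: surnames preceded by 'isaac ' (the first name is not in the result).
surnames = re.findall('(?<=isaac )[a-z]+', text)           # ['asimov', 'newton']

# Negative lookbehind: 'newton' not preceded by 'isaac ' (no such match here).
nope = re.search('(?<!isaac )newton', text)                # None
```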
(?!asimov)    # (negative lookahead) - match if the next text is not 'asimov'
(?<!isaac)    # (negative lookbehind) - match if the preceding text is not 'isaac'
(?=asimov)    # (lookahead) - match if the next text is 'asimov'
              # ('asimov' is not consumed by the match)
(?<=isaac)    # (lookbehind) - match if the preceding text is 'isaac'
              # ('isaac' is not consumed by the match)
reuse match-group in re.sub()
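\1    # expands to the 1st defined regex-group
\2    # expands to the 2nd defined regex-group
# ... etc ... (write the replacement as a raw string, e.g. r'\2 \1'; in a normal string '\1' is chr(1))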
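This sketch uses group references in re.sub() to swap two words. The input strings and the group names `first` and `last` are made up for illustration. It also shows the `\g<name>` and `\g<0>` forms, plus a backreference inside a pattern.

```python
import re

# Swap 'first last' to 'last, first'. The replacement is a raw string so that
# '\1' reaches re.sub() as backslash-one rather than the character chr(1).
swapped = re.sub(r'(\w+) (\w+)', r'\2, \1', 'isaac asimov')        # 'asimov, isaac'

# Named groups are referenced with \g<name>; \g<0> is the whole match.
named = re.sub(r'(?P<first>\w+) (?P<last>\w+)', r'\g<last>, \g<first>', 'isaac asimov')
wrapped = re.sub(r'[0-9]+', r'<\g<0>>', 'a1 b22')                  # 'a<1> b<22>'

# Backreferences also work inside the pattern itself: find doubled words.
doubled = re.findall(r'\b(\w+) \1\b', 'this is is a test test')     # ['is', 'test']
```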
file matching
glob
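glob uses the same wildcard patterns as fnmatch, except that names starting with '.' are only matched by patterns that also start with '.'. It returns paths in the same form as the pattern: an absolute pattern returns absolute paths, and a relative pattern returns paths relative to the current directory.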
import glob
glob.glob('/path/*/file.txt')    # e.g. ['/path/a/file.txt', '/path/z/file.txt'] (order depends on the filesystem)
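This is a self-contained check of the absolute-versus-relative and hidden-file behaviour. It builds a hypothetical tree (`a/`, `z/`, `.hidden/`, each holding a `file.txt`) in a temporary directory. It briefly changes into that directory for the relative patterns, and results are sorted because glob does not guarantee an order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build root/a/file.txt, root/z/file.txt and root/.hidden/file.txt.
    for sub in ('a', 'z', '.hidden'):
        os.makedirs(os.path.join(root, sub))
        with open(os.path.join(root, sub, 'file.txt'), 'w') as fh:
            fh.write('x')

    # An absolute pattern gives absolute results; '*' skips names starting with '.'.
    absolute = sorted(glob.glob(os.path.join(root, '*', 'file.txt')))

    # Relative patterns give results relative to the current directory.
    old_cwd = os.getcwd()
    os.chdir(root)
    try:
        relative = sorted(glob.glob(os.path.join('*', 'file.txt')))
        hidden = glob.glob(os.path.join('.*', 'file.txt'))   # pattern starts with '.'
    finally:
        os.chdir(old_cwd)

print(relative)   # ['a/file.txt', 'z/file.txt'] on POSIX
```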