pyspark.pandas.Series.str.findall

str.findall(pat: str, flags: int = 0) → pyspark.pandas.series.Series

Find all occurrences of a pattern or regular expression in each string of the Series.

Equivalent to applying re.findall() to all the elements in the Series.
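The equivalence above can be sketched in plain Python, without Spark. This is only an illustration of the per-element semantics, not the pandas-on-Spark implementation; `findall_each` is a hypothetical helper name introduced here.

```python
import re

def findall_each(values, pat, flags=0):
    """Hypothetical helper: apply re.findall to every string in a list."""
    return [re.findall(pat, value, flags) for value in values]

animals = ['Lion', 'Monkey', 'Rabbit']

# One result list per input string, in the same order as the input.
matches = findall_each(animals, 'on')
print(matches)
```

Each input string yields its own list of matches, and a string with no match yields an empty list rather than a missing value.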

Parameters
pat : str

Pattern or regular expression.

flags : int, default 0 (no flags)

re module flags, e.g. re.IGNORECASE.

Returns
Series of object

All non-overlapping matches of pattern or regular expression in each string of this Series.
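To illustrate what "non-overlapping" means, here is a small sketch using the standard `re` module directly, on the assumption that each element is handled as `re.findall()` would handle it. The sample strings are made up for the illustration.

```python
import re

# Scanning resumes after the end of the previous match, so candidate
# matches that overlap an earlier match are skipped.
pairs = re.findall('aa', 'aaaa')        # two matches, not three
overlap = re.findall('ana', 'banana')   # the second 'ana' overlaps the first

print(pairs)
print(overlap)
```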

Examples

>>> s = ps.Series(['Lion', 'Monkey', 'Rabbit'])

The search for the pattern ‘Monkey’ returns one match:

>>> s.str.findall('Monkey')
0          []
1    [Monkey]
2          []
dtype: object

On the other hand, the search for the pattern ‘MONKEY’ doesn’t return any match:

>>> s.str.findall('MONKEY')
0    []
1    []
2    []
dtype: object

Flags can be added to the pattern or regular expression. For instance, to find the pattern ‘MONKEY’ ignoring the case:

>>> import re
>>> s.str.findall('MONKEY', flags=re.IGNORECASE)
0          []
1    [Monkey]
2          []
dtype: object
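Because `flags` is an ordinary `re` flags integer, several flags can be combined with the bitwise OR operator. The sketch below uses the standard `re` module on a made-up two-line string to show the effect; it assumes the combined value is passed through unchanged to the per-element search.

```python
import re

text = 'On\nonly'

# IGNORECASE alone: '^' matches only at the start of the whole string.
case_only = re.findall('^on', text, flags=re.IGNORECASE)

# IGNORECASE | MULTILINE: '^' also matches right after each newline.
combined = re.findall('^on', text, flags=re.IGNORECASE | re.MULTILINE)

print(case_only)
print(combined)
```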

When the pattern matches more than one string in the Series, all matches are returned:

>>> s.str.findall('on')
0    [on]
1    [on]
2      []
dtype: object

Regular expressions are supported too. For instance, the pattern ‘on$’ matches ‘on’ only at the end of a string:

>>> s.str.findall('on$')
0    [on]
1      []
2      []
dtype: object

If the pattern is found more than once in the same string, then a list of multiple strings is returned:

>>> s.str.findall('b')
0        []
1        []
2    [b, b]
dtype: object
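One detail of `re.findall()` worth keeping in mind is that capture groups change what each match looks like. The sketch below shows this with the standard `re` module only; how pandas-on-Spark represents tuple results in the returned Series is not covered here and should be checked before relying on it.

```python
import re

word = 'Rabbit'

# No group: each match is the full matched text.
no_group = re.findall('bi', word)

# One group: each match is the text captured by that group.
one_group = re.findall('b(i)', word)

# Several groups: each match is a tuple with one entry per group.
two_groups = re.findall('(b)(i)', word)

print(no_group)
print(one_group)
print(two_groups)
```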