pyspark.sql.functions.split
pyspark.sql.functions.split(str: ColumnOrName, pattern: str, limit: int = -1) → pyspark.sql.column.Column

Splits str around matches of the given pattern.
Parameters
str : Column or str
    a string expression to split
pattern : str
    a string representing a regular expression. The regex string should be a Java regular expression.
limit : int, optional
    an integer which controls the number of times pattern is applied (see the sketch below for an illustration).
    limit > 0: The resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched pattern.
    limit <= 0: pattern will be applied as many times as possible, and the resulting array can be of any size.

Changed in version 3.0: split now takes an optional limit field. If not provided, the default limit value is -1.
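As a minimal sketch of the limit semantics above, limit=1 applies the pattern zero times and returns the unsplit input as a single-element array (assuming an active SparkSession bound to spark; the sample data mirrors the Examples below):

>>> from pyspark.sql.functions import split
>>> df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s'])
>>> df.select(split(df.s, '[ABC]', 1).alias('s')).collect()  # limit=1: pattern never applied
[Row(s=['oneAtwoBthreeC'])]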
Examples
>>> df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s',])
>>> df.select(split(df.s, '[ABC]', 2).alias('s')).collect()
[Row(s=['one', 'twoBthreeC'])]
>>> df.select(split(df.s, '[ABC]', -1).alias('s')).collect()
[Row(s=['one', 'two', 'three', ''])]
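Because pattern is a Java regular expression, regex metacharacters must be escaped to split on them literally. A minimal sketch (df2 and its sample data are illustrative, not part of the original examples):

>>> df2 = spark.createDataFrame([('1.2.3',)], ['v'])
>>> df2.select(split(df2.v, '\\.').alias('parts')).collect()  # '\\.' escapes '.' so it matches a literal dot
[Row(parts=['1', '2', '3'])]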