pyspark.sql.functions.split

pyspark.sql.functions.split(str: ColumnOrName, pattern: str, limit: int = -1) → pyspark.sql.column.Column

Splits str around matches of the given pattern and returns the resulting substrings as an array column.

Parameters
str : Column or str

a string expression to split

pattern : str

a string representing a regular expression. The regex string should be a Java regular expression.

limit : int, optional

an integer which controls the number of times pattern is applied.

  • limit > 0: the resulting array’s length is at most limit, and its last entry contains all input beyond the last matched pattern.

  • limit <= 0: pattern is applied as many times as possible, and the resulting array can be of any size.
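To make the two limit cases concrete, here is a hypothetical pure-Python model of the rules above; split_model is not part of PySpark. It uses Python's re.split as a stand-in for Java regex, which is an assumption that only holds for simple patterns without capturing groups (Python's re.split also returns the text of capturing groups, and the two regex engines differ in some edge cases).

```python
import re
from typing import List


def split_model(s: str, pattern: str, limit: int = -1) -> List[str]:
    """Hypothetical model of the documented `limit` rules (not Spark's code).

    Assumes a simple pattern with no capturing groups, so that Python's
    `re` module behaves like Java regex for this purpose.
    """
    if limit == 1:
        # At most one piece: the whole input, with no split applied.
        return [s]
    if limit > 1:
        # At most `limit` pieces: stop after limit - 1 splits, so the last
        # piece holds everything beyond the last matched pattern.
        return re.split(pattern, s, maxsplit=limit - 1)
    # limit <= 0: split as many times as possible (maxsplit=0 means "no cap").
    # Trailing empty strings are kept, matching the examples on this page.
    return re.split(pattern, s)


print(split_model("oneAtwoBthreeC", "[ABC]", 2))   # ['one', 'twoBthreeC']
print(split_model("oneAtwoBthreeC", "[ABC]", -1))  # ['one', 'two', 'three', '']
```

The limit == 1 case is handled separately because re.split treats maxsplit=0 as "unlimited", which would otherwise contradict the "at most limit entries" rule.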

split now takes an optional limit parameter. If it is not provided, the default limit is -1.
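Because pattern is interpreted as a regular expression, metacharacters such as . must be escaped to split on them literally. The sketch below uses Python's re module as a stand-in for Java regex, an assumption that holds for these two simple patterns; in PySpark the escaped form would be written as split(df.s, r'\.').

```python
import re

text = "a.b.c"

# Unescaped "." is a regex wildcard that matches every character, so each
# character acts as a delimiter and only empty strings remain.
wildcard = re.split(".", text)

# Escaped "\." matches a literal dot, which is usually what was intended.
literal = re.split(r"\.", text)

print(wildcard)  # ['', '', '', '', '', '']
print(literal)   # ['a', 'b', 'c']
```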

Examples

>>> from pyspark.sql.functions import split
>>> df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s'])
>>> df.select(split(df.s, '[ABC]', 2).alias('s')).collect()
[Row(s=['one', 'twoBthreeC'])]
>>> df.select(split(df.s, '[ABC]', -1).alias('s')).collect()
[Row(s=['one', 'two', 'three', ''])]