pyspark.sql.functions.collect_list

pyspark.sql.functions.collect_list(col: ColumnOrName) → pyspark.sql.column.Column

Aggregate function: collects the values of a column into a list, keeping duplicates, and returns that list.
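To make the grouped semantics concrete outside Spark, here is a minimal pure-Python model of `GROUP BY key` followed by `collect_list(value)`, assuming rows arrive as `(key, value)` pairs with non-null values. It is a conceptual sketch, not how Spark implements the aggregate, and the helper name `collect_list_by_key` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def collect_list_by_key(rows: Iterable[Tuple[K, V]]) -> Dict[K, List[V]]:
    """Hypothetical model of GROUP BY key + collect_list(value).

    Values are appended in the order the rows are seen, and duplicates
    are kept (unlike a set-based collection).
    """
    groups: Dict[K, List[V]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


rows = [("a", 2), ("a", 5), ("b", 1), ("a", 5)]
result = collect_list_by_key(rows)
print(result)  # {'a': [2, 5, 5], 'b': [1]}
```

The duplicate `5` in group `"a"` survives, which is the property that distinguishes a list collection from a set collection.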

Notes

The function is non-deterministic because the order of the collected results depends on the order of the rows, which may be non-deterministic after a shuffle.
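The note above can be illustrated with a plain-Python sketch: enumerating every possible arrival order of the same rows (a stand-in for how a shuffle might reorder them) changes the order of the collected list but never its contents, and sorting after collecting yields a single deterministic result. In PySpark the analogous step is usually to wrap the aggregate in `sort_array`, for example `sort_array(collect_list('age'))`. The helper `collect_list_model` is hypothetical and only models the behaviour.

```python
from itertools import permutations
from typing import List, Sequence, Set, Tuple


def collect_list_model(values: Sequence[int]) -> List[int]:
    """Hypothetical model of collect_list over one group: keep every value, in arrival order."""
    return list(values)


rows = [2, 5, 5]

# Every possible arrival order of the same rows.
raw_results: Set[Tuple[int, ...]] = {
    tuple(collect_list_model(p)) for p in permutations(rows)
}
# The same aggregation, but sorted after collecting.
sorted_results: Set[Tuple[int, ...]] = {
    tuple(sorted(collect_list_model(p))) for p in permutations(rows)
}

print(len(raw_results))  # 3: (2, 5, 5), (5, 2, 5), (5, 5, 2)
print(sorted_results)    # {(2, 5, 5)}
```

Sorting discards the arrival order entirely, so it is only appropriate when the order of the collected values carries no meaning for the caller.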

Examples

>>> from pyspark.sql.functions import collect_list
>>> df2 = spark.createDataFrame([(2,), (5,), (5,)], ('age',))
>>> df2.agg(collect_list('age')).collect()
[Row(collect_list(age)=[2, 5, 5])]
