pyspark.sql.functions.array_distinct
pyspark.sql.functions.array_distinct(col: ColumnOrName) → pyspark.sql.column.Column
Collection function: removes duplicate values from the array.
Parameters
- col (Column or str): name of column or expression
Examples
>>> from pyspark.sql.functions import array_distinct
>>> df = spark.createDataFrame([([1, 2, 3, 2],), ([4, 5, 5, 4],)], ['data'])
>>> df.select(array_distinct(df.data)).collect()
[Row(array_distinct(data)=[1, 2, 3]), Row(array_distinct(data)=[4, 5])]
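
To make the result of the example above easier to reason about without a Spark session, the following is a minimal pure-Python sketch of the same deduplication applied to plain lists. The helper name `array_distinct_local` is hypothetical and is not part of PySpark. Keeping the first occurrence of each value is an assumption inferred from the output shown in the example, not a documented ordering guarantee, and this sketch says nothing about how Spark implements the function internally.

```python
def array_distinct_local(values):
    """Hypothetical stand-in for array_distinct on a plain Python list.

    Assumption: the first occurrence of each value is kept, which matches
    the output shown in the example but is not a documented guarantee.
    """
    result = []
    for value in values:
        # Append the value only if an equal one has not been seen yet.
        if value not in result:
            result.append(value)
    return result


# The same two arrays used in the DataFrame example.
rows = [[1, 2, 3, 2], [4, 5, 5, 4]]
deduplicated = [array_distinct_local(row) for row in rows]
print(deduplicated)  # [[1, 2, 3], [4, 5]]
```

The membership check makes this sketch quadratic in the array length, which is acceptable for illustration but not something to rely on for large inputs.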