pyspark.sql.functions.first
pyspark.sql.functions.first(col: ColumnOrName, ignorenulls: bool = False) → pyspark.sql.column.Column
Aggregate function: returns the first value in a group.
By default, the function returns the first value it sees. When ignorenulls is set to True, it returns the first non-null value it sees. If all values are null, then null is returned.
Notes
The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
Examples
>>> from pyspark.sql.functions import first
>>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
>>> df.groupby("name").agg(first("age")).orderBy("name").show()
+-----+----------+
| name|first(age)|
+-----+----------+
|Alice|         2|
|  Bob|         5|
+-----+----------+