pyspark.sql.functions.max_by

pyspark.sql.functions.max_by(col: ColumnOrName, ord: ColumnOrName) → pyspark.sql.column.Column

Returns the value associated with the maximum value of ord.

Parameters
colColumn or str

target column that the value will be returned

ordColumn or str

column to be maximized

Returns
Column

value associated with the maximum value of ord.

Examples

>>> df = spark.createDataFrame([
...     ("Java", 2012, 20000), ("dotNET", 2012, 5000),
...     ("dotNET", 2013, 48000), ("Java", 2013, 30000)],
...     schema=("course", "year", "earnings"))
>>> df.groupby("course").agg(max_by("year", "earnings")).show()
+------+----------------------+
|course|max_by(year, earnings)|
+------+----------------------+
|  Java|                  2013|
|dotNET|                  2013|
+------+----------------------+