pyspark.pandas.Series.nlargest

Series.nlargest(n: int = 5) → pyspark.pandas.series.Series

Return the largest n elements.

Parameters

n : int, default 5
    Number of elements to return.

Returns

Series: The n largest values in the Series, sorted in decreasing order.

See also

Series.nsmallest: Get the n smallest elements.
Series.sort_values: Sort Series by values.
Series.head: Return the first n rows.

Notes

Faster than .sort_values(ascending=False).head(n) for small n relative to the size of the Series object.

In pandas-on-Spark, thanks to Spark's lazy execution and query optimizer, the two would have the same performance.
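The equivalence between the two spellings mentioned in these notes can be checked with a minimal sketch. It assumes plain pandas as a stand-in so that it runs without a Spark session; the variable names `top_direct` and `top_sorted` are illustrative only. Replacing `import pandas as pd` with `import pyspark.pandas as ps` is expected to select the same rows, but that substitution is an assumption and is not exercised here.

```python
import numpy as np
import pandas as pd  # assumption: plain pandas stands in for pyspark.pandas

# Same data as the Examples section, including one missing value.
s = pd.Series([1, 2, 3, 4, np.nan, 6, 7, 8])

# Direct spelling: ask for the three largest values.
top_direct = s.nlargest(3)

# Equivalent spelling: sort in decreasing order, then keep the first three rows.
# sort_values places NaN last by default, so it never reaches the head.
top_sorted = s.sort_values(ascending=False).head(3)

# Both spellings select the same index labels and values.
print(top_direct.equals(top_sorted))
```

Both results keep the original index labels, so the comparison covers the labels as well as the values.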

Examples

>>> data = [1, 2, 3, 4, np.nan, 6, 7, 8]
>>> s = ps.Series(data)
>>> s
0    1.0
1    2.0
2    3.0
3    4.0
4    NaN
5    6.0
6    7.0
7    8.0
dtype: float64

The n largest elements where n=5 by default.

>>> s.nlargest()
7    8.0
6    7.0
5    6.0
3    4.0
2    3.0
dtype: float64

>>> s.nlargest(n=3)
7    8.0
6    7.0
5    6.0
dtype: float64
