pyspark.pandas.groupby.GroupBy.first

GroupBy.first(numeric_only: Optional[bool] = False) → FrameLike

Compute the first value of each group.

Parameters
numeric_only : bool, default False

Include only float, int, and boolean columns. If None, attempt to use all columns first, then fall back to numeric columns only.

Examples

>>> df = ps.DataFrame({"A": [1, 2, 1, 2], "B": [True, False, False, True],
...                    "C": [3, 3, 4, 4], "D": ["a", "b", "b", "a"]})
>>> df
   A      B  C  D
0  1   True  3  a
1  2  False  3  b
2  1  False  4  b
3  2   True  4  a
>>> df.groupby("A").first().sort_index()
       B  C  D
A
1   True  3  a
2  False  3  b

Set numeric_only=True to include only float, int, and boolean columns.

>>> df.groupby("A").first(numeric_only=True).sort_index()
       B  C
A
1   True  3
2  False  3
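
The semantics shown in the examples above can be sketched in plain Python, without Spark. This is only an illustration, not the library's implementation: `first_per_group` is a hypothetical helper, it assumes the rows arrive in the order displayed, it decides "numeric" from each value's Python type rather than from a column dtype, and it does not model missing-value handling.

```python
from typing import Any, Dict, Hashable, List


def first_per_group(
    rows: List[Dict[str, Any]], key: str, numeric_only: bool = False
) -> Dict[Hashable, Dict[str, Any]]:
    """Hypothetical helper: map each group key to the values of its first row."""
    result: Dict[Hashable, Dict[str, Any]] = {}
    for row in rows:
        group = row[key]
        if group in result:
            continue  # a first row was already recorded for this group
        result[group] = {
            col: val
            for col, val in row.items()
            # Drop the grouping column; optionally keep only bool/int/float values.
            if col != key
            and (not numeric_only or isinstance(val, (bool, int, float)))
        }
    return result


# Same data as the DataFrame in the examples, one dict per row.
rows = [
    {"A": 1, "B": True, "C": 3, "D": "a"},
    {"A": 2, "B": False, "C": 3, "D": "b"},
    {"A": 1, "B": False, "C": 4, "D": "b"},
    {"A": 2, "B": True, "C": 4, "D": "a"},
]

all_columns = first_per_group(rows, "A")
numeric_columns = first_per_group(rows, "A", numeric_only=True)
```

Here `all_columns` holds B, C, and D for each value of A, while `numeric_columns` drops the string column D, mirroring the two outputs above.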