pyspark.pandas.groupby.GroupBy.cumcount

GroupBy.cumcount(ascending: bool = True) → pyspark.pandas.series.Series

Number each item in each group from 0 to the length of that group - 1.

Essentially this is equivalent to:

self.apply(lambda x: pd.Series(np.arange(len(x)), x.index))

Parameters
ascending : bool, default True

If False, number in reverse, from length of group - 1 to 0.

Returns
Series

Sequence number of each element within each group.

Examples

>>> df = ps.DataFrame([['a'], ['a'], ['a'], ['b'], ['b'], ['a']],
...                   columns=['A'])
>>> df
   A
0  a
1  a
2  a
3  b
4  b
5  a
>>> df.groupby('A').cumcount().sort_index()
0    0
1    1
2    2
3    0
4    1
5    3
dtype: int64
>>> df.groupby('A').cumcount(ascending=False).sort_index()
0    3
1    2
2    1
3    1
4    0
5    0
dtype: int64
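
The numbering rule illustrated above can also be sketched in plain Python. This is only an illustration of the semantics, not how pandas-on-Spark computes the result; the helper name `cumcount_sketch` and the list-of-keys input are assumptions made for this example.

```python
from collections import defaultdict


def cumcount_sketch(keys, ascending=True):
    """Number each item within its group, in order of appearance.

    Hypothetical helper for illustration only; it is not part of pyspark.
    """
    seen = defaultdict(int)  # how many items of each group were seen so far
    forward = []
    for key in keys:
        forward.append(seen[key])
        seen[key] += 1
    if ascending:
        return forward
    # After the loop, seen[key] is the group size, so the reverse
    # position is (group size - 1 - forward position).
    return [seen[key] - 1 - pos for key, pos in zip(keys, forward)]


keys = ['a', 'a', 'a', 'b', 'b', 'a']
print(cumcount_sketch(keys))                   # [0, 1, 2, 0, 1, 3]
print(cumcount_sketch(keys, ascending=False))  # [3, 2, 1, 1, 0, 0]
```

The two printed lists match the values in the examples above for the same column `A`.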