pyspark.pandas.groupby.GroupBy.cumcount¶
GroupBy.cumcount(ascending: bool = True) → pyspark.pandas.series.Series¶

Number each item in each group from 0 to the length of that group - 1.
Essentially this is equivalent to
self.apply(lambda x: pd.Series(np.arange(len(x)), x.index))
- Parameters
  - ascending : bool, default True
    If False, number in reverse, from length of group - 1 to 0.
- Returns
  - Series
    Sequence number of each element within each group.
Examples
>>> df = ps.DataFrame([['a'], ['a'], ['a'], ['b'], ['b'], ['a']],
...                   columns=['A'])
>>> df
   A
0  a
1  a
2  a
3  b
4  b
5  a

>>> df.groupby('A').cumcount().sort_index()
0    0
1    1
2    2
3    0
4    1
5    3
dtype: int64

>>> df.groupby('A').cumcount(ascending=False).sort_index()
0    3
1    2
2    1
3    1
4    0
5    0
dtype: int64
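A common use of cumcount is keeping only the first n rows of each group. The sketch below is illustrative rather than part of the official examples: it uses a hypothetical two-column frame and plain pandas, whose cumcount this API mirrors. The same pattern generally carries over to pandas-on-Spark, subject to its rules on combining results from different frames.

>>> import pandas as pd
>>> pdf = pd.DataFrame({'A': ['a', 'a', 'a', 'b', 'b'],
...                     'B': [1, 2, 3, 4, 5]})
>>> pdf[pdf.groupby('A').cumcount() < 2]  # keep at most two rows per group
   A  B
0  a  1
1  a  2
3  b  4
4  b  5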