pyspark.sql.functions.grouping_id

pyspark.sql.functions.grouping_id(*cols: ColumnOrName) → pyspark.sql.column.Column

Aggregate function: returns the level of grouping, equal to

(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + … + grouping(cn)

where n is the number of grouping columns.

Notes

The list of columns must match the grouping columns exactly, or be empty (meaning all the grouping columns).

Examples

>>> from pyspark.sql.functions import grouping_id, sum
>>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
>>> df.cube("name").agg(grouping_id(), sum("age")).orderBy("name").show()
+-----+-------------+--------+
| name|grouping_id()|sum(age)|
+-----+-------------+--------+
| null|            1|       7|
|Alice|            0|       2|
|  Bob|            0|       5|
+-----+-------------+--------+