pyspark.pandas.DataFrame.unstack

DataFrame.unstack() → Union[DataFrame, Series]

Pivot the (necessarily hierarchical) index labels.
Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.
If the index is not a MultiIndex, the output will be a Series.
Note
If the index is a MultiIndex, the output DataFrame can become very wide, which may cause serious performance degradation because Spark partitions the data row-wise, not column-wise.
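For example, since the unstacked frame has roughly one column per (original column, distinct inner-index label) pair, counting the distinct labels of the level being pivoted gives a rough width estimate before calling unstack(). A minimal sketch; the frame and level name below are illustrative, not part of this API:

>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({"A": ["a", "b", "c"], "B": [1, 3, 5], "C": [2, 4, 6]})
>>> psdf = psdf.set_index("A", append=True)
>>> # unstack() pivots the innermost index level into the columns, so the result
>>> # has about len(psdf.columns) * (distinct inner-level labels) columns.
>>> inner_labels = psdf.reset_index()["A"].nunique()
>>> int(len(psdf.columns) * inner_labels)   # 2 data columns * 3 distinct labels
6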
Returns
Series or DataFrame
See also
DataFrame.pivot
Pivot a table based on column values.
DataFrame.stack
Pivot a level of the column labels (inverse operation of unstack).
Examples
>>> df = ps.DataFrame({"A": {"0": "a", "1": "b", "2": "c"}, ... "B": {"0": "1", "1": "3", "2": "5"}, ... "C": {"0": "2", "1": "4", "2": "6"}}, ... columns=["A", "B", "C"]) >>> df A B C 0 a 1 2 1 b 3 4 2 c 5 6
>>> df.unstack().sort_index() A 0 a 1 b 2 c B 0 1 1 3 2 5 C 0 2 1 4 2 6 dtype: object
>>> df.columns = pd.MultiIndex.from_tuples([('X', 'A'), ('X', 'B'), ('Y', 'C')]) >>> df.unstack().sort_index() X A 0 a 1 b 2 c B 0 1 1 3 2 5 Y C 0 2 1 4 2 6 dtype: object
For the MultiIndex case:
>>> df = ps.DataFrame({"A": ["a", "b", "c"], ... "B": [1, 3, 5], ... "C": [2, 4, 6]}, ... columns=["A", "B", "C"]) >>> df = df.set_index('A', append=True) >>> df B C A 0 a 1 2 1 b 3 4 2 c 5 6 >>> df.unstack().sort_index() B C A a b c a b c 0 1.0 NaN NaN 2.0 NaN NaN 1 NaN 3.0 NaN NaN 4.0 NaN 2 NaN NaN 5.0 NaN NaN 6.0