pyspark.pandas.DataFrame.transpose

DataFrame.transpose() → pyspark.pandas.frame.DataFrame

Transpose index and columns.

Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().

Note

This method is expensive because of the nature of big data. Internally, it needs to generate a row for each value and then group twice, which is a huge operation. To prevent misuse, the input length is limited by the ‘compute.max_rows’ option; if the DataFrame has more rows than that limit, a ValueError is raised.

>>> from pyspark.pandas.config import option_context
>>> with option_context('compute.max_rows', 1000):  
...     ps.DataFrame({'a': range(1001)}).transpose()
Traceback (most recent call last):
  ...
ValueError: Current DataFrame has more then the given limit 1000 rows.
Please set 'compute.max_rows' by using 'pyspark.pandas.config.set_option'
to retrieve to retrieve more than 1000 rows. Note that, before changing the
'compute.max_rows', this operation is considerably expensive.
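The cost described in the note can be pictured with a small plain-Python sketch. It is only an illustration under simplifying assumptions, not the pyspark.pandas implementation: the function name `transpose_by_regrouping` and the dict-of-lists input are hypothetical, and it regroups once rather than twice. What it shows is that every cell becomes its own record before regrouping, so the work grows with rows × columns.

```python
from collections import defaultdict


def transpose_by_regrouping(rows, columns):
    """Transpose a small table by exploding it into cells and regrouping.

    `rows` maps an index label to a list of values, one per name in `columns`.
    Illustrative only: this is not how pyspark.pandas is implemented.
    """
    # Step 1: emit one (index_label, column_name, value) record per cell.
    cells = [
        (index_label, column_name, value)
        for index_label, values in rows.items()
        for column_name, value in zip(columns, values)
    ]

    # Step 2: regroup the cells by column name. Each old column becomes a
    # new row whose entries are keyed by the old index labels.
    transposed = defaultdict(dict)
    for index_label, column_name, value in cells:
        transposed[column_name][index_label] = value
    return dict(transposed), len(cells)


table = {0: [1, 3], 1: [2, 4]}
result, cell_count = transpose_by_regrouping(table, ["col1", "col2"])
print(result)      # {'col1': {0: 1, 1: 2}, 'col2': {0: 3, 1: 4}}
print(cell_count)  # 4
```

With 2 rows and 2 columns the sketch handles 4 cell records; with 1001 rows and one column it would already handle 1001, which is why the input length is capped by default.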

Returns

DataFrame: The transposed DataFrame.

Notes

Transposing a DataFrame with mixed dtypes will result in a homogeneous DataFrame with the coerced dtype. For instance, if int and float values have to be placed in the same column, the column becomes float. If type coercion is not possible, the transpose fails.

Also, note that the values in the index should be unique because they become the column names of the result.

In addition, if Spark 2.3 is used, the types should always be exactly the same.
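The uniqueness requirement on the index can be checked before transposing. The helper below is a minimal sketch, and its name `transpose_if_index_unique` is hypothetical. It relies only on `df.index.is_unique` and `df.transpose()`, both of which pandas-on-Spark DataFrames are assumed to provide here, so it accepts any object with that interface.

```python
def transpose_if_index_unique(df):
    """Transpose `df` only when its index labels are unique.

    Hypothetical helper: it uses nothing but `df.index.is_unique` and
    `df.transpose()`, so it is not tied to one DataFrame implementation.
    """
    # Duplicate index labels would turn into duplicate column names.
    if not df.index.is_unique:
        raise ValueError("index labels must be unique before transposing")
    return df.transpose()
```

For a pandas-on-Spark DataFrame `psdf`, the call would be `transpose_if_index_unique(psdf)`. Checking uniqueness on a distributed index is itself a computation, so the check adds some cost before the transpose.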

Examples

Square DataFrame with homogeneous dtype

>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = ps.DataFrame(data=d1, columns=['col1', 'col2'])
>>> df1
   col1  col2
0     1     3
1     2     4

>>> df1_transposed = df1.T.sort_index()  
>>> df1_transposed  
      0  1
col1  1  2
col2  3  4

When the dtype is homogeneous in the original DataFrame, we get a transposed DataFrame with the same dtype:

>>> df1.dtypes
col1    int64
col2    int64
dtype: object
>>> df1_transposed.dtypes  
0    int64
1    int64
dtype: object

Non-square DataFrame with mixed dtypes

>>> d2 = {'score': [9.5, 8],
...       'kids': [0, 0],
...       'age': [12, 22]}
>>> df2 = ps.DataFrame(data=d2, columns=['score', 'kids', 'age'])
>>> df2
   score  kids  age
0    9.5     0   12
1    8.0     0   22

>>> df2_transposed = df2.T.sort_index()  
>>> df2_transposed  
          0     1
age    12.0  22.0
kids    0.0   0.0
score   9.5   8.0

When the DataFrame has mixed dtypes, we get a transposed DataFrame with the coerced dtype:

>>> df2.dtypes
score    float64
kids       int64
age        int64
dtype: object

>>> df2_transposed.dtypes  
0    float64
1    float64
dtype: object
