pyspark.pandas.DataFrame.to_dict

DataFrame.to_dict(orient: str = 'dict', into: Type = <class 'dict'>) → Union[List, collections.abc.Mapping]

Convert the DataFrame to a dictionary.

The type of the key-value pairs can be customized with the parameters (see below).

Note

This method should only be used if the resulting pandas DataFrame is expected to be small, as all the data is loaded into the driver’s memory.
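To keep the collected data small, you can reduce the DataFrame before converting, for instance with head(). A minimal sketch using the ps.range constructor:

>>> ps.range(4).head(2).to_dict('list')  # only the first two rows reach the driver
{'id': [0, 1]}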

Parameters
orient : str {‘dict’, ‘list’, ‘series’, ‘split’, ‘records’, ‘index’}

Determines the type of the values of the dictionary.

  • ‘dict’ (default) : dict like {column -> {index -> value}}

  • ‘list’ : dict like {column -> [values]}

  • ‘series’ : dict like {column -> Series(values)}

  • ‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}

  • ‘records’ : list like [{column -> value}, … , {column -> value}]

  • ‘index’ : dict like {index -> {column -> value}}

Abbreviations are allowed: ‘s’ indicates ‘series’ and ‘sp’ indicates ‘split’ (see the Examples section below).

into : class, default dict

The collections.abc.Mapping subclass used for all Mappings in the return value. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.

Returns
dict, list or collections.abc.Mapping

Return a collections.abc.Mapping object representing the DataFrame, or a list of such mappings when orient='records'. The resulting transformation depends on the orient parameter.

Examples

>>> df = ps.DataFrame({'col1': [1, 2],
...                    'col2': [0.5, 0.75]},
...                   index=['row1', 'row2'],
...                   columns=['col1', 'col2'])
>>> df
      col1  col2
row1     1  0.50
row2     2  0.75
>>> df_dict = df.to_dict()
>>> sorted([(key, sorted(values.items())) for key, values in df_dict.items()])
[('col1', [('row1', 1), ('row2', 2)]), ('col2', [('row1', 0.5), ('row2', 0.75)])]

You can specify the return orientation.

>>> df_dict = df.to_dict('series')
>>> sorted(df_dict.items())
[('col1', row1    1
row2    2
Name: col1, dtype: int64), ('col2', row1    0.50
row2    0.75
Name: col2, dtype: float64)]
>>> df_dict = df.to_dict('split')
>>> sorted(df_dict.items())  
[('columns', ['col1', 'col2']), ('data', [[1..., 0.75]]), ('index', ['row1', 'row2'])]
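
Abbreviated orient names behave like the full names on pandas versions that still accept them (newer pandas deprecates the short forms); a sketch comparing ‘sp’ with ‘split’:

>>> df.to_dict('sp') == df.to_dict('split')
True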
>>> df_dict = df.to_dict('records')
>>> [sorted(values.items()) for values in df_dict]  
[[('col1', 1...), ('col2', 0.5)], [('col1', 2...), ('col2', 0.75)]]
>>> df_dict = df.to_dict('index')
>>> sorted([(key, sorted(values.items())) for key, values in df_dict.items()])
[('row1', [('col1', 1), ('col2', 0.5)]), ('row2', [('col1', 2), ('col2', 0.75)])]
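
The ‘list’ orientation, not shown above, maps each column to a plain list of its values; a minimal sketch (items sorted for deterministic display):

>>> df_dict = df.to_dict('list')
>>> sorted(df_dict.items())
[('col1', [1, 2]), ('col2', [0.5, 0.75])]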

You can also specify the mapping type.

>>> from collections import OrderedDict, defaultdict
>>> df.to_dict(into=OrderedDict)
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])

If you want a defaultdict, you need to initialize it:

>>> dd = defaultdict(list)
>>> df.to_dict('records', into=dd)  
[defaultdict(<class 'list'>, {'col..., 'col...}), defaultdict(<class 'list'>, {'col..., 'col...})]