pyspark.pandas.DataFrame.to_dict¶
-
DataFrame.
to_dict
(orient: str = 'dict', into: Type = <class 'dict'>) → Union[List, collections.abc.Mapping]¶ Convert the DataFrame to a dictionary.
The type of the key-value pairs can be customized with the parameters (see below).
Note
This method should only be used if the resulting pandas DataFrame is expected to be small, as all the data is loaded into the driver’s memory.
- Parameters
- orientstr {‘dict’, ‘list’, ‘series’, ‘split’, ‘records’, ‘index’}
Determines the type of the values of the dictionary.
‘dict’ (default) : dict like {column -> {index -> value}}
‘list’ : dict like {column -> [values]}
‘series’ : dict like {column -> Series(values)}
‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}
‘records’ : list like [{column -> value}, … , {column -> value}]
‘index’ : dict like {index -> {column -> value}}
Abbreviations are allowed. s indicates series and sp indicates split.
- intoclass, default dict
The collections.abc.Mapping subclass used for all Mappings in the return value. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.
- Returns
- dict, list or collections.abc.Mapping
Return a collections.abc.Mapping object representing the DataFrame. The resulting transformation depends on the orient parameter.
Examples
>>> df = ps.DataFrame({'col1': [1, 2], ... 'col2': [0.5, 0.75]}, ... index=['row1', 'row2'], ... columns=['col1', 'col2']) >>> df col1 col2 row1 1 0.50 row2 2 0.75
>>> df_dict = df.to_dict() >>> sorted([(key, sorted(values.items())) for key, values in df_dict.items()]) [('col1', [('row1', 1), ('row2', 2)]), ('col2', [('row1', 0.5), ('row2', 0.75)])]
You can specify the return orientation.
>>> df_dict = df.to_dict('series') >>> sorted(df_dict.items()) [('col1', row1 1 row2 2 Name: col1, dtype: int64), ('col2', row1 0.50 row2 0.75 Name: col2, dtype: float64)]
>>> df_dict = df.to_dict('split') >>> sorted(df_dict.items()) [('columns', ['col1', 'col2']), ('data', [[1..., 0.75]]), ('index', ['row1', 'row2'])]
>>> df_dict = df.to_dict('records') >>> [sorted(values.items()) for values in df_dict] [[('col1', 1...), ('col2', 0.5)], [('col1', 2...), ('col2', 0.75)]]
>>> df_dict = df.to_dict('index') >>> sorted([(key, sorted(values.items())) for key, values in df_dict.items()]) [('row1', [('col1', 1), ('col2', 0.5)]), ('row2', [('col1', 2), ('col2', 0.75)])]
You can also specify the mapping type.
>>> from collections import OrderedDict, defaultdict >>> df.to_dict(into=OrderedDict) OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
If you want a defaultdict, you need to initialize it:
>>> dd = defaultdict(list) >>> df.to_dict('records', into=dd) [defaultdict(<class 'list'>, {'col..., 'col...}), defaultdict(<class 'list'>, {'col..., 'col...})]