pyspark.pandas.DataFrame.to_records

DataFrame.to_records(index: bool = True, column_dtypes: Union[str, numpy.dtype, pandas.core.dtypes.base.ExtensionDtype, Dict[Union[Any, Tuple[Any, …]], Union[str, numpy.dtype, pandas.core.dtypes.base.ExtensionDtype]], None] = None, index_dtypes: Union[str, numpy.dtype, pandas.core.dtypes.base.ExtensionDtype, Dict[Union[Any, Tuple[Any, …]], Union[str, numpy.dtype, pandas.core.dtypes.base.ExtensionDtype]], None] = None) → numpy.recarray

Convert DataFrame to a NumPy record array.

Index will be included as the first field of the record array if requested.

Note

This method should only be used if the resulting NumPy ndarray is expected to be small, as all the data is loaded into the driver’s memory.

Parameters
indexbool, default True

Include index in resulting record array, stored in ‘index’ field or using the index label, if set.

column_dtypesstr, type, dict, default None

If a string or type, the data type to store all columns. If a dictionary, a mapping of column names and indices (zero-indexed) to specific data types.

index_dtypesstr, type, dict, default None

If a string or type, the data type to store all index levels. If a dictionary, a mapping of index level names and indices (zero-indexed) to specific data types. This mapping is applied only if index=True.

Returns
numpy.recarray

NumPy ndarray with the DataFrame labels as fields and each row of the DataFrame as entries.

See also

DataFrame.from_records

Convert structured or record ndarray to DataFrame.

numpy.recarray

An ndarray that allows field access using attributes, analogous to typed columns in a spreadsheet.

Examples

>>> df = ps.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]},
...                   index=['a', 'b'])
>>> df
   A     B
a  1  0.50
b  2  0.75
>>> df.to_records() 
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])

The index can be excluded from the record array:

>>> df.to_records(index=False) 
rec.array([(1, 0.5 ), (2, 0.75)],
          dtype=[('A', '<i8'), ('B', '<f8')])

Specification of dtype for columns is new in pandas 0.24.0. Data types can be specified for the columns:

>>> df.to_records(column_dtypes={"A": "int32"}) 
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('index', 'O'), ('A', '<i4'), ('B', '<f8')])

Specification of dtype for index is new in pandas 0.24.0. Data types can also be specified for the index:

>>> df.to_records(index_dtypes="<S2") 
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
          dtype=[('index', 'S2'), ('A', '<i8'), ('B', '<f8')])