pyspark.pandas.DataFrame.to_records
DataFrame.to_records(index: bool = True, column_dtypes: Union[str, numpy.dtype, pandas.core.dtypes.base.ExtensionDtype, Dict[Union[Any, Tuple[Any, …]], Union[str, numpy.dtype, pandas.core.dtypes.base.ExtensionDtype]], None] = None, index_dtypes: Union[str, numpy.dtype, pandas.core.dtypes.base.ExtensionDtype, Dict[Union[Any, Tuple[Any, …]], Union[str, numpy.dtype, pandas.core.dtypes.base.ExtensionDtype]], None] = None) → numpy.recarray
Convert DataFrame to a NumPy record array.
Index will be included as the first field of the record array if requested.
Note
This method should only be used if the resulting NumPy ndarray is expected to be small, as all the data is loaded into the driver’s memory.
- Parameters
- index : bool, default True
Include index in resulting record array, stored in ‘index’ field or using the index label, if set.
- column_dtypes : str, type, dict, default None
If a string or type, the data type to store all columns. If a dictionary, a mapping of column names and indices (zero-indexed) to specific data types.
- index_dtypes : str, type, dict, default None
If a string or type, the data type to store all index levels. If a dictionary, a mapping of index level names and indices (zero-indexed) to specific data types. This mapping is applied only if index=True.
- Returns
- numpy.recarray
NumPy ndarray with the DataFrame labels as fields and each row of the DataFrame as entries.
See also
DataFrame.from_records
Convert structured or record ndarray to DataFrame.
numpy.recarray
An ndarray that allows field access using attributes, analogous to typed columns in a spreadsheet.
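As a rough illustration of how the two methods relate, the sketch below round-trips a small frame through a record array and back. It uses plain pandas as a stand-in, on the assumption that pyspark.pandas produces the same record array once the data has been collected to the driver; the names `pdf`, `rec`, and `restored` are illustrative only.

```python
import pandas as pd  # stand-in for pyspark.pandas (assumption: same record layout)

pdf = pd.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]}, index=['a', 'b'])

# Forward direction: the index becomes the first field, named 'index'.
rec = pdf.to_records()

# Reverse direction: rebuild a DataFrame and restore that field as the index.
restored = pd.DataFrame.from_records(rec, index='index')
```

After the round trip, `restored` should hold the same columns and index labels as `pdf`, with the index now named 'index'.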
Examples
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]},
...                   index=['a', 'b'])
>>> df
   A     B
a  1  0.50
b  2  0.75
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])
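The fields of the returned record array can be read either as attributes or by name. The following is a minimal sketch using plain pandas as a stand-in (assuming pyspark.pandas returns an equivalent `numpy.recarray`); the variable names are illustrative.

```python
import pandas as pd  # stand-in for pyspark.pandas (assumption: equivalent recarray)

pdf = pd.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]}, index=['a', 'b'])
rec = pdf.to_records()

col_a = rec.A                   # attribute-style field access
col_b = rec['B']                # key-style field access
first_row = rec[0]              # a single record holding one row
labels = list(rec.dtype.names)  # field names, index field first
```

Key-style access is the safer choice when a field name could collide with an existing ndarray attribute.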
The index can be excluded from the record array:
>>> df.to_records(index=False)
rec.array([(1, 0.5 ), (2, 0.75)],
          dtype=[('A', '<i8'), ('B', '<f8')])
Specification of dtype for columns is new in pandas 0.24.0. Data types can be specified for the columns:
>>> df.to_records(column_dtypes={"A": "int32"})
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('index', 'O'), ('A', '<i4'), ('B', '<f8')])
Specification of dtype for index is new in pandas 0.24.0. Data types can also be specified for the index:
>>> df.to_records(index_dtypes="<S2")
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
          dtype=[('index', 'S2'), ('A', '<i8'), ('B', '<f8')])
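The dictionary forms of `column_dtypes` and `index_dtypes` accept zero-indexed positions as well as names, as the parameter descriptions state. The sketch below mixes both kinds of key. It is written against plain pandas, on the assumption that pyspark.pandas passes these arguments through to the same conversion; the chosen dtypes are illustrative.

```python
import numpy as np
import pandas as pd  # stand-in for pyspark.pandas (assumption: same argument handling)

pdf = pd.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]}, index=['a', 'b'])

rec = pdf.to_records(
    column_dtypes={'A': 'int32', 1: 'float32'},  # 'A' by name, 'B' by position 1
    index_dtypes={0: '<U1'},                     # the only index level, by position 0
)

# Collect the resulting per-field dtypes for inspection.
field_types = {name: rec.dtype[name] for name in rec.dtype.names}
```

Columns or index levels that are absent from a dictionary keep their original data types.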