pyspark.pandas.DataFrame.to_numpy

DataFrame.to_numpy() → numpy.ndarray

Return a NumPy ndarray representing the values in this DataFrame or Series.

Note

This method should only be used if the resulting NumPy ndarray is expected to be small, as all the data is loaded into the driver’s memory.
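As a rough way to judge whether the result is "small", the dense array size can be estimated from its shape and dtype before calling to_numpy(). The sketch below is only an illustration: `estimated_ndarray_bytes` is a hypothetical helper, not part of pyspark.pandas, and the shape used is an assumed example.

```python
import numpy as np


def estimated_ndarray_bytes(n_rows: int, n_cols: int, dtype="float64") -> int:
    """Hypothetical helper: rough size in bytes of a dense (n_rows, n_cols) array."""
    return n_rows * n_cols * np.dtype(dtype).itemsize


# Assumed example shape. For a pandas-on-Spark DataFrame the shape could be
# read from its `shape` attribute, which itself runs a Spark job to count rows.
size = estimated_ndarray_bytes(1_000_000, 10, "float64")
print(f"{size / 1e6:.0f} MB")  # 80 MB
```

For object dtype the estimate covers only the array of pointers, not the Python objects they refer to, so the real footprint on the driver is larger.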

Returns

numpy.ndarray

Examples

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> ps.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array([[1, 3],
       [2, 4]])

With heterogeneous numeric data, the result uses the lowest common type that can hold every column (here, float64).

>>> ps.DataFrame({"A": [1, 2], "B": [3.0, 4.5]}).to_numpy()
array([[1. , 3. ],
       [2. , 4.5]])
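The promotion in the example above can be reproduced with plain NumPy. This is a minimal sketch of the type-promotion rule only; it does not claim that to_numpy() calls `np.result_type` internally.

```python
import numpy as np

# Lowest common dtype for an int64 column and a float64 column.
common = np.result_type(np.int64, np.float64)
print(common)  # float64

# Casting the integer column to the common dtype keeps its values intact.
ints_as_common = np.array([1, 2], dtype=np.int64).astype(common)
print(ints_as_common)  # [1. 2.]
```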

For a mix of numeric and non-numeric types, the output array will have object dtype.

>>> df = ps.DataFrame({"A": [1, 2], "B": [3.0, 4.5], "C": pd.date_range('2000', periods=2)})
>>> df.to_numpy()
array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],
       [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)
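An object-dtype array stores references to Python objects, so vectorized numeric operations on it are slow or unavailable. One option, sketched here with plain NumPy, is to slice out the numeric columns and cast them. The array below is an assumed stand-in for the result above, with strings in place of the Timestamp column.

```python
import numpy as np

# Stand-in for a mixed-type result; the last column is non-numeric.
mixed = np.array([[1, 3.0, "2000-01-01"],
                  [2, 4.5, "2000-01-02"]], dtype=object)

# Keep only the first two (numeric) columns and cast them to float64.
numeric = mixed[:, :2].astype("float64")
print(numeric.dtype)  # float64
```

Selecting the numeric columns of the DataFrame before calling to_numpy() avoids creating the object array in the first place.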

For a Series, the result is a one-dimensional array:

>>> ps.Series(['a', 'b', 'a']).to_numpy()
array(['a', 'b', 'a'], dtype=object)
