pyspark.pandas.DataFrame.values

property DataFrame.values

Return a NumPy representation of the DataFrame or the Series.

Warning

We recommend using DataFrame.to_numpy() or Series.to_numpy() instead.

Note

Use this property only when the resulting NumPy ndarray is expected to be small, because all of the data is collected into the driver's memory.
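One way to act on this note is to check the row count before collecting anything. The sketch below is a hypothetical helper (`to_numpy_guarded` and its `max_rows` limit are illustrative, not part of the pandas-on-Spark API). It relies only on `len()` and `.to_numpy()`, which both pandas and pandas-on-Spark objects provide. Plain pandas stands in for `pyspark.pandas` so the snippet runs without a Spark session. With a pandas-on-Spark DataFrame, `len(df)` typically runs a Spark count job but does not collect rows.

```python
import pandas as pd


def to_numpy_guarded(df, max_rows=10_000):
    """Hypothetical helper: return df.to_numpy() only if df has at most max_rows rows.

    Accepts any object exposing len() and .to_numpy() (pandas or
    pandas-on-Spark DataFrame/Series). Raises ValueError instead of
    collecting a large dataset into driver memory.
    """
    n_rows = len(df)  # row count; for pandas-on-Spark this is a Spark job
    if n_rows > max_rows:
        raise ValueError(
            f"refusing to collect {n_rows} rows to the driver (limit {max_rows}); "
            "filter or sample the data first, e.g. df.head(max_rows)"
        )
    return df.to_numpy()


# Plain pandas stands in for pyspark.pandas so this runs without Spark.
df = pd.DataFrame({"age": [3, 29], "height": [94, 170], "weight": [31, 115]})
arr = to_numpy_guarded(df, max_rows=100)
print(arr.shape)  # (2, 3)
```

Raising an error, rather than silently truncating, keeps the caller aware that data would otherwise be dropped. Calling `df.head(n).to_numpy()` is the explicit alternative when only a prefix of the rows is wanted.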

Returns
numpy.ndarray

Examples

A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.

>>> df = ps.DataFrame({'age':    [ 3,  29],
...                    'height': [94, 170],
...                    'weight': [31, 115]})
>>> df
   age  height  weight
0    3      94      31
1   29     170     115
>>> df.dtypes
age       int64
height    int64
weight    int64
dtype: object
>>> df.values
array([[  3,  94,  31],
       [ 29, 170, 115]])

A DataFrame with mixed-type columns (e.g., str/object, int64, float32) results in an ndarray of the broadest type that accommodates all of them (e.g., object).

>>> df2 = ps.DataFrame([('parrot',   24.0, 'second'),
...                     ('lion',     80.5, 'first'),
...                     ('monkey', np.nan, None)],
...                   columns=('name', 'max_speed', 'rank'))
>>> df2.dtypes
name          object
max_speed    float64
rank          object
dtype: object
>>> df2.values
array([['parrot', 24.0, 'second'],
       ['lion', 80.5, 'first'],
       ['monkey', nan, None]], dtype=object)
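The "broadest type" in the two examples above can be approximated with NumPy's own promotion rule, `numpy.result_type`. This is an illustration under an assumption: for plain numeric and object dtypes the combined dtype matches NumPy's promotion, but pandas special-cases some combinations (such as booleans mixed with numbers), so treat it as an approximation rather than the exact rule `values` applies.

```python
import numpy as np

# Homogeneous int64 columns stay int64, as in the first example.
print(np.result_type(np.int64, np.int64, np.int64))   # int64

# int64 mixed with float32 promotes to float64, which can represent both.
print(np.result_type(np.int64, np.float32))           # float64

# Any object (e.g. string) column forces the whole array to object.
print(np.result_type(np.dtype(object), np.float64))   # object
```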

For a Series:

>>> ps.Series([1, 2, 3]).values
array([1, 2, 3])
>>> ps.Series(list('aabc')).values
array(['a', 'a', 'b', 'c'], dtype=object)
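Note the shape difference: a Series yields a one-dimensional array with one entry per row, while a DataFrame yields a two-dimensional array of shape (rows, columns). The sketch below uses plain pandas as a stand-in for `pyspark.pandas`, an assumption made so it runs without a Spark session. The same attribute access applies to `ps.Series` and `ps.DataFrame`.

```python
import pandas as pd

# Plain pandas stands in for pyspark.pandas so this runs without Spark.
s = pd.Series([1, 2, 3])
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

print(s.values.shape)   # (3,)   one axis: one entry per row
print(df.values.shape)  # (2, 2) rows x columns
```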