pyspark.pandas.to_numeric

pyspark.pandas.to_numeric(arg, errors='raise')

Convert argument to a numeric type.

Parameters
arg : scalar, list, tuple, 1-d array, or Series

Argument to be converted.

errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
  • If ‘coerce’, then invalid parsing will be set as NaN.

  • If ‘raise’, then invalid parsing will raise an exception.

  • If ‘ignore’, then invalid parsing will return the input.

Note

‘ignore’ doesn’t work yet when arg is pandas-on-Spark Series.
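pyspark.pandas follows pandas' `errors` semantics, so the raise/coerce behavior can be sketched with plain pandas (no Spark session needed); the same calls apply to a pandas-on-Spark Series:

```python
import pandas as pd

s = pd.Series(['1.5', 'apple', '-3'])

# 'coerce': unparseable values become NaN instead of raising
coerced = pd.to_numeric(s, errors='coerce')

# 'raise' (the default): unparseable input raises a ValueError
try:
    pd.to_numeric(s, errors='raise')
except ValueError as exc:
    print(f"parsing failed: {exc}")
```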

Returns
ret : numeric if parsing succeeded.

See also

DataFrame.astype

Cast argument to a specified dtype.

to_datetime

Convert argument to datetime.

to_timedelta

Convert argument to timedelta.

numpy.ndarray.astype

Cast a numpy array to a specified type.

Examples

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> psser = ps.Series(['1.0', '2', '-3'])
>>> psser
0    1.0
1      2
2     -3
dtype: object
>>> ps.to_numeric(psser)
0    1.0
1    2.0
2   -3.0
dtype: float32

If the given Series contains values that cannot be cast to float, they are set to np.nan when errors is set to “coerce”.

>>> psser = ps.Series(['apple', '1.0', '2', '-3'])
>>> psser
0    apple
1      1.0
2        2
3       -3
dtype: object
>>> ps.to_numeric(psser, errors="coerce")
0    NaN
1    1.0
2    2.0
3   -3.0
dtype: float32

A list, tuple, np.ndarray, or scalar is also supported:

>>> ps.to_numeric(['1.0', '2', '-3'])
array([ 1.,  2., -3.])
>>> ps.to_numeric(('1.0', '2', '-3'))
array([ 1.,  2., -3.])
>>> ps.to_numeric(np.array(['1.0', '2', '-3']))
array([ 1.,  2., -3.])
>>> ps.to_numeric('1.0')
1.0