pyspark.pandas.Series.transform

Series.transform(func: Union[Callable, List[Callable]], axis: Union[int, str] = 0, *args: Any, **kwargs: Any) → Union[pyspark.pandas.series.Series, pyspark.pandas.frame.DataFrame]

Call func on self, producing an object of the same type as self that holds the transformed values and has the same axis length as the input.

Note

This API executes the function once to infer the return type, which can be expensive, for instance when the dataset is created after aggregations or sorting.

To avoid this, specify the return type in func with a type hint, for instance:

>>> def square(x) -> np.int32:
...     return x ** 2

pandas-on-Spark then uses the return type hint and does not try to infer the type.
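To illustrate why a return type hint removes the need for a trial execution, here is a minimal sketch using only the standard library. It shows that a return annotation can be read from a function without ever calling it. The helper name `declared_return_type` is hypothetical, and this is not pandas-on-Spark's actual implementation, only an illustration of the idea.

```python
import inspect


def declared_return_type(func):
    """Return the function's return annotation, or None if it has none.

    Hypothetical helper for illustration only; not part of pyspark.
    """
    try:
        annotation = inspect.signature(func).return_annotation
    except (TypeError, ValueError):
        # Some callables expose no inspectable signature at all.
        return None
    return None if annotation is inspect.Signature.empty else annotation


def square(x) -> int:
    return x ** 2


def untyped_square(x):
    return x ** 2


# Neither function is executed here; only its metadata is inspected.
print(declared_return_type(square))
print(declared_return_type(untyped_square))
```

When no annotation is present, as with `untyped_square`, a library in this position has no declared type to rely on and has to fall back to some other strategy, such as running the function to observe the result type.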

Parameters

func (function or list): A function or a list of functions to use for transforming the data.
axis (int, default 0 or 'index'): Can only be set to 0 at the moment.
*args: Positional arguments to pass to func.
**kwargs: Keyword arguments to pass to func.
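The following sketch shows one way the extra arguments might be forwarded to func. The `scale` function and the `demo` wrapper are hypothetical names introduced for illustration. The sketch assumes func is called element-wise, and `demo` additionally assumes a working Spark environment, which is why the pyspark import is kept inside it.

```python
def scale(x, factor, offset=0.0) -> float:
    # Hypothetical element-wise function: x is assumed to be a single value
    # from the Series; factor and offset arrive through *args and **kwargs.
    return float(x * factor + offset)


def demo():
    # Imported lazily so that defining scale() does not require Spark.
    import pyspark.pandas as ps

    s = ps.Series(range(3))
    # axis is the second positional parameter of transform, so it is passed
    # explicitly (0) before the extra positional argument (factor=10).
    return s.transform(scale, 0, 10, offset=1.0)
```

Because `scale` carries a `float` return type hint, calling `demo()` should not trigger the type-inference run described in the note above.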

Returns

An instance of the same type as self that must have the same length as the input.

See also

Series.aggregate: Only perform aggregating type operations.
Series.apply: Invoke function on Series.
DataFrame.transform: The equivalent function for DataFrame.

Examples

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> s = ps.Series(range(3))
>>> s
0    0
1    1
2    2
dtype: int64

>>> def sqrt(x) -> float:
...     return np.sqrt(x)
>>> s.transform(sqrt)
0    0.000000
1    1.000000
2    1.414214
dtype: float64

Even though the result must have the same length as the input, it is possible to provide several functions. The result is then a DataFrame with one column per function:

>>> def exp(x) -> float:
...     return np.exp(x)
>>> s.transform([sqrt, exp])
       sqrt       exp
0  0.000000  1.000000
1  1.000000  2.718282
2  1.414214  7.389056

You can also omit the type hints and let pandas-on-Spark infer the return types:

>>> s.transform([np.sqrt, np.exp])
       sqrt       exp
0  0.000000  1.000000
1  1.000000  2.718282
2  1.414214  7.389056
