pyspark.pandas.DataFrame.quantile

DataFrame.quantile(q: Union[float, Iterable[float]] = 0.5, axis: Union[int, str] = 0, numeric_only: bool = True, accuracy: int = 10000) → Union[DataFrame, Series]

Return the value(s) at the given quantile(s).

Note

Unlike pandas, pandas-on-Spark returns an approximate quantile, computed with an approximate percentile algorithm, because computing an exact quantile across a large dataset is extremely expensive.
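To illustrate how an approximate result can differ from the pandas one even on tiny inputs, here is a minimal pure-Python sketch. It assumes the approximate computation returns one of the observed values rather than interpolating between neighbors, as Spark's `percentile_approx` is documented to do. Both `linear_quantile` and `nearest_rank_quantile` are hypothetical helpers written for this illustration; neither is part of pandas or pandas-on-Spark, and the nearest-rank rule is a simplified stand-in, not Spark's actual algorithm.

```python
import math


def linear_quantile(values, q):
    """Exact quantile with linear interpolation (the pandas default method)."""
    s = sorted(values)
    pos = q * (len(s) - 1)          # fractional position in the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def nearest_rank_quantile(values, q):
    """Hypothetical stand-in: always return an observed value, never interpolate."""
    s = sorted(values)
    idx = max(math.ceil(q * len(s)) - 1, 0)
    return float(s[idx])


data = [1, 2, 3, 4]
print(linear_quantile(data, 0.5))        # 2.5, halfway between 2 and 3
print(nearest_rank_quantile(data, 0.5))  # 2.0, an element of the data
```

With an odd number of rows, as in the examples further down, both rules pick the same middle element, so the difference only shows up when the requested quantile falls between two observations.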

Parameters

q : float or array-like, default 0.5 (50% quantile)
    0 <= q <= 1, the quantile(s) to compute.
axis : int or str, default 0 or ‘index’
    Can only be set to 0 at the moment.
numeric_only : bool, default True
    If False, the quantile of datetime and timedelta data will be computed as well. Can only be set to True at the moment.
accuracy : int, optional, default 10000
    Accuracy of the approximation. A larger value means better accuracy. The relative error is 1.0 / accuracy.
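As a rough guide to choosing `accuracy`, the sketch below turns the stated relative error of 1.0 / accuracy into a number of rows. It assumes the relative error is measured in rank terms, that is, the returned value may sit up to `n_rows / accuracy` positions away from the exact quantile in sorted order; this interpretation follows Spark's documentation for approximate quantiles and is not stated on this page. `max_rank_error` is a hypothetical helper, not a pandas-on-Spark function.

```python
def max_rank_error(n_rows, accuracy=10000):
    """Assumed bound on rank displacement: n_rows * (1.0 / accuracy)."""
    return n_rows / accuracy


# For one million rows, compare a few accuracy settings.
for accuracy in (100, 10000, 1000000):
    print(accuracy, max_rank_error(1_000_000, accuracy))
```

Under this assumption, the default of 10000 keeps the result within about 100 sorted positions of the exact quantile for a column of one million rows.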

Returns

Series or DataFrame
    If q is an array, a DataFrame will be returned where the index is q, the columns are the columns of self, and the values are the quantiles.
    If q is a float, a Series will be returned where the index is the columns of self and the values are the quantiles.

Examples

>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})
>>> psdf
   a  b
0  1  6
1  2  7
2  3  8
3  4  9
4  5  0

>>> psdf.quantile(.5)
a    3.0
b    7.0
Name: 0.5, dtype: float64

>>> psdf.quantile([.25, .5, .75])
        a    b
0.25  2.0  6.0
0.50  3.0  7.0
0.75  4.0  8.0
