pyspark.pandas.groupby.GroupBy.ewm

GroupBy.ewm(com: Optional[float] = None, span: Optional[float] = None, halflife: Optional[float] = None, alpha: Optional[float] = None, min_periods: Optional[int] = None, ignore_na: bool = False) → ExponentialMovingGroupby[FrameLike]

Return an ewm grouper, providing ewm functionality per group.

Note

Unlike pandas, ‘min_periods’ in pandas-on-Spark works as a fixed window size, and NA values are also counted toward the period. This behavior might change in the near future.
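A minimal usage sketch (the DataFrame contents and decay value below are arbitrary illustrations; the returned grouper is typically followed by an aggregation such as mean()):

>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({"group": ["a", "a", "a", "b", "b"],
...                      "value": [1.0, 2.0, 3.0, 4.0, 5.0]})
>>> result = psdf.groupby("group").ewm(com=0.5).mean()  # ewm mean per group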

Parameters
com : float, optional

Specify decay in terms of center of mass. alpha = 1 / (1 + com), for com >= 0.

span : float, optional

Specify decay in terms of span. alpha = 2 / (span + 1), for span >= 1.

halflife : float, optional

Specify decay in terms of half-life. alpha = 1 - exp(-ln(2) / halflife), for halflife > 0.

alpha : float, optional

Specify smoothing factor alpha directly: 0 < alpha <= 1. (com, span, and halflife are alternative parameterizations of alpha; a conversion check appears after this parameter list.)

min_periods : int, default None

Minimum number of observations in window required to have a value (otherwise result is NA).

ignore_na : bool, default False

Ignore missing values when calculating weights.

  • When ignore_na=False (default), weights are based on absolute positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \((1-\alpha)^2\) and \(1\) if adjust=True, and \((1-\alpha)^2\) and \(\alpha\) if adjust=False.

  • When ignore_na=True, weights are based on relative positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.
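To make the weighting concrete, here is a hand computation of the adjust=True case from the bullets above (alpha and the data values are arbitrary illustrations):

>>> alpha, x0, x2 = 0.5, 1.0, 3.0
>>> # ignore_na=False: absolute positions, so the gap decays x0 by an extra factor
>>> w = [(1 - alpha) ** 2, 1.0]
>>> (w[0] * x0 + w[1] * x2) / sum(w)
2.6
>>> # ignore_na=True: relative positions among the observed values only
>>> w = [1 - alpha, 1.0]
>>> (w[0] * x0 + w[1] * x2) / sum(w)
2.3333333333333335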
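As noted in the parameter descriptions, com, span, and halflife are alternative parameterizations of alpha, so the conversions can be checked directly (values chosen arbitrarily so that all three yield alpha = 0.5):

>>> import math
>>> 1 / (1 + 1.0)                     # com=1.0
0.5
>>> 2 / (3.0 + 1)                     # span=3.0
0.5
>>> 1 - math.exp(-math.log(2) / 1.0)  # halflife=1.0
0.5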