pyspark.pandas.Series.autocorr
Series.autocorr(periods: int = 1) → float
Compute the lag-N autocorrelation.
This method computes the Pearson correlation between the Series and its shifted self.
Note
The current implementation uses Spark’s Window without a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid this method on very large datasets.
Parameters
periods : int, default 1
Number of lags to apply before performing autocorrelation.
Returns
float
The Pearson correlation between self and self.shift(periods).
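The return value described above can be illustrated without Spark. The following is a minimal pure-Python sketch, not the pyspark.pandas implementation: `lag_autocorr` is a hypothetical helper name, and it assumes that the shift behaves like a positional shift and that any pair containing NaN is dropped before the correlation is computed. Under those assumptions it agrees with the values listed in the Examples section to the printed precision.

```python
import math


def lag_autocorr(values, periods=1):
    """Pearson correlation between `values` and `values` shifted by `periods`.

    Hypothetical helper for illustration only; it runs on a plain list,
    not on a pyspark.pandas Series.
    """
    n = len(values)
    pairs = []
    for i in range(n):
        j = i - periods  # element that a shift by `periods` places at position i
        if 0 <= j < n:
            x, y = values[i], values[j]
            # Assumption: pairs with a missing value on either side are dropped.
            if not (math.isnan(x) or math.isnan(y)):
                pairs.append((x, y))

    # Fewer than two complete pairs: the correlation is not defined.
    if len(pairs) < 2:
        return math.nan

    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)

    # A constant side has zero variance, so the correlation is not defined.
    if var_x == 0 or var_y == 0:
        return math.nan

    return cov / math.sqrt(var_x * var_y)


s = [0.2, 0.0, 0.6, 0.2, math.nan, 0.5, 0.6]
print(lag_autocorr(s))      # lag 1
print(lag_autocorr(s, 2))   # lag 2
print(lag_autocorr(s, -3))  # negative lag shifts in the other direction
print(lag_autocorr(s, 6))   # only one complete pair remains
```

The explicit pairing loop makes it visible how many observations survive for each lag, which is why large lags on short series quickly become undefined.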
See also
Series.corr
Compute the correlation between two Series.
Series.shift
Shift index by desired number of periods.
DataFrame.corr
Compute pairwise correlation of columns.
Notes
If the Pearson correlation is not well defined, NaN is returned.
Examples
>>> import numpy as np
>>> import pyspark.pandas as ps
>>> s = ps.Series([.2, .0, .6, .2, np.nan, .5, .6])
>>> s.autocorr()
-0.141219...
>>> s.autocorr(0)
1.0...
>>> s.autocorr(2)
0.970725...
>>> s.autocorr(-3)
0.277350...
>>> s.autocorr(5)
-1.000000...
>>> s.autocorr(6)
nan
If the Pearson correlation is not well defined, then ‘NaN’ is returned.
>>> s = ps.Series([1, 0, 0, 0])
>>> s.autocorr()
nan
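To see why this case is undefined, it may help to line up the values that a lag of 1 compares. The sketch below uses plain Python lists rather than a Series; the names `current` and `shifted` are illustrative only, and the pairing assumes a positional shift.

```python
values = [1, 0, 0, 0]
periods = 1

# With a lag of 1, each value is paired with the one before it.
current = values[periods:]    # [0, 0, 0]
shifted = values[:-periods]   # [1, 0, 0]

mean_current = sum(current) / len(current)
var_current = sum((v - mean_current) ** 2 for v in current)

# The `current` side is constant, so its variance is zero. Pearson's formula
# divides by the square root of the product of both variances, which makes
# the result undefined here.
print(var_current)
```

Because one side of the comparison never varies, there is no meaningful correlation to report, and NaN is the conventional result.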