pyspark.pandas.Series.corr

Series.corr(other: pyspark.pandas.series.Series, method: str = 'pearson') → float

Compute correlation with other Series, excluding missing values.

Parameters

other : Series

method : {‘pearson’, ‘spearman’}

pearson : standard correlation coefficient
spearman : Spearman rank correlation

Returns

correlation : float

Notes

pandas-on-Spark behaves differently from pandas in the following ways:

- The method argument only accepts ‘pearson’ and ‘spearman’.
- The data should not contain NaNs; otherwise pandas-on-Spark will return an error.
- The min_periods argument is not supported.

Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'s1': [.2, .0, .6, .2],
...                    's2': [.3, .6, .0, .1]})
>>> s1 = df.s1
>>> s2 = df.s2
>>> s1.corr(s2, method='pearson')  
-0.851064...

>>> s1.corr(s2, method='spearman')  
-0.948683...
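
To make the two method options concrete, the following is a minimal pure-Python sketch of what each coefficient measures. It is not the pandas-on-Spark implementation; the helper names pearson, average_ranks, and spearman are hypothetical, and the sketch assumes that tied values receive the average of their ranks. On the sample data above it should agree with the results shown to the displayed precision.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions
    (an assumption of this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Same sample data as the Examples section.
s1 = [.2, .0, .6, .2]
s2 = [.3, .6, .0, .1]
print(pearson(s1, s2))
print(spearman(s1, s2))
```

Spearman is sensitive only to the ordering of the values, which is why it is computed here by ranking both inputs and reusing the Pearson formula.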
