pyspark.pandas.DataFrame.corrwith

DataFrame.corrwith(other: Union[DataFrame, Series], drop: bool = False, method: str = 'pearson') → Series

Compute pairwise correlation.

Pairwise correlation is computed between the columns of this DataFrame and the columns of another DataFrame, or a single Series. DataFrames are first aligned along both axes before the correlations are computed.

Parameters
other : DataFrame, Series

Object with which to compute correlations.

drop : bool, default False

Drop missing indices from the result, that is, labels that are not present in both objects and would otherwise appear as NaN.

method : str, default 'pearson'

Method of correlation, one of:

  • pearson : standard correlation coefficient

Returns
Series

Pairwise correlations.
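
The alignment step and the effect of `drop` can be illustrated without Spark. The following is a minimal plain-Python sketch, not the library's implementation: `pearson` and `corrwith_sketch` are hypothetical helpers, and each frame is modelled as a dict of `{column label: {row label: value}}`. The data are the `df1` and `df2` frames from the Examples section, assuming the default row labels 0 to 3.

```python
import math

def pearson(xs, ys):
    """Pearson's r for two equal-length sequences; NaN if undefined."""
    if len(xs) < 2:
        return math.nan
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))
    return cross / denom if denom else math.nan

def corrwith_sketch(left, right, drop=False):
    """Illustrative only: correlate columns that share a label."""
    result = {}
    for col in left.keys() & right.keys():                   # align columns
        rows = sorted(left[col].keys() & right[col].keys())  # align rows
        result[col] = pearson([left[col][r] for r in rows],
                              [right[col][r] for r in rows])
    if not drop:
        # Labels found in only one frame are kept as NaN unless dropped.
        for col in left.keys() ^ right.keys():
            result[col] = math.nan
    return result

# Same values as df1 and df2 in the Examples section.
df1 = {"A": dict(enumerate([1, 5, 7, 8])),
       "X": dict(enumerate([5, 8, 4, 3])),
       "C": dict(enumerate([10, 4, 9, 3]))}
df2 = {"A": dict(enumerate([5, 3, 6, 4])),
       "B": dict(enumerate([11, 2, 4, 3])),
       "C": dict(enumerate([4, 3, 8, 5]))}

full = corrwith_sketch(df1, df2)                 # A, C numeric; X, B NaN
shared = corrwith_sketch(df1, df2, drop=True)    # only A and C remain
print({k: round(v, 6) for k, v in shared.items()})
```

With `drop=False` the sketch yields four labels, two of them NaN, which matches the shape of the `df1.corrwith(df2)` output shown in the Examples; with `drop=True` only the shared labels `A` and `C` remain.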

See also

DataFrame.corr

Compute pairwise correlation of columns.

Examples

>>> import pyspark.pandas as ps
>>> df1 = ps.DataFrame({
...         "A":[1, 5, 7, 8],
...         "X":[5, 8, 4, 3],
...         "C":[10, 4, 9, 3]})
>>> df1.corrwith(df1[["X", "C"]])
X    1.0
C    1.0
A    NaN
dtype: float64
>>> df2 = ps.DataFrame({
...         "A":[5, 3, 6, 4],
...         "B":[11, 2, 4, 3],
...         "C":[4, 3, 8, 5]})
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df1.corrwith(df2)
A   -0.041703
C    0.395437
X         NaN
B         NaN
dtype: float64
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df2.corrwith(df1.X)
A   -0.597614
B   -0.151186
C   -0.642857
dtype: float64
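
As a sanity check on the last example, the value reported for column `A` can be reproduced by hand from the definition of the Pearson coefficient (sum of products of deviations divided by the square root of the product of the sums of squared deviations). This is a worked-arithmetic sketch in plain Python using the values of `df2.A` and `df1.X` above; the variable names are illustrative only.

```python
import math

a = [5, 3, 6, 4]   # values of df2.A
x = [5, 8, 4, 3]   # values of df1.X

mean_a = sum(a) / len(a)   # 4.5
mean_x = sum(x) / len(x)   # 5.0

# Numerator: sum of products of deviations from the means.
cross = sum((ai - mean_a) * (xi - mean_x) for ai, xi in zip(a, x))   # -5.0

# Denominator terms: sums of squared deviations.
ss_a = sum((ai - mean_a) ** 2 for ai in a)   # 5.0
ss_x = sum((xi - mean_x) ** 2 for xi in x)   # 14.0

r = cross / math.sqrt(ss_a * ss_x)   # -5 / sqrt(70)
print(round(r, 6))   # -0.597614
```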