pyspark.sql.functions.corr

pyspark.sql.functions.corr(col1: ColumnOrName, col2: ColumnOrName) → pyspark.sql.column.Column

Aggregate function: returns a new Column containing the Pearson correlation coefficient of col1 and col2.

Examples

>>> from pyspark.sql.functions import corr
>>> a = range(20)
>>> b = [2 * x for x in range(20)]
>>> df = spark.createDataFrame(zip(a, b), ["a", "b"])
>>> df.agg(corr("a", "b").alias('c')).collect()
[Row(c=1.0)]
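
For intuition about what the coefficient measures, the same quantity can be reproduced in plain Python without a Spark session. This is only a minimal sketch of the textbook Pearson formula, not Spark's implementation. The helper name `pearson` is hypothetical, and it assumes two equal-length numeric sequences with non-zero variance.

```python
import math


def pearson(xs, ys):
    """Illustrative helper (not part of PySpark): Pearson correlation of two sequences."""
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    # The normalization factors cancel, leaving cov / sqrt(var_x * var_y).
    return cov / math.sqrt(var_x * var_y)


# Same data as the Spark example: b is an exact linear function of a.
a = range(20)
b = [2 * x for x in range(20)]
result = pearson(a, b)
print(result)
```

Because b is a positive multiple of a, the printed value should equal 1.0 up to floating-point rounding, which matches the Spark result shown in the example. Reversing one of the sequences would give a value close to -1.0 instead.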