RankingMetrics

class pyspark.mllib.evaluation.RankingMetrics(predictionAndLabels: Union[pyspark.rdd.RDD[Tuple[List[T], List[T]]], pyspark.rdd.RDD[Tuple[List[T], List[T], List[float]]]])

Evaluator for ranking algorithms.

Parameters
predictionAndLabels : pyspark.RDD

An RDD of (predicted ranking, ground truth set) pairs, or of (predicted ranking, ground truth set, relevance values of the ground truth set) triples. Since 3.4.0, NDCG evaluation can use the supplied relevance values.

Examples

>>> from pyspark.mllib.evaluation import RankingMetrics
>>> predictionAndLabels = sc.parallelize([
...     ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
...     ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
...     ([1, 2, 3, 4, 5], [])])
>>> metrics = RankingMetrics(predictionAndLabels)
>>> metrics.precisionAt(1)
0.33...
>>> metrics.precisionAt(5)
0.26...
>>> metrics.precisionAt(15)
0.17...
>>> metrics.meanAveragePrecision
0.35...
>>> metrics.meanAveragePrecisionAt(1)
0.3333333333333333...
>>> metrics.meanAveragePrecisionAt(2)
0.25...
>>> metrics.ndcgAt(3)
0.33...
>>> metrics.ndcgAt(10)
0.48...
>>> metrics.recallAt(1)
0.06...
>>> metrics.recallAt(5)
0.35...
>>> metrics.recallAt(15)
0.66...

Methods

call(name, *a)

Call a method of java_model.

meanAveragePrecisionAt(k)

Returns the mean average precision (MAP) over all queries, computed on the first k ranked positions.

ndcgAt(k)

Compute the average NDCG value of all the queries, truncated at ranking position k.

precisionAt(k)

Compute the average precision of all the queries, truncated at ranking position k.

recallAt(k)

Compute the average recall of all the queries, truncated at ranking position k.

Attributes

meanAveragePrecision

Returns the mean average precision (MAP) of all the queries.

Methods Documentation

call(name: str, *a: Any) → Any

Call a method of java_model.

meanAveragePrecisionAt(k: int) → float

Returns the mean average precision (MAP) over all queries, computed on the first k ranked positions. If a query has an empty ground truth set, its average precision is zero and a log warning is generated.
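The following pure-Python sketch illustrates how average precision can be truncated at k. It is not Spark's implementation: the names `average_precision` and `mean_average_precision` are hypothetical, and the normaliser min(size of ground truth set, k) is an assumption chosen because it reproduces the values in the Examples section (0.33... at k=1, 0.25 at k=2, 0.35... untruncated).

```python
from typing import List, Optional, Sequence, Tuple

Query = Tuple[Sequence[int], Sequence[int]]  # (predicted ranking, ground truth set)


def average_precision(predicted: Sequence[int], truth: Sequence[int],
                      k: Optional[int] = None) -> float:
    """Average precision of one query, optionally truncated at position k."""
    truth_set = set(truth)
    if not truth_set:
        return 0.0  # empty ground truth: score is defined as zero
    cutoff = len(predicted) if k is None else min(k, len(predicted))
    # Assumed normaliser: at most min(|truth|, k) relevant items can be found.
    denom = len(truth_set) if k is None else min(len(truth_set), k)
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(predicted[:cutoff], start=1):
        if item in truth_set:
            hits += 1
            precision_sum += hits / rank  # precision at this hit's rank
    return precision_sum / denom


def mean_average_precision(queries: List[Query], k: Optional[int] = None) -> float:
    """Mean of the per-query average precision."""
    return sum(average_precision(p, t, k) for p, t in queries) / len(queries)


# Same data as the Examples section.
queries = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
print(mean_average_precision(queries, 2))
```

Passing `k=None` in this sketch corresponds to the untruncated `meanAveragePrecision` attribute.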

ndcgAt(k: int) → float

Compute the average NDCG value of all the queries, truncated at ranking position k. The discounted cumulative gain at position k is computed as:

$$\mathrm{DCG}@k = \sum_{i=1}^{k} \frac{2^{\mathrm{rel}_i} - 1}{\log(i + 1)}$$

where $\mathrm{rel}_i$ is the relevance of the i-th item. The NDCG is obtained by dividing this DCG by the DCG value computed on the ground truth set. The relevance value is binary unless relevance values are supplied with the input (supported since 3.4.0). If a query has an empty ground truth set, zero will be used as NDCG together with a log warning.
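As an illustration of the DCG formula with binary relevance, here is a minimal pure-Python sketch. It is not Spark's implementation; `ndcg_at` is a hypothetical name, and treating the ideal ranking as "the first min(size of ground truth set, k) positions are all relevant" is an assumption. With binary relevance each hit has gain 2^1 - 1 = 1, and the logarithm base cancels in the DCG ratio.

```python
import math
from typing import List, Sequence, Tuple

Query = Tuple[Sequence[int], Sequence[int]]  # (predicted ranking, ground truth set)


def ndcg_at(queries: List[Query], k: int) -> float:
    """Mean binary-relevance NDCG@k over all queries."""
    if k <= 0:
        raise ValueError("k must be positive")
    total = 0.0
    for predicted, truth in queries:
        truth_set = set(truth)
        if not truth_set:
            continue  # empty ground truth contributes an NDCG of zero
        # DCG: gain of 1 for each relevant item, discounted by log(i + 1).
        dcg = sum(
            1.0 / math.log(i + 1)
            for i, item in enumerate(predicted[:k], start=1)
            if item in truth_set
        )
        # Ideal DCG (assumed): every one of the top positions is relevant.
        ideal = sum(
            1.0 / math.log(i + 1)
            for i in range(1, min(len(truth_set), k) + 1)
        )
        total += dcg / ideal
    return total / len(queries)


# Same data as the Examples section.
queries = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
print(ndcg_at(queries, 3))
```

On this data the sketch agrees with the `ndcgAt(3)` and `ndcgAt(10)` values shown in the Examples section (0.33... and 0.48...).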

precisionAt(k: int) → float

Compute the average precision of all the queries, truncated at ranking position k.

If the ranking algorithm returns only n (n < k) results for a query, the precision value is still computed as #(relevant items retrieved) / k. The same formula applies when the ground truth set contains fewer than k items.

If a query has an empty ground truth set, zero will be used as precision together with a log warning.
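The rule above (always divide by k, even when fewer than k results are returned) can be sketched in plain Python. This is an illustrative sketch only, not Spark's implementation, and `precision_at` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Query = Tuple[Sequence[int], Sequence[int]]  # (predicted ranking, ground truth set)


def precision_at(queries: List[Query], k: int) -> float:
    """Mean precision@k; the denominator is always k."""
    if k <= 0:
        raise ValueError("k must be positive")
    total = 0.0
    for predicted, truth in queries:
        truth_set = set(truth)
        if not truth_set:
            continue  # empty ground truth contributes a precision of zero
        hits = sum(1 for item in predicted[:k] if item in truth_set)
        total += hits / k  # divide by k even if len(predicted) < k
    return total / len(queries)


# Same data as the Examples section.
queries = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
print(precision_at(queries, 15))
```

With k=15 every ranking is shorter than k, so the first two queries score 5/15 and 3/15, which gives the 0.17... shown in the Examples section.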

recallAt(k: int) → float

Compute the average recall of all the queries, truncated at ranking position k.

If the ranking algorithm returns n results for a query, the recall value is computed as #(relevant items retrieved) / #(ground truth set), counting only the first k results. The same formula applies when the ground truth set contains fewer than k items.

If a query has an empty ground truth set, zero will be used as recall together with a log warning.
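For comparison with precision, the recall rule can be sketched the same way: the numerator counts relevant items among the first k results, and the denominator is the size of the ground truth set. This is an illustrative sketch only, not Spark's implementation, and `recall_at` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Query = Tuple[Sequence[int], Sequence[int]]  # (predicted ranking, ground truth set)


def recall_at(queries: List[Query], k: int) -> float:
    """Mean recall@k; the denominator is the ground truth set size."""
    if k <= 0:
        raise ValueError("k must be positive")
    total = 0.0
    for predicted, truth in queries:
        truth_set = set(truth)
        if not truth_set:
            continue  # empty ground truth contributes a recall of zero
        hits = sum(1 for item in predicted[:k] if item in truth_set)
        total += hits / len(truth_set)
    return total / len(queries)


# Same data as the Examples section.
queries = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
print(recall_at(queries, 5))
```

At k=5 the first two queries retrieve 2 of 5 and 2 of 3 relevant items, matching the 0.35... shown in the Examples section.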

Attributes Documentation

meanAveragePrecision

Returns the mean average precision (MAP) of all the queries. If a query has an empty ground truth set, the average precision will be zero and a log warning is generated.