RankingMetrics

class pyspark.mllib.evaluation.RankingMetrics(predictionAndLabels: Union[pyspark.rdd.RDD[Tuple[List[T], List[T]]], pyspark.rdd.RDD[Tuple[List[T], List[T], List[float]]]])

Evaluator for ranking algorithms.

Parameters

predictionAndLabels (pyspark.RDD): an RDD of (predicted ranking, ground truth set) pairs, or of (predicted ranking, ground truth set, relevance values of the ground truth set) triples. Since 3.4.0, NDCG evaluation with relevance values is supported.

Examples

>>> from pyspark.mllib.evaluation import RankingMetrics
>>> predictionAndLabels = sc.parallelize([
...     ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
...     ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
...     ([1, 2, 3, 4, 5], [])])
>>> metrics = RankingMetrics(predictionAndLabels)
>>> metrics.precisionAt(1)
0.33...
>>> metrics.precisionAt(5)
0.26...
>>> metrics.precisionAt(15)
0.17...
>>> metrics.meanAveragePrecision
0.35...
>>> metrics.meanAveragePrecisionAt(1)
0.3333333333333333...
>>> metrics.meanAveragePrecisionAt(2)
0.25...
>>> metrics.ndcgAt(3)
0.33...
>>> metrics.ndcgAt(10)
0.48...
>>> metrics.recallAt(1)
0.06...
>>> metrics.recallAt(5)
0.35...
>>> metrics.recallAt(15)
0.66...

Methods

`call`(name, *a)	Call method of java_model
`meanAveragePrecisionAt`(k)	Returns the mean average precision (MAP) over the first k ranking positions of all the queries.
`ndcgAt`(k)	Compute the average NDCG value of all the queries, truncated at ranking position k.
`precisionAt`(k)	Compute the average precision of all the queries, truncated at ranking position k.
`recallAt`(k)	Compute the average recall of all the queries, truncated at ranking position k.

Attributes

meanAveragePrecision

Returns the mean average precision (MAP) of all the queries.

Methods Documentation

call(name: str, *a: Any) → Any: Call method of java_model

meanAveragePrecisionAt(k: int) → float: Returns the mean average precision (MAP) over the first k ranking positions of all the queries. If a query has an empty ground truth set, the average precision will be zero and a log warning is generated.
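The description above can be sketched in plain Python, without Spark. This is an illustration only: the helper names `average_precision_at` and `mean_average_precision_at` are hypothetical, and the choice of `min(len(ground truth), k)` as the denominator is an assumption that happens to reproduce the `meanAveragePrecisionAt(1)` and `meanAveragePrecisionAt(2)` values shown in the Examples section.

```python
from typing import Hashable, List, Sequence, Tuple

Query = Tuple[Sequence[Hashable], Sequence[Hashable]]


def average_precision_at(predicted: Sequence[Hashable],
                         truth: Sequence[Hashable],
                         k: int) -> float:
    """Average precision of one query over its first k ranked items."""
    relevant = set(truth)
    if not relevant:
        return 0.0  # empty ground truth: zero (Spark additionally logs a warning)
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit's rank
    # Assumed normalisation: at most min(|truth|, k) hits are possible in k slots.
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at(queries: List[Query], k: int) -> float:
    """Mean of the per-query average precision at k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(average_precision_at(p, t, k) for p, t in queries) / len(queries)


# Same data as the Examples section.
queries = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
print(mean_average_precision_at(queries, 2))
```

For k = 2 the three queries contribute 0.5, 0.25 and 0, so the mean is 0.25, in line with the documented output.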

ndcgAt(k: int) → float: Compute the average NDCG value of all the queries, truncated at ranking position k. The discounted cumulative gain at position k is computed as $\mathrm{DCG}@k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log(i + 1)}$, where $rel_i$ is the relevance of the i-th item, and the NDCG is obtained by dividing by the DCG value of the ground truth set. When no relevance values are supplied, the relevance value is binary. If a query has an empty ground truth set, zero will be used as NDCG together with a log warning.
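As a rough illustration of the formula above in the binary-relevance case, the following pure-Python sketch computes the truncated NDCG without Spark. With binary relevance the gain 2^rel - 1 is 1 for a relevant item and 0 otherwise. The function name `ndcg_at` is hypothetical, and the handling of rankings shorter than k (iterating up to `min(max(len(predicted), len(truth)), k)`) is an assumption; it reproduces the `ndcgAt(3)` and `ndcgAt(10)` values from the Examples section.

```python
import math
from typing import Hashable, List, Sequence, Tuple

Query = Tuple[Sequence[Hashable], Sequence[Hashable]]


def ndcg_at(queries: List[Query], k: int) -> float:
    """Mean binary-relevance NDCG over all queries, truncated at position k."""
    if k <= 0:
        raise ValueError("k must be positive")
    total = 0.0
    for predicted, truth in queries:
        relevant = set(truth)
        if not relevant:
            continue  # empty ground truth contributes 0 (Spark also logs a warning)
        n = min(max(len(predicted), len(relevant)), k)
        dcg = 0.0
        ideal_dcg = 0.0
        for i in range(n):
            # Position is i + 1, so the discount is log(position + 1) = log(i + 2).
            gain = 1.0 / math.log(i + 2)
            if i < len(predicted) and predicted[i] in relevant:
                dcg += gain
            if i < len(relevant):
                ideal_dcg += gain  # ideal ranking puts all relevant items first
        total += dcg / ideal_dcg
    return total / len(queries)


# Same data as the Examples section.
queries = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
print(ndcg_at(queries, 3))
print(ndcg_at(queries, 10))
```

The base of the logarithm cancels in the DCG / ideal DCG ratio, so the natural logarithm used here gives the same NDCG as any other base.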

precisionAt(k: int) → float

Compute the average precision of all the queries, truncated at ranking position k.

If, for a query, the ranking algorithm returns only n results with n < k, the precision value is still computed as #(relevant items retrieved) / k. This formula also applies when the size of the ground truth set is less than k.

If a query has an empty ground truth set, zero will be used as precision together with a log warning.
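The rules above (always divide by k, zero for an empty ground truth set) can be checked with a small Spark-free sketch. The function name `precision_at` is hypothetical and this is not the library's implementation, only a plain-Python reading of the description.

```python
from typing import Hashable, List, Sequence, Tuple

Query = Tuple[Sequence[Hashable], Sequence[Hashable]]


def precision_at(queries: List[Query], k: int) -> float:
    """Mean precision at k over (predicted ranking, ground truth set) pairs."""
    if k <= 0:
        raise ValueError("k must be positive")
    total = 0.0
    for predicted, truth in queries:
        relevant = set(truth)
        if not relevant:
            continue  # empty ground truth contributes 0 (Spark also logs a warning)
        hits = sum(1 for item in predicted[:k] if item in relevant)
        total += hits / k  # divide by k even when fewer than k results exist
    return total / len(queries)


# Same data as the Examples section.
queries = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
print(precision_at(queries, 5))
print(precision_at(queries, 15))
```

At k = 15 every ranking is shorter than k, so the per-query values are 5/15, 3/15 and 0, which averages to roughly 0.178 and matches the `0.17...` shown in the Examples section.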

recallAt(k: int) → float

Compute the average recall of all the queries, truncated at ranking position k.

If for a query, the ranking algorithm returns n results, the recall value will be computed as #(relevant items retrieved) / #(ground truth set). This formula also applies when the size of the ground truth set is less than k.

If a query has an empty ground truth set, zero will be used as recall together with a log warning.
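A matching sketch for recall, again in plain Python with a hypothetical function name (`recall_at`), shows that the only difference from precision is the denominator: the size of the ground truth set rather than k.

```python
from typing import Hashable, List, Sequence, Tuple

Query = Tuple[Sequence[Hashable], Sequence[Hashable]]


def recall_at(queries: List[Query], k: int) -> float:
    """Mean recall at k over (predicted ranking, ground truth set) pairs."""
    if k <= 0:
        raise ValueError("k must be positive")
    total = 0.0
    for predicted, truth in queries:
        relevant = set(truth)
        if not relevant:
            continue  # empty ground truth contributes 0 (Spark also logs a warning)
        hits = sum(1 for item in predicted[:k] if item in relevant)
        total += hits / len(relevant)  # divide by the ground truth size, not k
    return total / len(queries)


# Same data as the Examples section.
queries = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
print(recall_at(queries, 5))
print(recall_at(queries, 15))
```

Because the third query has an empty ground truth set and always contributes zero, the mean recall on this data cannot exceed 2/3, which is the value reached at k = 15.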

Attributes Documentation

meanAveragePrecision: Returns the mean average precision (MAP) of all the queries. If a query has an empty ground truth set, the average precision will be zero and a log warning is generated.
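For the untruncated attribute, a minimal sketch is the textbook average precision: sum the precision at each rank where a relevant item appears, then divide by the size of the ground truth set. The function name `mean_average_precision` is hypothetical, and the normalisation by the ground truth size is an assumption; it reproduces the `0.35...` value in the Examples section.

```python
from typing import Hashable, List, Sequence, Tuple

Query = Tuple[Sequence[Hashable], Sequence[Hashable]]


def mean_average_precision(queries: List[Query]) -> float:
    """Mean over queries of the average precision across the full ranking."""
    total = 0.0
    for predicted, truth in queries:
        relevant = set(truth)
        if not relevant:
            continue  # empty ground truth contributes 0 (Spark also logs a warning)
        hits = 0
        precision_sum = 0.0
        for rank, item in enumerate(predicted, start=1):
            if item in relevant:
                hits += 1
                precision_sum += hits / rank
        total += precision_sum / len(relevant)  # assumed normalisation
    return total / len(queries)


# Same data as the Examples section.
queries = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
print(mean_average_precision(queries))
```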
