RankingMetrics
class pyspark.mllib.evaluation.RankingMetrics(predictionAndLabels: Union[pyspark.rdd.RDD[Tuple[List[T], List[T]]], pyspark.rdd.RDD[Tuple[List[T], List[T], List[float]]]])
Evaluator for ranking algorithms.
Parameters
predictionAndLabels : pyspark.RDD
An RDD of (predicted ranking, ground truth set) pairs, or of (predicted ranking, ground truth set, relevance values of the ground truth set) triples. Since 3.4.0, the triple form supports NDCG evaluation with relevance values.
Examples
>>> from pyspark.mllib.evaluation import RankingMetrics
>>> predictionAndLabels = sc.parallelize([
...     ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
...     ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
...     ([1, 2, 3, 4, 5], [])])
>>> metrics = RankingMetrics(predictionAndLabels)
>>> metrics.precisionAt(1)
0.33...
>>> metrics.precisionAt(5)
0.26...
>>> metrics.precisionAt(15)
0.17...
>>> metrics.meanAveragePrecision
0.35...
>>> metrics.meanAveragePrecisionAt(1)
0.3333333333333333...
>>> metrics.meanAveragePrecisionAt(2)
0.25...
>>> metrics.ndcgAt(3)
0.33...
>>> metrics.ndcgAt(10)
0.48...
>>> metrics.recallAt(1)
0.06...
>>> metrics.recallAt(5)
0.35...
>>> metrics.recallAt(15)
0.66...
Methods
call(name, *a): Call method of java_model.
meanAveragePrecisionAt(k): Returns the mean average precision (MAP) at the first k ranking positions of all the queries.
ndcgAt(k): Compute the average NDCG value of all the queries, truncated at ranking position k.
precisionAt(k): Compute the average precision of all the queries, truncated at ranking position k.
recallAt(k): Compute the average recall of all the queries, truncated at ranking position k.
Attributes
meanAveragePrecision: Returns the mean average precision (MAP) of all the queries.
Methods Documentation
call(name: str, *a: Any) → Any
Call method of java_model.
meanAveragePrecisionAt(k: int) → float
Returns the mean average precision (MAP) at the first k ranking positions of all the queries. If a query has an empty ground truth set, its average precision is zero and a log warning is generated.
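The truncated MAP can be sketched in plain Python, without Spark. This is an illustrative re-implementation, not the library's code: `mean_average_precision_at` is a hypothetical helper, and dividing by min(size of ground truth set, k) is an assumption chosen because it reproduces the values shown in the Examples section (0.333... for k=1 and 0.25 for k=2).

```python
from typing import List, Sequence, Tuple

Query = Tuple[List[int], List[int]]  # (predicted ranking, ground truth set)


def mean_average_precision_at(pairs: Sequence[Query], k: int) -> float:
    """Mean over queries of the average precision within the first k positions."""
    if k <= 0:
        raise ValueError("k must be positive")
    total = 0.0
    for predicted, truth in pairs:
        truth_set = set(truth)
        if not truth_set:
            continue  # empty ground truth contributes 0.0
        hits = 0
        precision_sum = 0.0
        for rank, item in enumerate(predicted[:k], start=1):
            if item in truth_set:
                hits += 1
                precision_sum += hits / rank  # precision at this hit's rank
        # Assumed normaliser: at most k relevant items can appear in k positions.
        total += precision_sum / min(len(truth_set), k)
    return total / len(pairs)


# Same three queries as in the Examples section.
pairs = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
print(round(mean_average_precision_at(pairs, 2), 4))  # 0.25
```

The third query has no ground truth, so it adds nothing to the sum but still counts in the denominator of the mean.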
ndcgAt(k: int) → float
Compute the average NDCG value of all the queries, truncated at ranking position k. The discounted cumulative gain at position k is computed as:
$$\mathrm{DCG}@k = \sum_{i=1}^{k} \frac{2^{\mathrm{rel}_i} - 1}{\log(i + 1)}$$
where rel_i is the relevance of the i-th item. The NDCG is obtained by dividing this value by the DCG value on the ground truth set. When no relevance values are supplied, the relevance value is binary. If a query has an empty ground truth set, zero will be used as NDCG together with a log warning.
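For the binary-relevance case, the DCG formula reduces to adding 1 / log(i + 1) for every relevant item at position i. A minimal plain-Python sketch follows; `ndcg_at` is a hypothetical helper rather than the library's code, and the natural logarithm is an assumption (the base cancels when DCG is divided by the ideal DCG).

```python
import math
from typing import List, Sequence, Tuple

Query = Tuple[List[int], List[int]]  # (predicted ranking, ground truth set)


def ndcg_at(pairs: Sequence[Query], k: int) -> float:
    """Mean NDCG at k with binary relevance (gain 2**1 - 1 = 1 per relevant item)."""
    if k <= 0:
        raise ValueError("k must be positive")
    total = 0.0
    for predicted, truth in pairs:
        truth_set = set(truth)
        if not truth_set:
            continue  # empty ground truth contributes 0.0
        # enumerate is 0-based, so position i + 1 has discount log(i + 2).
        dcg = sum(
            1.0 / math.log(i + 2)
            for i, item in enumerate(predicted[:k])
            if item in truth_set
        )
        # Ideal ranking: all relevant items first, truncated at k.
        ideal = sum(1.0 / math.log(i + 2) for i in range(min(len(truth_set), k)))
        total += dcg / ideal
    return total / len(pairs)


# Same three queries as in the Examples section.
pairs = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
print(round(ndcg_at(pairs, 3), 4))  # 0.3333
```

This sketch does not cover the relevance-value form of the input; it only illustrates the binary case.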
precisionAt(k: int) → float
Compute the average precision of all the queries, truncated at ranking position k.
If the ranking algorithm returns only n (n < k) results for a query, the precision value is still computed as #(relevant items retrieved) / k. This formula also applies when the size of the ground truth set is less than k.
If a query has an empty ground truth set, zero will be used as precision together with a log warning.
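The rule above (always divide by k, even when fewer than k items are returned) can be sketched in plain Python. `precision_at` is a hypothetical helper written for illustration, not the library's implementation.

```python
from typing import List, Sequence, Tuple

Query = Tuple[List[int], List[int]]  # (predicted ranking, ground truth set)


def precision_at(pairs: Sequence[Query], k: int) -> float:
    """Mean precision at k over all queries."""
    if k <= 0:
        raise ValueError("k must be positive")
    total = 0.0
    for predicted, truth in pairs:
        truth_set = set(truth)
        if not truth_set:
            continue  # empty ground truth contributes 0.0
        hits = sum(1 for item in predicted[:k] if item in truth_set)
        total += hits / k  # denominator is k, not the number of returned items
    return total / len(pairs)


# Same three queries as in the Examples section.
pairs = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
print(round(precision_at(pairs, 5), 4))  # 0.2667
```

With k=15, each ranking holds only 10 items, yet the hits are still divided by 15, which is why the example value drops to 0.17... .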
recallAt(k: int) → float
Compute the average recall of all the queries, truncated at ranking position k.
If the ranking algorithm returns n results for a query, the recall value is computed as #(relevant items retrieved) / #(ground truth set). This formula also applies when the size of the ground truth set is less than k.
If a query has an empty ground truth set, zero will be used as recall together with a log warning.
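Recall differs from precision only in its denominator, which is the size of the ground truth set rather than k. A minimal plain-Python sketch, where `recall_at` is a hypothetical helper and not the library's code:

```python
from typing import List, Sequence, Tuple

Query = Tuple[List[int], List[int]]  # (predicted ranking, ground truth set)


def recall_at(pairs: Sequence[Query], k: int) -> float:
    """Mean recall at k over all queries."""
    if k <= 0:
        raise ValueError("k must be positive")
    total = 0.0
    for predicted, truth in pairs:
        truth_set = set(truth)
        if not truth_set:
            continue  # empty ground truth contributes 0.0
        hits = sum(1 for item in predicted[:k] if item in truth_set)
        total += hits / len(truth_set)  # denominator is the ground truth size
    return total / len(pairs)


# Same three queries as in the Examples section.
pairs = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
print(round(recall_at(pairs, 15), 4))  # 0.6667
```

At k=15 the first two queries retrieve every relevant item, so the mean is capped at 2/3 by the query with the empty ground truth set.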
Attributes Documentation
meanAveragePrecision
Returns the mean average precision (MAP) of all the queries. If a query has an empty ground truth set, the average precision will be zero and a log warning is generated.
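To see where the MAP value in the Examples section comes from, it helps to look at the per-query average precision. The sketch below is plain Python with hypothetical helper names; normalising by the full size of the ground truth set is an assumption that is consistent with the 0.35... shown in the example.

```python
from typing import List, Sequence, Tuple

Query = Tuple[List[int], List[int]]  # (predicted ranking, ground truth set)


def average_precision(predicted: List[int], truth: List[int]) -> float:
    """Average precision of one query over its whole predicted ranking."""
    truth_set = set(truth)
    if not truth_set:
        return 0.0  # empty ground truth is scored as zero
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in truth_set:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(truth_set)  # assumed normaliser


def mean_average_precision(pairs: Sequence[Query]) -> float:
    """Mean of the per-query average precision values."""
    return sum(average_precision(p, t) for p, t in pairs) / len(pairs)


# Same three queries as in the Examples section.
pairs = [
    ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
    ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
    ([1, 2, 3, 4, 5], []),
]
for predicted, truth in pairs:
    print(round(average_precision(predicted, truth), 4))
print(round(mean_average_precision(pairs), 4))  # 0.355
```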