pyspark.RDD.meanApprox
RDD.meanApprox(timeout: int, confidence: float = 0.95) → pyspark.rdd.BoundedFloat
Approximate operation that returns the mean of the RDD's elements, stopping when the timeout (in milliseconds) expires or the requested confidence is met.
Examples
>>> rdd = sc.parallelize(range(1000), 10)
>>> r = sum(range(1000)) / 1000.0
>>> abs(rdd.meanApprox(1000) - r) / r < 0.05
True
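The comparison in the example works because the result can be used directly in arithmetic. The sketch below illustrates why, assuming the returned BoundedFloat behaves like a float subclass that also carries `confidence`, `low`, and `high` attributes. `BoundedFloatSketch` is a hypothetical stand-in written for this illustration, not the PySpark class, and the numbers passed to it are placeholder values rather than output from a real Spark job.

```python
class BoundedFloatSketch(float):
    """Hypothetical stand-in: a float that also carries interval metadata."""

    def __new__(cls, mean, confidence, low, high):
        # The float value itself is the estimated mean.
        obj = super().__new__(cls, mean)
        obj.confidence = confidence  # requested confidence level
        obj.low = low                # assumed lower bound of the estimate
        obj.high = high              # assumed upper bound of the estimate
        return obj


exact = sum(range(1000)) / 1000.0  # exact mean of 0..999

# Illustrative values only, not produced by Spark.
approx = BoundedFloatSketch(498.7, 0.95, 480.0, 517.4)

# Because the object is a float, it can be subtracted and compared directly.
relative_error = abs(approx - exact) / exact
within_tolerance = relative_error < 0.05

# The extra attributes describe the interval around the estimate.
inside_interval = approx.low <= exact <= approx.high
```

With a real SparkContext, the same attribute access would apply to the value returned by `rdd.meanApprox(1000)`, assuming the return type matches the signature above.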