pyspark.AccumulatorParam¶
-
class
pyspark.
AccumulatorParam
¶ Helper object that defines how to accumulate values of a given type.
Examples
>>> from pyspark.accumulators import AccumulatorParam >>> class VectorAccumulatorParam(AccumulatorParam): ... def zero(self, value): ... return [0.0] * len(value) ... def addInPlace(self, val1, val2): ... for i in range(len(val1)): ... val1[i] += val2[i] ... return val1 >>> va = sc.accumulator([1.0, 2.0, 3.0], VectorAccumulatorParam()) >>> va.value [1.0, 2.0, 3.0] >>> def g(x): ... global va ... va += [x] * 3 >>> rdd = sc.parallelize([1,2,3]) >>> rdd.foreach(g) >>> va.value [7.0, 8.0, 9.0]
Methods
addInPlace
(value1, value2)Add two values of the accumulator’s data type, returning a new value; for efficiency, can also update value1 in place and return it.
zero
(value)Provide a “zero value” for the type, compatible in dimensions with the provided value (e.g., a zero vector)