pyspark.SparkContext.runJob

SparkContext.runJob(rdd: pyspark.rdd.RDD[T], partitionFunc: Callable[[Iterable[T]], Iterable[U]], partitions: Optional[Sequence[int]] = None, allowLocal: bool = False) → List[U]

Executes the given partitionFunc on the specified set of partitions, returning the result as an array of elements.

If 'partitions' is not specified, this will run over all partitions.
Examples
>>> myRDD = sc.parallelize(range(6), 3)
>>> sc.runJob(myRDD, lambda part: [x * x for x in part])
[0, 1, 4, 9, 16, 25]

>>> myRDD = sc.parallelize(range(6), 3)
>>> sc.runJob(myRDD, lambda part: [x * x for x in part], [0, 2], True)
[0, 1, 16, 25]
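The doctests above assume an already-running SparkContext bound to the name sc (as in the pyspark shell). The following is a minimal, self-contained sketch that creates its own context; the master URL "local[2]" and the application name "runJob-example" are illustrative assumptions, not part of this API.

from pyspark import SparkContext

# Assumption: a local Spark installation; master URL and app name are illustrative.
sc = SparkContext("local[2]", "runJob-example")

rdd = sc.parallelize(range(6), 3)   # partition contents: [0, 1], [2, 3], [4, 5]

# Run the partition function over every partition.
print(sc.runJob(rdd, lambda part: [x * x for x in part]))
# [0, 1, 4, 9, 16, 25]

# Run only over partitions 0 and 2.
print(sc.runJob(rdd, lambda part: [x * x for x in part], partitions=[0, 2]))
# [0, 1, 16, 25]

sc.stop()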