pyspark.RDD.coalesce

RDD.coalesce(numPartitions: int, shuffle: bool = False) → pyspark.rdd.RDD[T]

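Return a new RDD that is reduced into numPartitions partitions. With the default shuffle=False no shuffle is performed, so the partition count can only decrease; pass shuffle=True to redistribute the data through a shuffle, which also allows increasing the partition count.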

Examples

>>> sc.parallelize([1, 2, 3, 4, 5], 3).glom().collect()
[[1], [2, 3], [4, 5]]
>>> sc.parallelize([1, 2, 3, 4, 5], 3).coalesce(1).glom().collect()
[[1, 2, 3, 4, 5]]
