pyspark.RDD.glom

RDD.glom() → pyspark.rdd.RDD[List[T]]

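Return an RDD in which the elements of each partition are gathered into a single list. The result has the same number of partitions as the source RDD, and each partition holds exactly one list; an empty partition yields an empty list.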

Examples

>>> rdd = sc.parallelize([1, 2, 3, 4], 2)
>>> sorted(rdd.glom().collect())
[[1, 2], [3, 4]]