pyspark.SparkContext.union
SparkContext.union(rdds: List[pyspark.rdd.RDD[T]]) → pyspark.rdd.RDD[T]
Build the union of a list of RDDs.
This supports unions of RDDs with different serialized formats, although doing so forces them to be reserialized using the default serializer.
Examples
>>> import os
>>> import tempfile
>>> tempdir = tempfile.mkdtemp()  # scratch directory; `sc` is assumed to be an active SparkContext
>>> path = os.path.join(tempdir, "union-text.txt")
>>> with open(path, "w") as testFile:
...     _ = testFile.write("Hello")
>>> textFile = sc.textFile(path)
>>> textFile.collect()
['Hello']
>>> parallelized = sc.parallelize(["World!"])
>>> sorted(sc.union([textFile, parallelized]).collect())
['Hello', 'World!']
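As a rough mental model only (this is plain Python, not Spark code), the result of a union behaves like a concatenation of the inputs: an element that appears in more than one input is kept each time, so deduplication is a separate step (in PySpark, `RDD.distinct()`). The helper `union_model` below is hypothetical and uses ordinary lists as stand-ins for RDDs. The ordering it produces should not be read as a guarantee about Spark.

```python
from itertools import chain
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def union_model(datasets: Iterable[List[T]]) -> List[T]:
    """Hypothetical stand-in for a union: concatenate inputs, keeping duplicates."""
    return list(chain.from_iterable(datasets))


# Two small "datasets" that share one element.
first = ["Hello", "World!"]
second = ["World!"]

# The shared element appears twice in the combined result.
combined = union_model([first, second])

# Removing duplicates is an explicit extra step.
deduplicated = sorted(set(combined))
```

Under this model, calling `distinct()` on the unioned RDD would be the Spark-side counterpart of the `set(...)` step above.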