pyspark.RDD.fullOuterJoin¶

RDD.fullOuterJoin(other: pyspark.rdd.RDD[Tuple[K, U]], numPartitions: Optional[int] = None) → pyspark.rdd.RDD[Tuple[K, Tuple[Optional[V], Optional[U]]]]¶

Perform a right outer join of self and other.

For each element (k, v) in self, the resulting RDD will either contain all pairs (k, (v, w)) for w in other, or the pair (k, (v, None)) if no elements in other have key k.

Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (v, w)) for v in self, or the pair (k, (None, w)) if no elements in self have key k.

Hash-partitions the resulting RDD into the given number of partitions.

Examples

>>> x = sc.parallelize([("a", 1), ("b", 4)])
>>> y = sc.parallelize([("a", 2), ("c", 8)])
>>> sorted(x.fullOuterJoin(y).collect())
[('a', (1, 2)), ('b', (4, None)), ('c', (None, 8))]

pyspark.RDD.foreachPartition

pyspark.RDD.getCheckpointFile