pyspark.RDD.mapValues
RDD.mapValues(f: Callable[[V], U]) → pyspark.rdd.RDD[Tuple[K, U]]
Pass each value in the key-value pair RDD through a map function without changing the keys; this also retains the original RDD’s partitioning.
Examples
>>> x = sc.parallelize([("a", ["apple", "banana", "lemon"]), ("b", ["grapes"])])
>>> def f(x): return len(x)
>>> x.mapValues(f).collect()
[('a', 3), ('b', 1)]
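
The per-record behaviour described above can be sketched without a Spark cluster. The following is only a local illustration of the semantics, not Spark's implementation: `map_values_local` is a hypothetical helper operating on a plain Python list of pairs, so it says nothing about partitioning or distributed execution.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
U = TypeVar("U")


def map_values_local(
    pairs: Iterable[Tuple[K, V]], f: Callable[[V], U]
) -> List[Tuple[K, U]]:
    """Hypothetical local stand-in for RDD.mapValues.

    Applies f to the value of each (key, value) pair and leaves the key as-is.
    """
    return [(key, f(value)) for key, value in pairs]


# Same data as the doctest above, held in an ordinary list instead of an RDD.
pairs = [("a", ["apple", "banana", "lemon"]), ("b", ["grapes"])]
result = map_values_local(pairs, len)
print(result)  # [('a', 3), ('b', 1)]
```

On a real pair RDD, `rdd.map(lambda kv: (kv[0], f(kv[1])))` would produce the same records, but because `map` is free to change keys it does not preserve the partitioner by default; `mapValues` is the form to prefer when only the values change.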