pyspark.RDD.sortBy
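RDD.sortBy(keyfunc: Callable[[T], S], ascending: bool = True, numPartitions: Optional[int] = None) → RDD[T]

Sorts this RDD by the key that keyfunc computes for each element. Set ascending=False to sort in descending order; numPartitions sets the number of partitions of the returned RDD.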
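In the PySpark source, sortBy delegates to sortByKey, roughly as self.keyBy(keyfunc).sortByKey(ascending, numPartitions).values(). sortByKey picks range boundaries from a sample of the keys, sends each element to a partition by binary search over those boundaries, and then sorts each partition locally. Because every key in one partition is no greater than every key in the next, concatenating the partitions gives the global order that collect() returns. The following is a minimal single-process sketch of that idea in plain Python, not PySpark's code. `local_sort_by` is a hypothetical helper, its default of 2 partitions is an assumption, and it uses every key instead of a random sample so that its output is deterministic.

```python
import bisect
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")


def local_sort_by(
    data: List[T],
    keyfunc: Callable[[T], Any],
    ascending: bool = True,
    num_partitions: int = 2,  # hypothetical default for this sketch only
) -> List[List[T]]:
    """Hypothetical single-process model of RDD.sortBy.

    Returns the output partitions; concatenating them gives what collect() shows.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")

    # keyBy(keyfunc): pair every element with its sort key.
    pairs = [(keyfunc(x), x) for x in data]
    if not pairs:
        return [[] for _ in range(num_partitions)]

    # Pick num_partitions - 1 range boundaries from the sorted keys.
    # Spark derives these from a random sample; this sketch uses every key.
    keys = sorted(k for k, _ in pairs)
    bounds = [keys[len(keys) * (i + 1) // num_partitions] for i in range(num_partitions - 1)]

    # Range-partition: binary-search each key against the boundaries.
    partitions: List[List[Any]] = [[] for _ in range(num_partitions)]
    for k, x in pairs:
        p = bisect.bisect_left(bounds, k)
        if not ascending:
            p = num_partitions - 1 - p  # descending: highest keys go to partition 0
        partitions[p].append((k, x))

    # Sort each partition by key, then keep only the elements (values()).
    return [
        [x for _, x in sorted(part, key=lambda kv: kv[0], reverse=not ascending)]
        for part in partitions
    ]


tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
parts = local_sort_by(tmp, lambda x: x[0])
print(parts)  # [[('1', 3), ('2', 5), ('a', 1)], [('b', 2), ('d', 4)]]
print([x for p in parts for x in p])  # [('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
```

The flattened result matches the first doctest in the examples. Range partitioning is the design choice that keeps the final step cheap: after the shuffle, no merge across partitions is needed, only a local sort inside each one.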
Examples
>>> tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[0]).collect()
[('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[1]).collect()
[('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
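In the examples above, no two elements share a key. When keys do tie, the PySpark documentation does not promise any particular relative order for the tied elements, so one way to make the output fully determined is a keyfunc that returns a tuple: Python compares tuples element by element, so later fields break ties in earlier ones. The sketch below uses built-in sorted() as a stand-in for sortBy(...).collect() so that it runs without Spark. The two extra rows and both keyfunc names are hypothetical.

```python
from typing import List, Tuple

# The doctest rows plus two hypothetical rows, ('c', 2) and ('e', 4), that tie on x[1].
rows: List[Tuple[str, int]] = [
    ('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5), ('c', 2), ('e', 4),
]


def by_count_then_name(x: Tuple[str, int]) -> Tuple[int, str]:
    """Hypothetical keyfunc: order by x[1], break ties by x[0]."""
    return (x[1], x[0])


def by_count_desc_then_name(x: Tuple[str, int]) -> Tuple[int, str]:
    """Hypothetical keyfunc: x[1] descending (by negation), then x[0] ascending."""
    return (-x[1], x[0])


# sorted() stands in for sortBy(...).collect(): with keys that never tie,
# the sorted order is fully determined by the keyfunc.
print(sorted(rows, key=by_count_then_name))
# [('a', 1), ('b', 2), ('c', 2), ('1', 3), ('d', 4), ('e', 4), ('2', 5)]
print(sorted(rows, key=by_count_desc_then_name))
# [('2', 5), ('d', 4), ('e', 4), ('1', 3), ('b', 2), ('c', 2), ('a', 1)]
```

Given a SparkContext sc, sc.parallelize(rows).sortBy(by_count_then_name).collect() would be expected to return the first list. Because ascending flips the order of the whole key, negating a numeric field is a simple way to mix directions; it does not work for string fields.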