pyspark.RDD.saveAsPickleFile
RDD.saveAsPickleFile(path: str, batchSize: int = 10) → None
Save this RDD as a SequenceFile of serialized objects. The serializer used is pyspark.serializers.CPickleSerializer; the default batch size is 10.

Examples
>>> from tempfile import NamedTemporaryFile
>>> tmpFile = NamedTemporaryFile(delete=True)
>>> tmpFile.close()
>>> sc.parallelize([1, 2, 'spark', 'rdd']).saveAsPickleFile(tmpFile.name, 3)
>>> sorted(sc.pickleFile(tmpFile.name, 5).map(str).collect())
['1', '2', 'rdd', 'spark']
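
To illustrate what the batch size controls, here is a minimal standard-library sketch of batched pickling. It is not PySpark's implementation: the helper names pickle_in_batches and unpickle_batches are hypothetical, and the sketch only assumes that elements are grouped into lists of at most batchSize items, with each list pickled as a single record.

```python
import pickle
from itertools import islice
from typing import Any, Iterable, Iterator


def pickle_in_batches(items: Iterable[Any], batch_size: int) -> Iterator[bytes]:
    """Hypothetical helper: yield one pickled list per group of up to batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield pickle.dumps(batch)


def unpickle_batches(records: Iterable[bytes]) -> Iterator[Any]:
    """Hypothetical helper: unpickle each record and flatten the batches back into items."""
    for record in records:
        yield from pickle.loads(record)


data = [1, 2, "spark", "rdd"]
records = list(pickle_in_batches(data, 3))   # batches: [1, 2, 'spark'] and ['rdd']
restored = list(unpickle_batches(records))

print(len(records))
print(restored)
```

Under this assumption, a larger batch size means fewer, larger serialized records, while a smaller one means more records with less data held in each.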