pyspark.RDD.saveAsTextFile
RDD.saveAsTextFile(path: str, compressionCodecClass: Optional[str] = None) → None

Save this RDD as a text file, using string representations of elements.
Parameters
path : str
    path to the text file
compressionCodecClass : str, optional
    fully qualified classname of the compression codec class,
    e.g. "org.apache.hadoop.io.compress.GzipCodec" (None by default)
Examples
>>> from tempfile import NamedTemporaryFile
>>> tempFile = NamedTemporaryFile(delete=True)
>>> tempFile.close()
>>> sc.parallelize(range(10)).saveAsTextFile(tempFile.name)
>>> from fileinput import input
>>> from glob import glob
>>> ''.join(sorted(input(glob(tempFile.name + "/part-0000*"))))
'0\n1\n2\n3\n4\n5\n6\n7\n8\n9\n'
Empty lines are tolerated when saving to text files.
>>> from tempfile import NamedTemporaryFile
>>> tempFile2 = NamedTemporaryFile(delete=True)
>>> tempFile2.close()
>>> sc.parallelize(['', 'foo', '', 'bar', '']).saveAsTextFile(tempFile2.name)
>>> ''.join(sorted(input(glob(tempFile2.name + "/part-0000*"))))
'\n\n\nbar\nfoo\n'
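The reason empty lines survive is that saveAsTextFile writes the string representation of each element followed by a newline, so an empty string becomes a blank line. A minimal standard-library sketch of that element-to-line mapping (no Spark required; the sort mirrors the doctest above, which sorts because part files may arrive in any order):

```python
# Sketch of how saveAsTextFile maps elements to output lines:
# str(element) + "\n" per element, so '' yields a bare newline.
elements = ['', 'foo', '', 'bar', '']
text = ''.join(str(e) + '\n' for e in elements)

# Sorting the lines reproduces the doctest's normalized output.
normalized = ''.join(sorted(text.splitlines(keepends=True)))
print(repr(normalized))  # '\n\n\nbar\nfoo\n'
```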
Using compressionCodecClass
>>> from tempfile import NamedTemporaryFile
>>> tempFile3 = NamedTemporaryFile(delete=True)
>>> tempFile3.close()
>>> codec = "org.apache.hadoop.io.compress.GzipCodec"
>>> sc.parallelize(['foo', 'bar']).saveAsTextFile(tempFile3.name, codec)
>>> from fileinput import input, hook_compressed
>>> result = sorted(input(glob(tempFile3.name + "/part*.gz"), openhook=hook_compressed))
>>> ''.join([r.decode('utf-8') if isinstance(r, bytes) else r for r in result])
'bar\nfoo\n'
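With GzipCodec, the output directory contains one gzip-compressed part file per partition. The read-back step in the doctest can be exercised without Spark: the sketch below fabricates two such part files with the stdlib gzip module (an assumption about the layout, one element per line per partition) and decodes them with fileinput.hook_compressed exactly as above. The defensive bytes check is needed because hook_compressed yielded bytes before Python 3.10:

```python
import gzip
import os
import tempfile
from fileinput import input as finput, hook_compressed
from glob import glob

# Fabricate the assumed output layout of saveAsTextFile with GzipCodec:
# one .gz part file per partition, one element per line.
outdir = tempfile.mkdtemp()
with gzip.open(os.path.join(outdir, "part-00000.gz"), "wt") as f:
    f.write("foo\n")
with gzip.open(os.path.join(outdir, "part-00001.gz"), "wt") as f:
    f.write("bar\n")

# hook_compressed transparently decompresses .gz files; older Pythons
# yield bytes, so decode defensively as in the doctest above.
lines = sorted(finput(glob(outdir + "/part*.gz"), openhook=hook_compressed))
result = ''.join(l.decode('utf-8') if isinstance(l, bytes) else l for l in lines)
print(repr(result))  # 'bar\nfoo\n'
```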