pyspark.SparkContext.textFile¶
-
SparkContext.
textFile
(name: str, minPartitions: Optional[int] = None, use_unicode: bool = True) → pyspark.rdd.RDD[str]¶ Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and return it as an RDD of Strings. The text files must be encoded as UTF-8.
If use_unicode is False, the strings will be kept as str (encoding as utf-8), which is faster and smaller than unicode. (Added in Spark 1.2)
Examples
>>> path = os.path.join(tempdir, "sample-text.txt") >>> with open(path, "w") as testFile: ... _ = testFile.write("Hello world!") >>> textFile = sc.textFile(path) >>> textFile.collect() ['Hello world!']