pyspark.SparkContext.addFile
SparkContext.addFile(path: str, recursive: bool = False) → None

Add a file to be downloaded with this Spark job on every node. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI.

To access the file in Spark jobs, use SparkFiles.get() with the filename to find its download location.

A directory can be given if the recursive option is set to True. Currently directories are only supported for Hadoop-supported filesystems.
Notes
A path can be added only once. Subsequent additions of the same path are ignored.
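For instance, updating a file and then adding the same path again does not change what executors see; the second call is a no-op. A minimal sketch, assuming a running SparkContext named sc:

>>> import os, tempfile
>>> from pyspark import SparkFiles
>>> once = os.path.join(tempfile.mkdtemp(), "once.txt")
>>> with open(once, "w") as f:
...     _ = f.write("first")
>>> sc.addFile(once)
>>> with open(once, "w") as f:
...     _ = f.write("second")
>>> sc.addFile(once)  # same path as before: silently ignored
>>> sc.parallelize([0]).map(lambda _: open(SparkFiles.get("once.txt")).read()).collect()
['first']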
Examples
>>> import os
>>> from pyspark import SparkFiles
>>> path = os.path.join(tempdir, "test.txt")
>>> with open(path, "w") as testFile:
...     _ = testFile.write("100")
>>> sc.addFile(path)
>>> def func(iterator):
...     with open(SparkFiles.get("test.txt")) as testFile:
...         fileVal = int(testFile.readline())
...         return [x * fileVal for x in iterator]
>>> sc.parallelize([1, 2, 3, 4]).mapPartitions(func).collect()
[100, 200, 300, 400]
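Directories added with recursive=True are re-created under the job's file root, so files inside them are reached by joining paths onto SparkFiles.get(). A minimal sketch, assuming the same sc and a local directory (the local filesystem counts as a Hadoop-supported filesystem here):

>>> import os, tempfile
>>> from pyspark import SparkFiles
>>> subdir = os.path.join(tempfile.mkdtemp(), "data")
>>> os.mkdir(subdir)
>>> with open(os.path.join(subdir, "nums.txt"), "w") as f:
...     _ = f.write("7")
>>> sc.addFile(subdir, recursive=True)
>>> sc.parallelize([1, 2]).map(
...     lambda x: x * int(open(os.path.join(SparkFiles.get("data"), "nums.txt")).read())
... ).collect()
[7, 14]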