pyspark.sql.DataFrame.inputFiles

DataFrame.inputFiles() → List[str]

Returns a best-effort snapshot of the files that compose this DataFrame. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.
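The union-and-deduplicate behaviour described above can be illustrated without a Spark session. The following is a minimal sketch in plain Python, not Spark's actual implementation: the helper `union_input_files`, the two relation lists, and the file paths are all hypothetical names made up for this illustration. Preserving first-seen order is a choice of this sketch, not a guarantee documented for `inputFiles`.

```python
from typing import Iterable, List


def union_input_files(per_relation_files: Iterable[Iterable[str]]) -> List[str]:
    """Union the file lists of several relations, dropping duplicates.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    seen = set()
    result = []
    for files in per_relation_files:
        for path in files:
            # Keep only the first occurrence of each path.
            if path not in seen:
                seen.add(path)
                result.append(path)
    return result


# Two hypothetical relations that happen to share one file.
relation_a = ["/data/people/part-0.json", "/data/people/part-1.json"]
relation_b = ["/data/people/part-1.json", "/data/other/part-0.json"]

files = union_input_files([relation_a, relation_b])
print(files)
```

The shared path appears once in the result, which mirrors the "duplicates are removed" behaviour. A relation that cannot report its files would simply contribute nothing here, which is why the real method is described as best-effort.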

Examples

>>> import os
>>> df = spark.read.load("examples/src/main/resources/people.json", format="json")
>>> if os.environ.get('PYTEST_DBCONNECT_MODE') is None:
...     len(df.inputFiles())
... else:
...     1  # dbconnect doesn't support inputFiles, so skip the check in that mode
1
