pyspark.sql.DataFrameWriter.partitionBy
DataFrameWriter.partitionBy(*cols: Union[str, List[str]]) → pyspark.sql.readwriter.DataFrameWriter

Partitions the output by the given columns on the file system.
If specified, the output is laid out on the file system similar to Hive’s partitioning scheme.
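For illustration, a hypothetical layout (assuming the partition columns 'year' and 'month' used in the example below, and made-up part-file names): the output directory is split into one subdirectory per distinct value of each partition column.

    data/
        year=2015/
            month=7/
                part-00000-<uuid>.snappy.parquet
        year=2016/
            month=1/
                part-00000-<uuid>.snappy.parquet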
Parameters
    cols : str or list
        names of the columns to partition by
Examples
>>> df.write.partitionBy('year', 'month').parquet(os.path.join(tempfile.mkdtemp(), 'data'))
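A fuller, self-contained sketch is shown below. The SparkSession setup, the example rows, and the column names other than 'year' and 'month' are hypothetical and only for illustration; they are not part of this API reference.

    import os
    import tempfile

    from pyspark.sql import SparkSession

    # Hypothetical session and data, for illustration only.
    spark = SparkSession.builder.appName("partitionBy-example").getOrCreate()
    df = spark.createDataFrame(
        [(2015, 7, "a"), (2015, 8, "b"), (2016, 1, "c")],
        ["year", "month", "value"],
    )

    out_dir = os.path.join(tempfile.mkdtemp(), "data")

    # Partition the output by 'year' and 'month'; Spark writes a Hive-style
    # directory tree such as .../data/year=2015/month=7/part-*.parquet.
    df.write.partitionBy("year", "month").parquet(out_dir)

    # Reading the directory back recovers the partition columns from the paths.
    spark.read.parquet(out_dir).show()

Note that the partition columns are encoded in the directory names rather than stored in the data files, so readers that understand the layout (including Spark itself) reconstruct them when the dataset is loaded.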