pyspark.sql.DataFrameWriter.partitionBy

DataFrameWriter.partitionBy(*cols: Union[str, List[str]]) → pyspark.sql.readwriter.DataFrameWriter

Partitions the output by the given columns on the file system.

If specified, the output is laid out on the file system similar to Hive’s partitioning scheme.

Parameters
colsstr or list

name of columns

Examples

>>> df.write.partitionBy('year', 'month').parquet(os.path.join(tempfile.mkdtemp(), 'data'))