pyspark.sql.DataFrameWriter.partitionBy
DataFrameWriter.partitionBy(*cols: Union[str, List[str]]) → pyspark.sql.readwriter.DataFrameWriter

Partitions the output by the given columns on the file system.
If specified, the output is laid out on the file system similar to Hive’s partitioning scheme.
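For illustration, a hypothetical layout (assuming the partition columns 'year' and 'month' used in the example below, and made-up part-file names): the output directory is split into one subdirectory per distinct value of each partition column.

    data/
        year=2015/
            month=7/
                part-00000-<uuid>.snappy.parquet
        year=2016/
            month=1/
                part-00000-<uuid>.snappy.parquet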
Parameters
    cols : str or list
        names of the columns to partition by
Examples
>>> df.write.partitionBy('year', 'month').parquet(os.path.join(tempfile.mkdtemp(), 'data'))
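A fuller, self-contained sketch is shown below. The SparkSession setup, the example rows, and the column names other than 'year' and 'month' are hypothetical and only for illustration; they are not part of this API reference.

    import os
    import tempfile

    from pyspark.sql import SparkSession

    # Hypothetical session and data, for illustration only.
    spark = SparkSession.builder.appName("partitionBy-example").getOrCreate()
    df = spark.createDataFrame(
        [(2015, 7, "a"), (2015, 8, "b"), (2016, 1, "c")],
        ["year", "month", "value"],
    )

    out_dir = os.path.join(tempfile.mkdtemp(), "data")

    # Partition the output by 'year' and 'month'; Spark writes a Hive-style
    # directory tree such as .../data/year=2015/month=7/part-*.parquet.
    df.write.partitionBy("year", "month").parquet(out_dir)

    # Reading the directory back recovers the partition columns from the paths.
    spark.read.parquet(out_dir).show()

Note that the partition columns are encoded in the directory names rather than stored in the data files, so readers that understand the layout (including Spark itself) reconstruct them when the dataset is loaded.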