pyspark.sql.DataFrameWriter.sortBy
DataFrameWriter.sortBy(col: Union[str, List[str], Tuple[str, ...]], *cols: Optional[str]) → pyspark.sql.readwriter.DataFrameWriter
Sorts the output in each bucket by the given columns on the file system.
- Parameters
- col : str, tuple or list
a name of a column, or a list of names.
- cols : str
additional names (optional). If col is a list or tuple, cols should be empty.
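The two accepted call shapes (one name plus optional extra names, or a single list or tuple of names) can be illustrated with a small pure-Python sketch. The helper `normalize_sort_columns` is hypothetical and not part of PySpark; the exception types it raises are assumptions of this sketch, chosen only to make the parameter contract above concrete.

```python
from typing import List, Sequence, Union


def normalize_sort_columns(col: Union[str, Sequence[str]], *cols: str) -> List[str]:
    """Hypothetical helper: flatten both call shapes into one list of column names."""
    if isinstance(col, (list, tuple)):
        if cols:
            # Assumption: a list/tuple combined with extra names is rejected.
            raise ValueError("col is a list or tuple, so cols must be empty")
        names = list(col)
    else:
        names = [col, *cols]
    if not names or not all(isinstance(name, str) for name in names):
        # Assumption: every column name must be a plain string.
        raise TypeError("all column names should be str")
    return names


print(normalize_sort_columns("day"))              # single name
print(normalize_sort_columns("day", "hour"))      # name plus extra names
print(normalize_sort_columns(["day", "hour"]))    # one list, no extra names
```

Under this reading, `sortBy('day', 'hour')` and `sortBy(['day', 'hour'])` name the same sort columns, while `sortBy(['day'], 'hour')` mixes the two shapes.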
Examples
>>> (df.write.format('parquet')
...     .bucketBy(100, 'year', 'month')
...     .sortBy('day')
...     .mode("overwrite")
...     .saveAsTable('sorted_bucketed_table'))
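To make "sorted in each bucket" concrete, here is a minimal pure-Python sketch of the idea, not of Spark's implementation. The sample rows, `NUM_BUCKETS`, and the modulo-based `bucket_id` function are all assumptions made for illustration; in particular, `bucket_id` is only a stand-in for Spark's real hash function. The point it shows is that ordering holds within a bucket, not across the whole table.

```python
from collections import defaultdict

# Hypothetical sample rows with the columns used in the example above.
rows = [
    {"year": 2023, "month": 1, "day": 17},
    {"year": 2023, "month": 2, "day": 3},
    {"year": 2023, "month": 1, "day": 5},
    {"year": 2023, "month": 2, "day": 28},
    {"year": 2023, "month": 1, "day": 9},
]

NUM_BUCKETS = 2  # the example above uses 100; 2 keeps the output readable


def bucket_id(row):
    # Stand-in for Spark's hash of the bucketing columns (year, month).
    return (row["year"] * 12 + row["month"]) % NUM_BUCKETS


# Step 1: assign each row to a bucket (the role of bucketBy).
buckets = defaultdict(list)
for row in rows:
    buckets[bucket_id(row)].append(row)

# Step 2: sort rows inside each bucket independently (the role of sortBy).
sorted_buckets = {
    b: sorted(members, key=lambda r: r["day"]) for b, members in buckets.items()
}

for b in sorted(sorted_buckets):
    print(b, [r["day"] for r in sorted_buckets[b]])
# 0 [3, 28]
# 1 [5, 9, 17]
```

Each bucket's `day` values are ascending, but reading the buckets back to back (3, 28, 5, 9, 17) is not globally sorted.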