pyspark.sql.DataFrameWriter

class pyspark.sql.DataFrameWriter(df: DataFrame)

Interface used to write a DataFrame to external storage systems (e.g. file systems, key-value stores, etc.). Use DataFrame.write to access this.
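As a minimal sketch of the access pattern described above, the helper below obtains the writer from DataFrame.write and chains format, mode, and save. The function name `save_overwrite`, the choice of Parquet, and the `overwrite` mode are illustrative assumptions. `df` is assumed to be an existing pyspark.sql.DataFrame.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed only by type checkers, not at runtime
    from pyspark.sql import DataFrame


def save_overwrite(df: "DataFrame", path: str) -> None:
    """Write df to path as Parquet, replacing existing output (hypothetical helper)."""
    writer = df.write  # DataFrame.write returns a DataFrameWriter
    # format() and mode() return the writer, so the calls can be chained;
    # save() is the call that actually performs the write.
    writer.format("parquet").mode("overwrite").save(path)
```

With a running SparkSession named `spark`, a call such as `save_overwrite(spark.range(10), "/tmp/example_out")` would exercise it. The path is a placeholder.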

Methods

bucketBy(numBuckets, col, *cols)

Buckets the output by the given columns.

csv(path[, mode, compression, sep, quote, …])

Saves the content of the DataFrame in CSV format at the specified path.
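The CSV writer accepts its settings as keyword arguments. The sketch below is one possible call, assuming `df` is an existing DataFrame. The helper name and the specific values (header row, semicolon separator, gzip compression) are illustrative choices, not defaults.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed only by type checkers, not at runtime
    from pyspark.sql import DataFrame


def write_csv_with_header(df: "DataFrame", path: str) -> None:
    """Write df as gzip-compressed, semicolon-separated CSV (hypothetical helper)."""
    df.write.csv(
        path,
        mode="overwrite",    # replace any existing output at path
        header=True,         # emit column names as the first line
        sep=";",             # field separator
        compression="gzip",  # compress each output file
    )
```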

format(source)

Specifies the underlying output data source.

insertInto(tableName[, overwrite])

Inserts the content of the DataFrame into the specified table.

jdbc(url, table[, mode, properties])

Saves the content of the DataFrame to an external database table via JDBC.
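A hedged sketch of a JDBC write follows. The helper name and all of its parameters are hypothetical, and the connection properties shown (`user`, `password`, `driver`) are a typical set rather than a required one. The matching JDBC driver is assumed to be available on Spark's classpath.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed only by type checkers, not at runtime
    from pyspark.sql import DataFrame


def append_to_jdbc_table(
    df: "DataFrame", url: str, table: str, user: str, password: str, driver: str
) -> None:
    """Append the rows of df to a database table over JDBC (hypothetical helper)."""
    # Connection properties are passed as a plain dict of strings.
    properties = {"user": user, "password": password, "driver": driver}
    # mode="append" adds rows to the table if it already exists.
    df.write.jdbc(url, table, mode="append", properties=properties)
```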

json(path[, mode, compression, dateFormat, …])

Saves the content of the DataFrame in JSON format (JSON Lines text format or newline-delimited JSON) at the specified path.

mode(saveMode)

Specifies the behavior when the data or table already exists.

option(key, value)

Adds an output option for the underlying data source.

options(**options)

Adds output options for the underlying data source.
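The difference between option and options can be sketched as follows: option takes a single key and value, while options takes several as keyword arguments. The helper name, the JSON format, and the particular option values are assumptions chosen for illustration.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed only by type checkers, not at runtime
    from pyspark.sql import DataFrame


def write_json_with_options(df: "DataFrame", path: str) -> None:
    """Write df as JSON using both option() and options() (hypothetical helper)."""
    (
        df.write.format("json")
        .option("compression", "gzip")                       # one key/value pair
        .options(dateFormat="yyyy-MM-dd", encoding="UTF-8")  # several at once
        .mode("overwrite")
        .save(path)
    )
```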

orc(path[, mode, partitionBy, compression])

Saves the content of the DataFrame in ORC format at the specified path.

parquet(path[, mode, partitionBy, compression])

Saves the content of the DataFrame in Parquet format at the specified path.

partitionBy(*cols)

Partitions the output by the given columns on the file system.
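As an illustration of partitioned output, the sketch below writes Parquet files split by one or more columns. The helper name is hypothetical, and the columns passed in are assumed to exist in `df`. Each distinct combination of values ends up in its own `column=value` subdirectory under `path`.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed only by type checkers, not at runtime
    from pyspark.sql import DataFrame


def write_partitioned(df: "DataFrame", path: str, *partition_cols: str) -> None:
    """Write df as Parquet, partitioned by the given columns (hypothetical helper)."""
    # partitionBy controls the directory layout; parquet() performs the write.
    df.write.partitionBy(*partition_cols).mode("overwrite").parquet(path)
```

For example, `write_partitioned(df, "/tmp/example_events", "year", "month")` would be expected to produce directories of the form `year=.../month=.../`, assuming `df` has `year` and `month` columns.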

save([path, format, mode, partitionBy])

Saves the content of the DataFrame to a data source.

saveAsTable(name[, format, mode, partitionBy])

Saves the content of the DataFrame as the specified table.

sortBy(col, *cols)

Sorts the output in each bucket by the given columns on the file system.
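bucketBy and sortBy are typically used together and, as far as this sketch assumes, in combination with saveAsTable rather than a path-based save. The helper name and its parameters are hypothetical. `key_col` is assumed to be a column of `df`.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed only by type checkers, not at runtime
    from pyspark.sql import DataFrame


def save_bucketed_table(
    df: "DataFrame", table_name: str, num_buckets: int, key_col: str
) -> None:
    """Save df as a table bucketed and sorted by key_col (hypothetical helper)."""
    (
        df.write.bucketBy(num_buckets, key_col)  # hash rows into num_buckets buckets
        .sortBy(key_col)                         # sort rows within each bucket
        .mode("overwrite")
        .saveAsTable(table_name)                 # bucketing metadata lives with the table
    )
```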

text(path[, compression, lineSep])

Saves the content of the DataFrame in a text file at the specified path.
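The text writer expects a DataFrame with a single string column, and each row becomes one line of output. The sketch below therefore selects one column and casts it to a string before writing. The helper name and parameters are hypothetical.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed only by type checkers, not at runtime
    from pyspark.sql import DataFrame


def write_column_as_text(df: "DataFrame", column: str, path: str) -> None:
    """Write one column of df as plain text, one row per line (hypothetical helper)."""
    # Reduce df to a single string-typed column, as the text writer requires.
    single_col = df.select(df[column].cast("string"))
    single_col.write.text(path)
```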