pyspark.pandas.DataFrame.to_excel

DataFrame.to_excel(excel_writer: Union[str, pandas.io.excel._base.ExcelWriter], sheet_name: str = 'Sheet1', na_rep: str = '', float_format: Optional[str] = None, columns: Union[str, List[str], None] = None, header: bool = True, index: bool = True, index_label: Union[str, List[str], None] = None, startrow: int = 0, startcol: int = 0, engine: Optional[str] = None, merge_cells: bool = True, encoding: Optional[str] = None, inf_rep: str = 'inf', verbose: bool = True, freeze_panes: Optional[Tuple[int, int]] = None) → None

Write object to an Excel sheet.

Note

This method should only be used if the resulting DataFrame is expected to be small, as all the data is loaded into the driver’s memory.
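One way to act on this note is to check the row count before writing. The following is a minimal sketch, not part of pyspark.pandas: the helper name `to_excel_if_small` and the default threshold of 100,000 rows are hypothetical and chosen only for illustration. It assumes the object supports `len()` and `to_excel()`, as a pandas-on-Spark DataFrame does.

```python
def to_excel_if_small(df, path, max_rows=100_000, **kwargs):
    """Write df to path only if it has at most max_rows rows.

    Hypothetical helper: the name and the default threshold are
    illustrative assumptions, not part of pyspark.pandas.
    """
    # len() on a pandas-on-Spark DataFrame triggers a count on the cluster,
    # which is cheaper than collecting every row to the driver.
    n_rows = len(df)
    if n_rows > max_rows:
        raise ValueError(
            f"DataFrame has {n_rows} rows, above the limit of {max_rows}; "
            "refusing to load it into driver memory for to_excel"
        )
    # Remaining keyword arguments (sheet_name, index, ...) pass straight through.
    df.to_excel(path, **kwargs)
    return n_rows
```

A call such as `to_excel_if_small(df1, "output.xlsx", index=False)` would then behave like `df1.to_excel("output.xlsx", index=False)` for small inputs and raise `ValueError` for large ones. A suitable threshold depends on the memory available to the driver.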

To write a single object to an Excel .xlsx file it is only necessary to specify a target file name. To write to multiple sheets it is necessary to create an ExcelWriter object with a target file name, and specify a sheet in the file to write to.

Multiple sheets may be written to by specifying a unique sheet_name for each. Once all data has been written to the file, the changes must be saved. Note that creating an ExcelWriter object with a file name that already exists will erase the contents of the existing file.

Parameters

excel_writer : str or ExcelWriter object
    File path or existing ExcelWriter.
sheet_name : str, default ‘Sheet1’
    Name of the sheet which will contain the DataFrame.
na_rep : str, default ‘’
    Missing data representation.
float_format : str, optional
    Format string for floating point numbers. For example, float_format="%.2f" will format 0.1234 to 0.12.
columns : sequence or list of str, optional
    Columns to write.
header : bool or list of str, default True
    Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.
index : bool, default True
    Write row names (index).
index_label : str or sequence, optional
    Column label for index column(s) if desired. If not specified, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
startrow : int, default 0
    Row of the upper-left cell at which to write the DataFrame.
startcol : int, default 0
    Column of the upper-left cell at which to write the DataFrame.
engine : str, optional
    Write engine to use, ‘openpyxl’ or ‘xlsxwriter’. You can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
merge_cells : bool, default True
    Write MultiIndex and hierarchical rows as merged cells.
encoding : str, optional
    Encoding of the resulting Excel file. Only necessary for xlwt; other writers support Unicode natively.
inf_rep : str, default ‘inf’
    Representation for infinity (there is no native representation for infinity in Excel).
verbose : bool, default True
    Display more information in the error logs.
freeze_panes : tuple of int (length 2), optional
    Specifies the one-based bottommost row and rightmost column that is to be frozen.
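The float_format string follows Python's printf-style (%) formatting. The sketch below uses only the standard library to preview how a format string renders a value before it is passed to to_excel. It assumes the writer applies the string with the % operator, as pandas does; the helper name `preview_float_format` is hypothetical and not part of pyspark.pandas.

```python
def preview_float_format(value: float, float_format: str) -> str:
    """Render value with a printf-style format string such as "%.2f".

    Hypothetical helper for illustration; it only mirrors the
    `float_format % value` step and does not touch any Excel file.
    """
    return float_format % value


formatted = preview_float_format(0.1234, "%.2f")
print(formatted)  # prints 0.12
```

Note the single percent sign: "%.2f" is the format string to pass, since a doubled "%%" stands for a literal percent character in printf-style formatting.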

See also

read_excel: Read Excel file.

Notes

Once a workbook has been saved, it is not possible to write further data without rewriting the whole workbook.

Examples

Create, write to and save a workbook:

>>> import pyspark.pandas as ps
>>> df1 = ps.DataFrame([['a', 'b'], ['c', 'd']],
...                    index=['row 1', 'row 2'],
...                    columns=['col 1', 'col 2'])
>>> df1.to_excel("output.xlsx")

To specify the sheet name:

>>> df1.to_excel("output.xlsx",
...              sheet_name='Sheet_name_1')

If you wish to write to more than one sheet in the workbook, it is necessary to specify an ExcelWriter object:

>>> import pandas as pd
>>> df2 = df1.copy()  # second DataFrame to write to its own sheet
>>> with pd.ExcelWriter('output.xlsx') as writer:
...     df1.to_excel(writer, sheet_name='Sheet_name_1')
...     df2.to_excel(writer, sheet_name='Sheet_name_2')

To set the library that is used to write the Excel file, you can pass the engine keyword (the default engine is automatically chosen depending on the file extension):

>>> df1.to_excel('output1.xlsx', engine='xlsxwriter')
