pyspark.sql.streaming.DataStreamWriter.start¶
-
DataStreamWriter.
start
(path: Optional[str] = None, format: Optional[str] = None, outputMode: Optional[str] = None, partitionBy: Union[str, List[str], None] = None, queryName: Optional[str] = None, **options: OptionalPrimitiveType) → pyspark.sql.streaming.query.StreamingQuery¶ Streams the contents of the
DataFrame
to a data source.The data source is specified by the
format
and a set ofoptions
. Ifformat
is not specified, the default data source configured byspark.sql.sources.default
will be used.- Parameters
- pathstr, optional
the path in a Hadoop supported file system
- formatstr, optional
the format used to save
- outputModestr, optional
specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
append: Only the new rows in the streaming DataFrame/Dataset will be written to the sink
complete: All the rows in the streaming DataFrame/Dataset will be written to the sink every time these are some updates
update: only the rows that were updated in the streaming DataFrame/Dataset will be written to the sink every time there are some updates. If the query doesn’t contain aggregations, it will be equivalent to append mode.
- partitionBystr or list, optional
names of partitioning columns
- queryNamestr, optional
unique name for the query
- **optionsdict
All other string options. You may want to provide a checkpointLocation for most streams, however it is not required for a memory stream.
Notes
This API is evolving.
Examples
>>> sq = sdf.writeStream.format('memory').queryName('this_query').start() >>> sq.isActive True >>> sq.name 'this_query' >>> sq.stop() >>> sq.isActive False >>> sq = sdf.writeStream.trigger(processingTime='5 seconds').start( ... queryName='that_query', outputMode="append", format='memory') >>> sq.name 'that_query' >>> sq.isActive True >>> sq.stop()