pyspark.sql.streaming.DataStreamReader

class pyspark.sql.streaming.DataStreamReader(spark: SparkSession)

Interface used to load a streaming DataFrame from external storage systems (e.g. file systems, key-value stores, etc). Use SparkSession.readStream to access this.

Notes

This API is evolving.

Methods

`csv`(path[, schema, sep, encoding, quote, …])	Loads a CSV file stream and returns the result as a `DataFrame`.
`format`(source)	Specifies the input data source format.
`json`(path[, schema, primitivesAsString, …])	Loads a JSON file stream and returns the result as a `DataFrame`.
`load`([path, format, schema])	Loads a data stream from a data source and returns it as a `DataFrame`.
`option`(key, value)	Adds an input option for the underlying data source.
`options`(**options)	Adds input options for the underlying data source.
`orc`(path[, mergeSchema, pathGlobFilter, …])	Loads an ORC file stream, returning the result as a `DataFrame`.
`parquet`(path[, mergeSchema, pathGlobFilter, …])	Loads a Parquet file stream, returning the result as a `DataFrame`.
`schema`(schema)	Specifies the input schema.
`table`(tableName)	Defines a streaming `DataFrame` on a table.
`text`(path[, wholetext, lineSep, …])	Loads a text file stream and returns a `DataFrame` whose schema starts with a string column named “value”, followed by partition columns, if there are any.
