pyspark.sql.streaming.DataStreamReader

class pyspark.sql.streaming.DataStreamReader(spark: SparkSession)

Interface used to load a streaming DataFrame from external storage systems (e.g. file systems, key-value stores, etc.). Use SparkSession.readStream to access this.
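As a minimal sketch of the access pattern described above, the helper below takes an existing SparkSession, obtains the reader through its readStream property, and loads from Spark's built-in "rate" test source. The function name make_rate_stream and its rows_per_second parameter are illustrative assumptions, not part of the API.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    from pyspark.sql import DataFrame, SparkSession


def make_rate_stream(spark: SparkSession, rows_per_second: int = 1) -> DataFrame:
    """Hypothetical helper: build a streaming DataFrame from the "rate" source."""
    # readStream is a property on SparkSession that returns a DataStreamReader.
    reader = spark.readStream
    return (
        reader.format("rate")                       # choose the input source format
        .option("rowsPerSecond", rows_per_second)   # source-specific option
        .load()                                     # returns a streaming DataFrame
    )
```

Called with a real session (for example one obtained from SparkSession.builder.getOrCreate()), the returned DataFrame should report isStreaming as True; no data is read until a streaming query is started on it.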

Notes

This API is evolving.

Methods

csv(path[, schema, sep, encoding, quote, …])

Loads a CSV file stream and returns the result as a DataFrame.

format(source)

Specifies the input data source format.

json(path[, schema, primitivesAsString, …])

Loads a JSON file stream and returns the result as a DataFrame.

load([path, format, schema])

Loads a data stream from a data source and returns it as a DataFrame.

option(key, value)

Adds an input option for the underlying data source.

options(**options)

Adds input options for the underlying data source.

orc(path[, mergeSchema, pathGlobFilter, …])

Loads an ORC file stream, returning the result as a DataFrame.

parquet(path[, mergeSchema, pathGlobFilter, …])

Loads a Parquet file stream, returning the result as a DataFrame.

schema(schema)

Specifies the input schema.

table(tableName)

Defines a streaming DataFrame on a table.

text(path[, wholetext, lineSep, …])

Loads a text file stream and returns a DataFrame whose schema starts with a string column named “value”, followed by partitioned columns if there are any.
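The methods listed above are typically chained on a single reader. The following is a minimal sketch that combines schema, option, options, and csv; the helper name read_csv_stream, the two-column schema, and the chosen option values are assumptions made for illustration only.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    from pyspark.sql import DataFrame, SparkSession


def read_csv_stream(spark: SparkSession, input_dir: str) -> DataFrame:
    """Hypothetical helper: stream CSV files that arrive in input_dir."""
    # schema() accepts a StructType or a DDL-formatted string;
    # this two-column layout is an assumed example.
    schema = "id INT, name STRING"
    return (
        spark.readStream
        .schema(schema)                            # input schema
        .option("header", True)                    # single key/value option
        .options(sep=",", maxFilesPerTrigger=1)    # several options at once
        .csv(input_dir)                            # shorthand for format("csv").load(input_dir)
    )
```

Supplying the schema up front reflects the usual expectation that file-based streaming sources are given one explicitly, so the reader does not need to infer it from files that may not have arrived yet.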