pyspark.streaming.StreamingContext.binaryRecordsStream

StreamingContext.binaryRecordsStream(directory: str, recordLength: int) → pyspark.streaming.dstream.DStream[bytes]

Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as flat binary files with fixed-length records. Files must be written to the monitored directory by “moving” them from another location within the same file system. Files whose names start with a dot (.) are ignored.
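The “moving” requirement can be illustrated with a minimal sketch for a local file system: the file is written in full to a staging directory, then renamed into the monitored directory in a single step, so the stream never sees a partially written file. The helper name `publish_records` and both directory arguments are hypothetical and not part of the PySpark API; on HDFS or another Hadoop-compatible file system, the equivalent step would be that file system's own rename operation.

```python
import os


def publish_records(records, staging_dir, monitored_dir, name):
    """Write records to a staging file, then move it into the monitored directory.

    `records` is an iterable of bytes objects, each assumed to already have
    the fixed length that the stream expects. Both directories are assumed
    to live on the same file system.
    """
    staged_path = os.path.join(staging_dir, name)
    with open(staged_path, "wb") as f:
        for record in records:
            f.write(record)

    final_path = os.path.join(monitored_dir, name)
    # os.replace renames the file; on a single POSIX file system this is atomic.
    os.replace(staged_path, final_path)
    return final_path
```

The staging directory must be outside the monitored directory, otherwise the half-written file could itself be picked up as a new file.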

Parameters
directory : str

Directory to load data from

recordLength : int

Length of each record in bytes
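
Examples

The following is a minimal sketch rather than an official example. It assumes a hypothetical record layout of one little-endian 32-bit integer followed by one 64-bit float (12 bytes in total); the names `RECORD_FORMAT`, `RECORD_LENGTH`, `decode_record`, `build_stream`, and `run_example` are illustrative. Each element of the resulting DStream is a `bytes` object of length `recordLength`, which the sketch decodes with the standard `struct` module.

```python
import struct

# Assumed layout (hypothetical): int32 id + float64 value, little-endian, no padding.
RECORD_FORMAT = "<id"
RECORD_LENGTH = struct.calcsize(RECORD_FORMAT)  # 12 bytes


def decode_record(record: bytes):
    """Unpack one fixed-length record into an (id, value) tuple."""
    return struct.unpack(RECORD_FORMAT, record)


def build_stream(ssc, directory):
    """Return a DStream of decoded records read from `directory`."""
    return ssc.binaryRecordsStream(directory, RECORD_LENGTH).map(decode_record)


def run_example(directory, batch_seconds=5):
    """Start a local streaming job that prints decoded records from each batch."""
    # Imported here so the decoding helpers above can be used without Spark.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "BinaryRecordsStreamExample")
    ssc = StreamingContext(sc, batch_seconds)
    build_stream(ssc, directory).pprint()
    ssc.start()
    ssc.awaitTermination()
```

Calling `run_example("/tmp/binary-input")` (a placeholder path) starts the job; files moved into that directory afterwards are split into 12-byte records and printed once per batch.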