pyspark.StorageLevel

class pyspark.StorageLevel(useDisk: bool, useMemory: bool, useOffHeap: bool, deserialized: bool, replication: int = 1)

Flags for controlling the storage of an RDD. Each StorageLevel records whether to use memory, whether to drop the RDD to disk if it falls out of memory, whether to keep the data in memory in a JVM-specific serialized format, and whether to replicate the RDD partitions on multiple nodes. The class also contains static constants for commonly used storage levels, such as MEMORY_ONLY. Since the data is always serialized on the Python side, all of the constants use the serialized formats.
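The constructor's five flags can be pictured as a plain record of booleans plus a replication count. The following is an illustrative mock, not PySpark's implementation, sketching how the flags compose:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MockStorageLevel:
    """Illustrative stand-in for pyspark.StorageLevel's flag bundle."""
    useDisk: bool
    useMemory: bool
    useOffHeap: bool
    deserialized: bool
    replication: int = 1


# Disk-only storage, single copy (analogous to DISK_ONLY).
disk_only = MockStorageLevel(True, False, False, False)

# Memory with spill-over to disk, replicated on two nodes
# (analogous to MEMORY_AND_DISK_2).
mem_disk_2 = MockStorageLevel(True, True, False, False, replication=2)
```

In real code you would pass one of the predefined `pyspark.StorageLevel` constants to `RDD.persist()` rather than construct a level by hand.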

Attributes

DISK_ONLY

DISK_ONLY_2

DISK_ONLY_3

MEMORY_AND_DISK

MEMORY_AND_DISK_2

MEMORY_AND_DISK_DESER

MEMORY_ONLY

MEMORY_ONLY_2

OFF_HEAP
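The attribute names encode their flag combinations: a trailing numeric suffix gives the replication factor, `_DESER` keeps the data deserialized in memory, and `MEMORY_AND_DISK` spills to disk when memory runs out. The flag table below restates PySpark's documented definitions as an assumption for illustration; it is not extracted from the library itself:

```python
# Assumed flag table: (useDisk, useMemory, useOffHeap, deserialized, replication).
LEVELS = {
    "DISK_ONLY":             (True,  False, False, False, 1),
    "DISK_ONLY_2":           (True,  False, False, False, 2),
    "DISK_ONLY_3":           (True,  False, False, False, 3),
    "MEMORY_AND_DISK":       (True,  True,  False, False, 1),
    "MEMORY_AND_DISK_2":     (True,  True,  False, False, 2),
    "MEMORY_AND_DISK_DESER": (True,  True,  False, True,  1),
    "MEMORY_ONLY":           (False, True,  False, False, 1),
    "MEMORY_ONLY_2":         (False, True,  False, False, 2),
    "OFF_HEAP":              (True,  True,  True,  False, 1),
}

# A trailing numeric suffix is the replication factor; otherwise it defaults to 1.
for name, (_, _, _, _, repl) in LEVELS.items():
    suffix = name.rsplit("_", 1)[-1]
    assert repl == (int(suffix) if suffix.isdigit() else 1)
```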