pyspark.streaming.DStream.updateStateByKey

DStream.updateStateByKey(updateFunc: Callable[[Iterable[V], Optional[S]], S], numPartitions: Optional[int] = None, initialRDD: Union[pyspark.rdd.RDD[Tuple[K, S]], Iterable[Tuple[K, S]], None] = None) → pyspark.streaming.dstream.DStream[Tuple[K, S]]

Return a new “state” DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values of the key.

Parameters
updateFuncfunction

State update function. If this function returns None, then corresponding state key-value pair will be eliminated.