pyspark.RDD.pipe¶

RDD.pipe(command: str, env: Optional[Dict[str, str]] = None, checkCode: bool = False) → pyspark.rdd.RDD[str]¶

Return an RDD created by piping elements to a forked external process.

Parameters

commandstr: command to run.
envdict, optional: environment variables to set.
checkCodebool, optional: whether or not to check the return value of the shell command.

Examples

>>> sc.parallelize(['1', '2', '', '3']).pipe('cat').collect()
['1', '2', '', '3']

pyspark.RDD.persist

pyspark.RDD.randomSplit