pyspark.RDD.saveAsNewAPIHadoopDataset¶
RDD.saveAsNewAPIHadoopDataset(conf: Dict[str, str], keyConverter: Optional[str] = None, valueConverter: Optional[str] = None) → None¶

Output a Python RDD of key-value pairs (of the form RDD[(K, V)]) to any Hadoop file system, using the new Hadoop OutputFormat API (mapreduce package). Keys/values are converted for output using either user-specified converters or, by default, “org.apache.spark.api.python.JavaToWritableConverter”.

Parameters
- conf : dict
  Hadoop job configuration
- keyConverter : str, optional
  fully qualified classname of key converter (None by default)
- valueConverter : str, optional
  fully qualified classname of value converter (None by default)