pyspark.SparkContext.newAPIHadoopRDD
SparkContext.newAPIHadoopRDD(inputFormatClass: str, keyClass: str, valueClass: str, keyConverter: Optional[str] = None, valueConverter: Optional[str] = None, conf: Optional[Dict[str, str]] = None, batchSize: int = 0) → pyspark.rdd.RDD[Tuple[T, U]]

Read a ‘new API’ Hadoop InputFormat with arbitrary key and value class, from an arbitrary Hadoop configuration, which is passed in as a Python dict. This will be converted into a Configuration in Java. The mechanism is the same as for SparkContext.sequenceFile().

Parameters
- inputFormatClass : str
 fully qualified classname of Hadoop InputFormat (e.g. “org.apache.hadoop.mapreduce.lib.input.TextInputFormat”)
- keyClass : str
 fully qualified classname of key Writable class (e.g. “org.apache.hadoop.io.Text”)
- valueClass : str
 fully qualified classname of value Writable class (e.g. “org.apache.hadoop.io.LongWritable”)
- keyConverter : str, optional
 fully qualified name of a function returning key WritableConverter (None by default)
- valueConverter : str, optional
 fully qualified name of a function returning value WritableConverter (None by default)
- conf : dict, optional
 Hadoop configuration, passed in as a dict (None by default)
- batchSize : int, optional
 the number of Python objects represented as a single Java object (default 0, choose batchSize automatically)
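As a minimal sketch, the new-API TextInputFormat can be used to read a plain text file. Because this method takes no path argument, the input location is supplied through the conf dict; the sketch below assumes a Hadoop 2+ configuration key (mapreduce.input.fileinputformat.inputdir), and the path hdfs:///tmp/input and the application name are hypothetical placeholders.

    from pyspark import SparkContext

    sc = SparkContext(appName="newAPIHadoopRDDExample")

    # The new-API FileInputFormat reads its input path from the Hadoop
    # configuration rather than from a method argument.
    conf = {"mapreduce.input.fileinputformat.inputdir": "hdfs:///tmp/input"}  # hypothetical path

    rdd = sc.newAPIHadoopRDD(
        inputFormatClass="org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        keyClass="org.apache.hadoop.io.LongWritable",  # byte offset of each line
        valueClass="org.apache.hadoop.io.Text",        # the line contents
        conf=conf,
    )

    # Each element is an (offset, line) pair converted to Python objects.
    print(rdd.take(2))

TextInputFormat yields the byte offset as the key and the line text as the value; other InputFormats follow the same pattern with their own key and value Writable classes.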