pyspark.sql.DataFrameReader.json
DataFrameReader.json(path: Union[str, List[str], pyspark.rdd.RDD[str]], schema: Union[pyspark.sql.types.StructType, str, None] = None, primitivesAsString: Union[bool, str, None] = None, prefersDecimal: Union[bool, str, None] = None, allowComments: Union[bool, str, None] = None, allowUnquotedFieldNames: Union[bool, str, None] = None, allowSingleQuotes: Union[bool, str, None] = None, allowNumericLeadingZero: Union[bool, str, None] = None, allowBackslashEscapingAnyCharacter: Union[bool, str, None] = None, mode: Optional[str] = None, columnNameOfCorruptRecord: Optional[str] = None, dateFormat: Optional[str] = None, timestampFormat: Optional[str] = None, multiLine: Union[bool, str, None] = None, allowUnquotedControlChars: Union[bool, str, None] = None, lineSep: Optional[str] = None, samplingRatio: Union[str, float, None] = None, dropFieldIfAllNull: Union[bool, str, None] = None, encoding: Optional[str] = None, locale: Optional[str] = None, pathGlobFilter: Union[bool, str, None] = None, recursiveFileLookup: Union[bool, str, None] = None, modifiedBefore: Union[bool, str, None] = None, modifiedAfter: Union[bool, str, None] = None, allowNonNumericNumbers: Union[bool, str, None] = None) → DataFrame

Loads JSON files and returns the results as a DataFrame.

JSON Lines (newline-delimited JSON) is supported by default. For JSON (one record per file), set the multiLine parameter to true.

If the schema parameter is not specified, this function goes through the input once to determine the input schema.

- Parameters
- path : str, list or RDD
A string representing the path to the JSON dataset, a list of paths, or an RDD of strings storing JSON objects.
- schema : pyspark.sql.types.StructType or str, optional
An optional pyspark.sql.types.StructType for the input schema, or a DDL-formatted string (for example, col0 INT, col1 DOUBLE).
- Other Parameters
- Extra options
For the extra options, refer to Data Source Option in the version you use.
Examples
>>> df1 = spark.read.json('python/test_support/sql/people.json')
>>> df1.dtypes
[('age', 'bigint'), ('name', 'string')]
>>> rdd = sc.textFile('python/test_support/sql/people.json')
>>> df2 = spark.read.json(rdd)
>>> df2.dtypes
[('age', 'bigint'), ('name', 'string')]