pyspark.sql.functions.from_json¶
-
pyspark.sql.functions.
from_json
(col: ColumnOrName, schema: Union[pyspark.sql.types.ArrayType, pyspark.sql.types.StructType, pyspark.sql.column.Column, str], options: Optional[Dict[str, str]] = None) → pyspark.sql.column.Column¶ Parses a column containing a JSON string into a
MapType
withStringType
as keys type,StructType
orArrayType
with the specified schema. Returns null, in the case of an unparseable string.- Parameters
- col
Column
or str a column or column name in JSON format
- schema
DataType
or str a StructType, ArrayType of StructType or Python string literal with a DDL-formatted string to use when parsing the json column
- optionsdict, optional
options to control parsing. accepts the same options as the json datasource. See Data Source Option in the version you use.
- col
Examples
>>> from pyspark.sql.types import * >>> data = [(1, '''{"a": 1}''')] >>> schema = StructType([StructField("a", IntegerType())]) >>> df = spark.createDataFrame(data, ("key", "value")) >>> df.select(from_json(df.value, schema).alias("json")).collect() [Row(json=Row(a=1))] >>> df.select(from_json(df.value, "a INT").alias("json")).collect() [Row(json=Row(a=1))] >>> df.select(from_json(df.value, "MAP<STRING,INT>").alias("json")).collect() [Row(json={'a': 1})] >>> data = [(1, '''[{"a": 1}]''')] >>> schema = ArrayType(StructType([StructField("a", IntegerType())])) >>> df = spark.createDataFrame(data, ("key", "value")) >>> df.select(from_json(df.value, schema).alias("json")).collect() [Row(json=[Row(a=1)])] >>> schema = schema_of_json(lit('''{"a": 0}''')) >>> df.select(from_json(df.value, schema).alias("json")).collect() [Row(json=Row(a=None))] >>> data = [(1, '''[1, 2, 3]''')] >>> schema = ArrayType(IntegerType()) >>> df = spark.createDataFrame(data, ("key", "value")) >>> df.select(from_json(df.value, schema).alias("json")).collect() [Row(json=[1, 2, 3])]