在Scala中使用from_json()解析一个DataFrame中的多个行 [英] Use from_json() in Scala to parse multiple Rows in a DataFrame
问题描述
我在Spark DataFrame的列中有一个JSON,如下所示:
I've a JSON within a Column of a Spark DataFrame as follows:
ID| Text| JSON
------------------------------------------------------------------------------
1| xyz| [{"Hour": 1, "Total": 10, "Fail": 1}, {"Hour": 2, "Total": 40, "Fail": 4}, {"Hour": 3, "Total": 20, "Fail": 2}]
我正在使用以下架构
val schema = StructType(Array(StructField("Hour", IntegerType),
StructField("Total", IntegerType), StructField("Fail", IntegerType))
我正在使用以下代码来解析DataFrame并输出JSON作为多列
I'm using following code to parse the DataFrame and output the JSON as multiple columns
val newDF = DF.withColumn("JSON", from_json(col("JSON"), schema)).select(col("JSON.*"))
newDF.show()
上面的代码只是解析JSON中的一条记录,但是,我希望它解析JSON中的所有记录。
The above code just parses the one single record from the JSON. But, I want it to parse all the records in the JSON.
输出如下:
Hour| Total| Fail|
-------------------------------
1| 10| 1|
-------------------------------
但是,我希望输出如下:
But, I want the output to be as follows:
Hour| Total| Fail|
-------------------------------
1| 10| 1|
2| 40| 4|
3| 20| 2|
-------------------------------
可以有人,请告诉我。
Can Someone, please let me know. What is it that I'm missing !!
预先感谢。
推荐答案
如果我正确解释了示例数据,则您的 JSON
列是带有已发布架构的JSON元素序列。在应用 from_json
之前,您需要爆炸该列,如下所示:
If I interpret your sample data correctly, your JSON
column is a sequence of JSON elements with your posted schema. You'll need to explode the column before applying from_json
as follows:
val df = Seq(
(1, "xyz", Seq("""{"Hour": 1, "Total": 10, "Fail": 1}""",
"""{"Hour": 2, "Total": 40, "Fail": 4}""",
"""{"Hour": 3, "Total": 20, "Fail": 2}""")
)).toDF("ID", "Text", "JSON")
import org.apache.spark.sql.types._
val jsonSchema = StructType(Array(
StructField("Hour", IntegerType),
StructField("Total", IntegerType),
StructField("Fail", IntegerType)
))
df.
withColumn("JSON", explode(col("JSON"))).
withColumn("JSON", from_json(col("JSON"), jsonSchema)).
select("JSON.*").
show
// +----+-----+----+
// |Hour|Total|Fail|
// +----+-----+----+
// | 1| 10| 1|
// | 2| 40| 4|
// | 3| 20| 2|
// +----+-----+----+
这篇关于在Scala中使用from_json()解析一个DataFrame中的多个行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!