How to parse each row JSON to columns of Spark 2 DataFrame?
Question

In my Spark (2.2) DataFrame each row is a JSON string:
df.head()
//output
//[{"key":"111","event_name":"page-visited","timestamp":1517814315}]
df.show()
//output
//+--------------+
//| value|
//+--------------+
//|{"key":"111...|
//|{"key":"222...|
I want to parse each JSON row into separate columns, in order to get this result:
key event_name timestamp
111 page-visited 1517814315
...
I tried this approach, but it does not give me the expected result:
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._ // needed for the $"..." column syntax

val schema = StructType(Seq(
  StructField("key", StringType, true),
  StructField("event_name", StringType, true),
  StructField("timestamp", IntegerType, true)
))

val result = df.withColumn("value", from_json($"value", schema))
and:
result.printSchema()
root
|-- value: struct (nullable = true)
| |-- key: string (nullable = true)
| |-- event_name: string (nullable = true)
| |-- timestamp: integer (nullable = true)
whereas it should be:
result.printSchema()
root
|-- key: string (nullable = true)
|-- event_name: string (nullable = true)
|-- timestamp: integer (nullable = true)
Answer
You can use select($"value.*") at the end to select the elements of the struct column into separate columns:
val result = df.withColumn("value", from_json($"value", schema)).select($"value.*")
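For reference, the whole flow can be run end to end as a small script. This is a minimal sketch, assuming a local Spark 2.2+ session; the second sample row (key "222") is made up here to mirror the truncated df.show() output above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("json-rows-to-columns")
  .getOrCreate()
import spark.implicits._ // enables the $"..." column syntax and .toDF

// Sample data: one JSON string per row, as in the question
val df = Seq(
  """{"key":"111","event_name":"page-visited","timestamp":1517814315}""",
  """{"key":"222","event_name":"page-visited","timestamp":1517814316}"""
).toDF("value")

val schema = StructType(Seq(
  StructField("key", StringType, true),
  StructField("event_name", StringType, true),
  StructField("timestamp", IntegerType, true)
))

// from_json parses the string column into a struct;
// select($"value.*") then flattens the struct fields into top-level columns
val result = df.withColumn("value", from_json($"value", schema)).select($"value.*")
result.printSchema()
result.show()
```

Note that rows whose JSON does not match the schema come back as nulls rather than failing, which is from_json's default behavior.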