SPARK: How to parse an Array of JSON objects using Spark
Question
I have a file with normal columns plus one column that contains a JSON string, shown below (picture also attached). Each row actually belongs to a column named Demo (not visible in the picture). The other columns are removed and not visible in the picture because they are not of concern for now.
[{"key":"device_kind","value":"desktop"},{"key":"country_code","value":"ID"},{"key":"device_platform","value":"windows"}]
Please do not change the format of the JSON, since it appears exactly as above in the data file, except that everything is on one line.
Each row has one such object under the column, say JSON. The objects are all on one line, but inside an array. I would like to parse this column using Spark and access the value of each object inside. Please help.
What I want is to get the value of the key "value". My objective is to extract the value of the "value" key from each JSON object into separate columns.
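To make the goal concrete outside Spark: each row's array of {key, value} pairs should be pivoted into one column per key. A minimal sketch of that transformation with Python's standard json module (an illustration of the intent only, not the Spark code):

```python
import json

# One row of the Demo column: an array of {key, value} objects on a single line
row = ('[{"key":"device_kind","value":"desktop"},'
      '{"key":"country_code","value":"ID"},'
      '{"key":"device_platform","value":"windows"}]')

# Pivot the array into {key -> value}, i.e. one column per key
columns = {entry["key"]: entry["value"] for entry in json.loads(row)}
print(columns["device_kind"])   # desktop
print(columns["country_code"])  # ID
```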
I tried using get_json_object. It works for the single JSON string 1) below, but returns null for the JSON array 2):

- {"key":"device_kind","value":"desktop"}
- [{"key":"device_kind","value":"desktop"},{"key":"country_code","value":"ID"},{"key":"device_platform","value":"windows"}]
The code I tried is below:
val jsonDF1 = spark.range(1).selectExpr(""" '{"key":"device_kind","value":"desktop"}' as jsonString""")
jsonDF1.select(get_json_object(col("jsonString"), "$.value") as "device_kind").show(2) // prints desktop under column named device_kind
val jsonDF2 = spark.range(1).selectExpr(""" '[{"key":"device_kind","value":"desktop"},{"key":"country_code","value":"ID"},{"key":"device_platform","value":"windows"}]' as jsonString""")
jsonDF2.select(get_json_object(col("jsonString"), "$.[0].value") as "device_kind").show(2) // prints null, but desktop is expected under column named device_kind
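The null likely comes from the path expression: `$.value` (and `$.[0].value`) treats the root as an object, but here the root is an array, so an element must be indexed first; the form `$[0].value` is what I would expect to work in Spark, though that is an assumption to verify against your version. The equivalent mistake and fix, sketched in plain Python:

```python
import json

arr = ('[{"key":"device_kind","value":"desktop"},'
      '{"key":"country_code","value":"ID"}]')
data = json.loads(arr)

# data is a list, so data["value"] would raise TypeError -- the analogue of
# get_json_object returning null when an object-style path is applied to an
# array root. Indexing the element first, then accessing the field, works:
print(data[0]["value"])  # desktop
```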
Next I wanted to use from_json, but I am unable to figure out how to build a schema for an array of JSON objects. All the examples I can find are for nested JSON objects, but nothing similar to the above JSON string.
I did find that in SparkR 2.2, from_json has a boolean parameter that, if set to true, will handle the above type of JSON string, i.e. an array of JSON objects, but that option is not available in Spark-Scala 2.3.3.
To be clear on the input and expected output, they should be as below.
i/p below
+------------------------------------------------------------------------+
|Demographics |
+------------------------------------------------------------------------+
|[[device_kind, desktop], [country_code, ID], [device_platform, windows]]|
|[[device_kind, mobile], [country_code, BE], [device_platform, android]] |
|[[device_kind, mobile], [country_code, QA], [device_platform, android]] |
+------------------------------------------------------------------------+
expected o/p below
+------------------------------------------------------------------------+-----------+------------+---------------+
|Demographics |device_kind|country_code|device_platform|
+------------------------------------------------------------------------+-----------+------------+---------------+
|[[device_kind, desktop], [country_code, ID], [device_platform, windows]]|desktop |ID |windows |
|[[device_kind, mobile], [country_code, BE], [device_platform, android]] |mobile |BE |android |
|[[device_kind, mobile], [country_code, QA], [device_platform, android]] |mobile |QA |android |
+------------------------------------------------------------------------+-----------+------------+---------------+
Answer
Aleh, thank you for the answer. It works fine. I did the solution in a slightly different way because I am using Spark 2.3.3:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Schema for the column: an array of {key, value} string structs
val sch = ArrayType(StructType(Array(
  StructField("key", StringType, true),
  StructField("value", StringType, true)
)))

// Parse the JSON string column into an array of structs
val jsonDF3 = mdf.select(from_json(col("jsonString"), sch).alias("Demographics"))

// Pull each entry's "value" field out by position
val jsonDF4 = jsonDF3
  .withColumn("device_kind", expr("Demographics[0].value"))
  .withColumn("country_code", expr("Demographics[1].value"))
  .withColumn("device_platform", expr("Demographics[2].value"))
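One caveat with the positional approach above: Demographics[0] only holds device_kind if every row lists its keys in the same order. A more robust variant looks entries up by key name instead of position; sketched here in plain Python (in Spark 2.3.3 this would typically mean building a key-to-value map, e.g. via a UDF, since map_from_entries is not yet available -- an assumption about the approach, not the answer's code):

```python
import json

def to_columns(json_string, wanted_keys):
    """Build {key -> value} from the array, then look up by name,
    so column values do not depend on the order of the entries."""
    kv = {e["key"]: e["value"] for e in json.loads(json_string)}
    return {k: kv.get(k) for k in wanted_keys}

# Same shape of data but with the entries shuffled -- positional
# indexing would misassign these, a key lookup does not
row = ('[{"key":"country_code","value":"BE"},'
      '{"key":"device_kind","value":"mobile"},'
      '{"key":"device_platform","value":"android"}]')
print(to_columns(row, ["device_kind", "country_code", "device_platform"]))
```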