How to parse a JSON string into different columns in Spark Scala?
Problem description
When reading a Parquet file, the data looks like this:
|id |name |activegroup|
|1 |abc |[{"groupID":"5d","role":"admin","status":"A"},{"groupID":"58","role":"admin","status":"A"}]|
Data types of the individual fields:
root
|--id : int
|--name : String
|--activegroup : String
The activegroup column is a string, so the explode function does not work on it. The required output is:
|id |name |groupID|role|status|
|1 |abc |5d |admin|A |
|1 |abc |58 |admin|A |
Please help me parse the above in the latest version of Spark Scala.
Recommended answer
First, you need to extract the JSON schema:
import org.apache.spark.sql.functions.{explode, from_json, lit, schema_of_json}
val schema = schema_of_json(lit(df.select($"activegroup").as[String].first))
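Inferring the schema from the first row assumes that row holds a representative, non-null JSON value. As an alternative sketch, the schema can be written out by hand; the name groupSchema is hypothetical, and the three string fields are taken from the sample row in the question. The resulting value can be passed to from_json in place of the inferred schema.

```scala
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Hand-written schema for activegroup: a JSON array of objects,
// each with three string fields (names assumed from the sample row).
val groupSchema: ArrayType = ArrayType(
  StructType(Seq(
    StructField("groupID", StringType, nullable = true),
    StructField("role", StringType, nullable = true),
    StructField("status", StringType, nullable = true)
  ))
)
```

An explicit schema avoids the extra Spark job that reading the first row triggers, at the cost of having to update it when the JSON layout changes.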
Once you have it, you can convert your activegroup column, which is a String, to JSON with from_json, and then explode it.
Once the column holds parsed JSON, you can extract its values with $"columnName.field":
// Parse the string into an array of structs, then emit one row per element
val dfresult = df
  .withColumn("jsonColumn", explode(from_json($"activegroup", schema)))
  .select($"id", $"name",
    $"jsonColumn.groupID" as "groupId",
    $"jsonColumn.role" as "role",
    $"jsonColumn.status" as "status")
If you want to extract the whole JSON struct and the element names are fine as they are, you can use * instead (the columns then keep the JSON field names, such as groupID):
val dfresult = df
  .withColumn("jsonColumn", explode(from_json($"activegroup", schema)))
  .select($"id", $"name", $"jsonColumn.*")
Result
+---+----+-------+-----+------+
| id|name|groupId| role|status|
+---+----+-------+-----+------+
| 1| abc| 5d|admin| A|
| 1| abc| 58|admin| A|
+---+----+-------+-----+------+
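To try the steps end to end without a Parquet file, the following sketch builds the sample row in memory and applies the same schema_of_json, from_json and explode sequence. The local SparkSession, the app name and the in-memory DataFrame are assumptions for illustration; in spark-shell the spark value already exists and that line can be dropped.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, from_json, lit, schema_of_json}

// Local session for illustration only (assumption); spark-shell provides `spark` already.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("json-to-columns")
  .getOrCreate()
import spark.implicits._

// Sample row from the question, built in memory instead of read from Parquet.
val df = Seq(
  (1, "abc",
    """[{"groupID":"5d","role":"admin","status":"A"},{"groupID":"58","role":"admin","status":"A"}]""")
).toDF("id", "name", "activegroup")

// Infer the JSON schema from the first row's string value.
val schema = schema_of_json(lit(df.select($"activegroup").as[String].first))

// Parse the string into an array of structs, explode it to one row per
// element, then flatten the struct fields into top-level columns.
val dfresult = df
  .withColumn("jsonColumn", explode(from_json($"activegroup", schema)))
  .select($"id", $"name", $"jsonColumn.*")

dfresult.show()
```

Because this variant selects jsonColumn.*, the flattened columns carry the JSON field names; add aliases as in the first variant if different column names are needed.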