Split JSON value from CSV file and create new column based on JSON key in Spark/Scala
Question
I have data in a CSV file in the format below. I want to split the JSON from the Desc column and create a new column for each key, using Spark 2 with Scala.
+------+----------+----------------------------------+
| id   | Category | Desc                             |
+------+----------+----------------------------------+
| 201  | MIS20    | { "Total": 200,"Defective": 21 } |
+------+----------+----------------------------------+
| 202  | MIS30    | { "Total": 740,"Defective": 58 } |
+------+----------+----------------------------------+
The desired output would be:
+------+----------+-------+-----------+
| id   | Category | Total | Defective |
+------+----------+-------+-----------+
| 201  | MIS20    | 200   | 21        |
+------+----------+-------+-----------+
| 202  | MIS30    | 740   | 58        |
+------+----------+-------+-----------+
Any help is highly appreciated.
Answer
Create a schema for your inner JSON and apply it with the from_json function, as below:
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{LongType, StructType}

val schema = new StructType()
  .add("Total", LongType, false)
  .add("Defective", LongType, false)

d.select($"id", $"Category", from_json($"Desc", schema).as("desc"))
  .select($"id", $"Category", $"desc.*")
  .show(false)
Output:
+---+--------+-----+---------+
|id |Category|Total|Defective|
+---+--------+-----+---------+
|201|MIS20 |200 |21 |
|202|MIS30 |740 |58 |
+---+--------+-----+---------+
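As a variant, when the JSON keys are known in advance, the same columns can be extracted without declaring a full schema by pulling each key with get_json_object. A minimal sketch, assuming `d` is loaded from the CSV with a header row ("data.csv" is a placeholder path, not from the original question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.get_json_object

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Placeholder path: load the CSV, treating the first line as column names.
val d = spark.read.option("header", "true").csv("data.csv")

// Pull each key out of the JSON string with a JSONPath expression,
// then cast the string result to a numeric type.
d.select($"id", $"Category",
    get_json_object($"Desc", "$.Total").cast("long").as("Total"),
    get_json_object($"Desc", "$.Defective").cast("long").as("Defective"))
  .show(false)
```

The from_json approach is preferable when there are many keys or nested structures, since the schema is declared once; get_json_object is convenient for picking out one or two keys.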
Hope this helps!