Split JSON value from CSV file and create new column based on JSON key in Spark/Scala


Question

I have data in a CSV file in the format below. I want to split the JSON from the Desc column and create new columns from its keys, using Spark 2 with Scala.

+------+------------+----------------------------------+
|  id  |  Category  |           Desc                   |
+------+------------+----------------------------------+
|  201 |  MIS20     | { "Total": 200,"Defective": 21 } |
+------+------------+----------------------------------+
|  202 |  MIS30     | { "Total": 740,"Defective": 58 } |
+------+------------+----------------------------------+

The desired output would be:

+------+------------+---------+-------------+
|  id  |  Category  |  Total  |  Defective  |
+------+------------+---------+-------------+
|  201 |  MIS20     |  200    |   21        |
+------+------------+---------+-------------+
|  202 |  MIS30     |  740    |   58        |
+------+------------+---------+-------------+

Any help is highly appreciated.
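
For readers who want to reproduce the example, here is a minimal, illustrative sketch that builds the sample rows in memory as a DataFrame named d (the name used in the answer below). In practice you would read the actual file with spark.read.csv instead, making sure the JSON in Desc is quoted so its commas are not treated as field separators.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("split-json-column").master("local[*]").getOrCreate()
import spark.implicits._

// Illustrative in-memory version of the sample data; a real run would use
// spark.read.option("header", "true").csv(...) on the actual CSV file
val d = Seq(
  (201, "MIS20", """{ "Total": 200,"Defective": 21 }"""),
  (202, "MIS30", """{ "Total": 740,"Defective": 58 }""")
).toDF("id", "Category", "Desc")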

Answer

Create a schema for the inner JSON and apply it with the from_json function, as below:

import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StructType, StructField, LongType}
import spark.implicits._

// Schema describing the JSON object stored in the Desc column
val schema = new StructType()
  .add(StructField("Total", LongType, false))
  .add(StructField("Defective", LongType, false))

// Parse Desc with that schema, then flatten the resulting struct into columns
d.select($"id", $"Category", from_json($"Desc", schema).as("desc"))
  .select($"id", $"Category", $"desc.*")
  .show(false)

Output:

+---+--------+-----+---------+
|id |Category|Total|Defective|
+---+--------+-----+---------+
|201|MIS20   |200  |21       |
|202|MIS30   |740  |58       |
+---+--------+-----+---------+
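
If you would rather not declare the schema up front, a rough alternative (not part of the original answer) is to extract individual keys with get_json_object and JSONPath expressions; it returns strings, so cast the results to the numeric type you need:

import org.apache.spark.sql.functions.get_json_object

// Pull each key out of the JSON by path; get_json_object returns strings,
// hence the explicit casts to long
d.select(
    $"id",
    $"Category",
    get_json_object($"Desc", "$.Total").cast("long").as("Total"),
    get_json_object($"Desc", "$.Defective").cast("long").as("Defective")
  )
  .show(false)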

Hope this helps!

