如何使用Spark解析jsonfile [英] How to parse jsonfile with spark
本文介绍了如何使用Spark解析jsonfile的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我有一个要解析的json文件.json格式如下:
I have a jsonfile to be parsed.The json format is like this :
{"cv_id":"001","cv_parse": { "educations": [{"major": "English", "degree": "Bachelor" },{"major": "English", "degree": "Master "}],"basic_info": { "birthyear": "1984", "location": {"state": "New York"}}}}
我必须获取文件中的每个单词.如何从数组中获取"major"
,是否必须使用方法df.select("cv_parse.basic_info.location.province")
获取"province"一词?
I have to get every word in the file.How can I get the "major"
from an array and do I have to get the word of "province" using the method df.select("cv_parse.basic_info.location.province")
?
这是我想要的结果:
cv_id major degree birthyear state
001 English Bachelor 1984 New York
001 English Master 1984 New York
推荐答案
这可能不是最好的方法,但是您可以试一下.
This might not be the best way of doing it but you can give it a shot.
// import the implicits functions
import org.apache.spark.sql.functions._
import sqlContext.implicits._
//read the json file
val jsonDf = sqlContext.read.json("sample-data/sample.json")
jsonDf.printSchema
您的架构为:
root
|-- cv_id: string (nullable = true)
|-- cv_parse: struct (nullable = true)
| |-- basic_info: struct (nullable = true)
| | |-- birthyear: string (nullable = true)
| | |-- location: struct (nullable = true)
| | | |-- state: string (nullable = true)
| |-- educations: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- degree: string (nullable = true)
| | | |-- major: string (nullable = true)
现在您需要爆炸educations
列
val explodedResult = jsonDf.select($"cv_id", explode($"cv_parse.educations"),
$"cv_parse.basic_info.birthyear", $"cv_parse.basic_info.location.state")
explodedResult.printSchema
现在您的架构将是
root
|-- cv_id: string (nullable = true)
|-- col: struct (nullable = true)
| |-- degree: string (nullable = true)
| |-- major: string (nullable = true)
|-- birthyear: string (nullable = true)
|-- state: string (nullable = true)
现在您可以选择列
explodedResult.select("cv_id", "birthyear", "state", "col.degree", "col.major").show
+-----+---------+--------+--------+-------+
|cv_id|birthyear| state| degree| major|
+-----+---------+--------+--------+-------+
| 001| 1984|New York|Bachelor|English|
| 001| 1984|New York| Master |English|
+-----+---------+--------+--------+-------+
这篇关于如何使用Spark解析jsonfile的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文