导入json数组到蜂巢 [英] importing json array into hive
问题描述
我正在尝试在配置单元中导入以下json
I'm trying to import the following json in hive
[{"time":1521115600,"latitude":44.3959,"longitude":26.1025,"altitude":53,"pm1":21.70905,"pm25":16.5,"pm10":14.60085,"gas1" :0,"gas2":0.12,"gas3":0,"gas4":0,温度":null,压力":0,湿度":0,噪音":0},{时间" :1521115659,纬度":44.3959,经度":26.1025,海拔":53,"pm1":24.34045,"pm25":18.5,"pm10":16.37065,"gas1":0,"gas2":0.08 ,"gas3":0,"gas4":0,温度":无,压力":0,湿度":0,噪声":0},{时间":1521115720,纬度":44.3959 ,经度":26.1025,海拔":53,"pm1":23.6826,"pm25":18,"pm10":15.9282,"gas1":0,"gas2":0,"gas3":0," gas4:0,"温度:无,"压力:0,"湿度:0,"噪音:0},{"时间:1521115779,"纬度:44.3959,"经度:26.1025,"高度:53," pm1:25.65615," pm25:19.5," pm10:17.25555," gas1:0," gas2:0.04," gas3:0," gas4:0,"温度":null,压力":0,湿度":0,噪音":0}]
[{"time":1521115600,"latitude":44.3959,"longitude":26.1025,"altitude":53,"pm1":21.70905,"pm25":16.5,"pm10":14.60085,"gas1":0,"gas2":0.12,"gas3":0,"gas4":0,"temperature":null,"pressure":0,"humidity":0,"noise":0},{"time":1521115659,"latitude":44.3959,"longitude":26.1025,"altitude":53,"pm1":24.34045,"pm25":18.5,"pm10":16.37065,"gas1":0,"gas2":0.08,"gas3":0,"gas4":0,"temperature":null,"pressure":0,"humidity":0,"noise":0},{"time":1521115720,"latitude":44.3959,"longitude":26.1025,"altitude":53,"pm1":23.6826,"pm25":18,"pm10":15.9282,"gas1":0,"gas2":0,"gas3":0,"gas4":0,"temperature":null,"pressure":0,"humidity":0,"noise":0},{"time":1521115779,"latitude":44.3959,"longitude":26.1025,"altitude":53,"pm1":25.65615,"pm25":19.5,"pm10":17.25555,"gas1":0,"gas2":0.04,"gas3":0,"gas4":0,"temperature":null,"pressure":0,"humidity":0,"noise":0}]
CREATE TABLE json_serde (
s array<struct<time: timestamp, latitude: string, longitude: string, pm1: string>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'mapping.value' = 'value'
)
STORED AS TEXTFILE
location '/user/hduser';
导入有效,但如果我尝试
the import works but if i try
Select * from json_serde;
它将仅从hadoop/user/hduser上的每个文档中返回每个文件的第一个元素.
it will return from every document that is on hadoop/user/hduser only the first element per file.
关于使用json数组有很好的文档?
there is a good documentation on working with json array??
推荐答案
如果无法使用更新输入文件格式,则可以在数据完成后直接导入spark并使用它,一旦数据确定后再写回Hive表.
If you can not use update your input file format you can directly import in spark and use it, once data is finalized write back to Hive table.
scala> val myjs = spark.read.format("json").option("path","file:///root/tmp/test5").load()
myjs: org.apache.spark.sql.DataFrame = [altitude: bigint, gas1: bigint ... 13 more fields]
scala> myjs.show()
+--------+----+----+----+----+--------+--------+---------+-----+--------+--------+----+--------+-----------+----------+
|altitude|gas1|gas2|gas3|gas4|humidity|latitude|longitude|noise| pm1| pm10|pm25|pressure|temperature| time|
+--------+----+----+----+----+--------+--------+---------+-----+--------+--------+----+--------+-----------+----------+
| 53| 0|0.12| 0| 0| 0| 44.3959| 26.1025| 0|21.70905|14.60085|16.5| 0| null|1521115600|
| 53| 0|0.08| 0| 0| 0| 44.3959| 26.1025| 0|24.34045|16.37065|18.5| 0| null|1521115659|
| 53| 0| 0.0| 0| 0| 0| 44.3959| 26.1025| 0| 23.6826| 15.9282|18.0| 0| null|1521115720|
| 53| 0|0.04| 0| 0| 0| 44.3959| 26.1025| 0|25.65615|17.25555|19.5| 0| null|1521115779|
+--------+----+----+----+----+--------+--------+---------+-----+--------+--------+----+--------+-----------+----------+
scala> myjs.write.json("file:///root/tmp/test_output")
或者,您也可以直接配置表格
Alternatively you can directly hive table
scala> myjs.createOrReplaceTempView("myjs")
scala> spark.sql("select * from myjs").show()
scala> spark.sql("create table tax.myjs_hive as select * from myjs")
这篇关于导入json数组到蜂巢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!