Importing a JSON array into Hive

Question

I'm trying to import the following JSON into Hive:

[{"time":1521115600,"latitude":44.3959,"longitude":26.1025,"altitude":53,"pm1":21.70905,"pm25":16.5,"pm10":14.60085,"gas1" :0,"gas2":0.12,"gas3":0,"gas4":0,温度":null,压力":0,湿度":0,噪音":0},{时间" :1521115659,纬度":44.3959,经度":26.1025,海拔":53,"pm1":24.34045,"pm25":18.5,"pm10":16.37065,"gas1":0,"gas2":0.08 ,"gas3":0,"gas4":0,温度":无,压力":0,湿度":0,噪声":0},{时间":1521115720,纬度":44.3959 ,经度":26.1025,海拔":53,"pm1":23.6826,"pm25":18,"pm10":15.9282,"gas1":0,"gas2":0,"gas3":0," gas4:0,"温度:无,"压力:0,"湿度:0,"噪音:0},{"时间:1521115779,"纬度:44.3959,"经度:26.1025,"高度:53," pm1:25.65615," pm25:19.5," pm10:17.25555," gas1:0," gas2:0.04," gas3:0," gas4:0,"温度":null,压力":0,湿度":0,噪音":0}]

[{"time":1521115600,"latitude":44.3959,"longitude":26.1025,"altitude":53,"pm1":21.70905,"pm25":16.5,"pm10":14.60085,"gas1":0,"gas2":0.12,"gas3":0,"gas4":0,"temperature":null,"pressure":0,"humidity":0,"noise":0},{"time":1521115659,"latitude":44.3959,"longitude":26.1025,"altitude":53,"pm1":24.34045,"pm25":18.5,"pm10":16.37065,"gas1":0,"gas2":0.08,"gas3":0,"gas4":0,"temperature":null,"pressure":0,"humidity":0,"noise":0},{"time":1521115720,"latitude":44.3959,"longitude":26.1025,"altitude":53,"pm1":23.6826,"pm25":18,"pm10":15.9282,"gas1":0,"gas2":0,"gas3":0,"gas4":0,"temperature":null,"pressure":0,"humidity":0,"noise":0},{"time":1521115779,"latitude":44.3959,"longitude":26.1025,"altitude":53,"pm1":25.65615,"pm25":19.5,"pm10":17.25555,"gas1":0,"gas2":0.04,"gas3":0,"gas4":0,"temperature":null,"pressure":0,"humidity":0,"noise":0}]

CREATE TABLE json_serde (
  s array<struct<time: timestamp, latitude: string, longitude: string, pm1: string>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'mapping.value' = 'value'
)
STORED AS TEXTFILE
LOCATION '/user/hduser';

The import works, but if I run:

SELECT * FROM json_serde;

it returns only the first element of the array from each file under /user/hduser on Hadoop.

Is there any good documentation on working with JSON arrays?

Solution

If you cannot change your input file format (line-oriented JSON SerDes generally expect one complete JSON object per line, while this file is a single top-level array), you can import the data directly into Spark, work with it there, and write it back to a Hive table once it is finalized.

scala> val myjs = spark.read.format("json").option("path","file:///root/tmp/test5").load()
myjs: org.apache.spark.sql.DataFrame = [altitude: bigint, gas1: bigint ... 13 more fields]

scala> myjs.show()
+--------+----+----+----+----+--------+--------+---------+-----+--------+--------+----+--------+-----------+----------+
|altitude|gas1|gas2|gas3|gas4|humidity|latitude|longitude|noise|     pm1|    pm10|pm25|pressure|temperature|      time|
+--------+----+----+----+----+--------+--------+---------+-----+--------+--------+----+--------+-----------+----------+
|      53|   0|0.12|   0|   0|       0| 44.3959|  26.1025|    0|21.70905|14.60085|16.5|       0|       null|1521115600|
|      53|   0|0.08|   0|   0|       0| 44.3959|  26.1025|    0|24.34045|16.37065|18.5|       0|       null|1521115659|
|      53|   0| 0.0|   0|   0|       0| 44.3959|  26.1025|    0| 23.6826| 15.9282|18.0|       0|       null|1521115720|
|      53|   0|0.04|   0|   0|       0| 44.3959|  26.1025|    0|25.65615|17.25555|19.5|       0|       null|1521115779|
+--------+----+----+----+----+--------+--------+---------+-----+--------+--------+----+--------+-----------+----------+


scala> myjs.write.json("file:///root/tmp/test_output")
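
Note that write.json produces JSON Lines output, i.e. one JSON object per line, which is the layout that line-oriented Hive JSON SerDes generally expect. As a quick sanity check (a sketch: the check name and the comparison are illustrative, reusing the output path from the step above), you can read the result back and compare row counts:

scala> // read the rewritten files back; they are now one JSON object per line
scala> val check = spark.read.format("json").option("path","file:///root/tmp/test_output").load()

scala> check.count() == myjs.count()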

Alternatively, you can create a Hive table from it directly:

scala> myjs.createOrReplaceTempView("myjs")

scala> spark.sql("select * from myjs").show()

scala> spark.sql("create table tax.myjs_hive as select * from myjs")
