Parse json arrays using HIVE
I have many JSON arrays stored in a table (jt) that look like this:
[{"ts":1403781896,"id":14,"log":"show"},{"ts":1403781896,"id":14,"log":"start"}]
[{"ts":1403781911,"id":14,"log":"press"},{"ts":1403781911,"id":14,"log":"press"}]
Each array is a record.
I would like to parse this table to get a new table (logs) with 3 fields: ts, id, log. I tried the get_json_object function, but it does not seem to work on JSON arrays, because I only get NULL values.
This is the code I have tested:
CREATE TABLE logs AS
SELECT get_json_object(jt.value, '$.ts') AS ts,
get_json_object(jt.value, '$.id') AS id,
get_json_object(jt.value, '$.log') AS log
FROM jt;
I tried to use other functions but they seem really complicated. Thank you! :)
Update! I solved my issue by performing a regexp:
CREATE TABLE jt_reg AS
select regexp_replace(regexp_replace(value,'\\}\\,\\{','\\}\\\n\\{'),'\\[|\\]','') as valuereg from jt;
CREATE TABLE logs AS
SELECT get_json_object(jt_reg.valuereg, '$.ts') AS ts,
get_json_object(jt_reg.valuereg, '$.id') AS id,
get_json_object(jt_reg.valuereg, '$.log') AS log
FROM jt_reg;
Use the explode() function:
hive (default)> CREATE TABLE logs AS
> SELECT get_json_object(single_json_table.single_json, '$.ts') AS ts,
> get_json_object(single_json_table.single_json, '$.id') AS id,
> get_json_object(single_json_table.single_json, '$.log') AS log
> FROM
> (SELECT explode(json_array_col) as single_json FROM jt) single_json_table ;
Automatically selecting local only mode for query
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
hive (default)> select * from logs;
OK
ts id log
1403781896 14 show
1403781896 14 start
1403781911 14 press
1403781911 14 press
Time taken: 0.118 seconds, Fetched: 4 row(s)
hive (default)>
where json_array_col is the column in jt that holds your array of JSON strings.
hive (default)> select json_array_col from jt;
json_array_col
["{"ts":1403781896,"id":14,"log":"show"}","{"ts":1403781896,"id":14,"log":"start"}"]
["{"ts":1403781911,"id":14,"log":"press"}","{"ts":1403781911,"id":14,"log":"press"}"]
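As a variation on the explode() approach above, here is a minimal sketch that combines LATERAL VIEW with json_tuple, which pulls several keys out of each JSON string in a single call. It assumes, as in the answer, that json_array_col has type ARRAY<STRING> with one JSON object per element; the aliases e and f are placeholders chosen for illustration and are not part of the original post.

```sql
-- Sketch only: assumes jt.json_array_col is ARRAY<STRING>,
-- one JSON object per array element.
CREATE TABLE logs AS
SELECT f.ts, f.id, f.log
FROM jt
-- one output row per element of the array
LATERAL VIEW explode(jt.json_array_col) e AS single_json
-- extract the three keys from each JSON string; values come back as strings
LATERAL VIEW json_tuple(e.single_json, 'ts', 'id', 'log') f AS ts, id, log;
```

If the column is a plain STRING holding the whole array, explode() will not accept it directly, so the array would first have to be split into elements (for example along the lines of the regexp approach from the question).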