Parse JSON arrays using Hive


Problem Description


I have many json arrays stored in a table (jt) that looks like this:

[{"ts":1403781896,"id":14,"log":"show"},{"ts":1403781896,"id":14,"log":"start"}]
[{"ts":1403781911,"id":14,"log":"press"},{"ts":1403781911,"id":14,"log":"press"}]

Each array is a record.

I would like to parse this table in order to get a new table (logs) with 3 fields: ts, id, log. I tried to use the get_json_object method, but it seems it is not compatible with JSON arrays, because I only get null values.

This is the code I have tested:

CREATE TABLE logs AS 
SELECT get_json_object(jt.value, '$.ts') AS ts, 
get_json_object(jt.value, '$.id') AS id,
get_json_object(jt.value, '$.log') AS log
FROM jt;

I tried to use other functions but they seem really complicated. Thank you! :)
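
As an aside, get_json_object does understand a limited JSONPath syntax that includes array indexing, so a probe like the sketch below (assuming the STRING column is named value, as in the query above) can read individual array elements; it just yields one column per index rather than one row per element, so it cannot build the logs table on its own:

-- indexing into the array works, but only element by element
SELECT get_json_object(value, '$[0].ts') AS ts_first,
       get_json_object(value, '$[1].ts') AS ts_second
FROM jt;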

Update! I solved my issue by performing a regexp:

CREATE TABLE jt_reg AS
select regexp_replace(regexp_replace(value,'\\}\\,\\{','\\}\\\n\\{'),'\\[|\\]','') as valuereg  from jt;


CREATE TABLE logs AS 
SELECT get_json_object(jt_reg.valuereg, '$.ts') AS ts, 
get_json_object(jt_reg.valuereg, '$.id') AS id,
get_json_object(jt_reg.valuereg, '$.log') AS log
FROM jt_reg;
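
This presumably works because the replacement string contains a literal newline: every "},{" boundary becomes "}" + newline + "{" and the outer brackets are stripped, so when jt_reg is stored as a plain text table each JSON object lands on its own physical line. Conceptually, valuereg then reads back as one object per row:

{"ts":1403781896,"id":14,"log":"show"}
{"ts":1403781896,"id":14,"log":"start"}

which is exactly the one-object-per-record shape that get_json_object handles.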

Solution

Use the explode() function:

 hive (default)> CREATE TABLE logs AS
                  >   SELECT get_json_object(single_json_table.single_json, '$.ts') AS ts,
                  >   get_json_object(single_json_table.single_json, '$.id') AS id,
                  >   get_json_object(single_json_table.single_json, '$.log') AS log
                  >   FROM
                  >     (SELECT explode(json_array_col) as single_json FROM jt) single_json_table ;

Automatically selecting local only mode for query
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator

hive (default)> select * from logs;
OK
ts      id      log
1403781896      14      show
1403781896      14      start
1403781911      14      press
1403781911      14      press
Time taken: 0.118 seconds, Fetched: 4 row(s)
hive (default)>

where json_array_col is the column in jt that holds your array of JSON strings.

hive (default)> select json_array_col from jt;
json_array_col
["{"ts":1403781896,"id":14,"log":"show"}","{"ts":1403781896,"id":14,"log":"start"}"]
["{"ts":1403781911,"id":14,"log":"press"}","{"ts":1403781911,"id":14,"log":"press"}"]
