使用 HIVE 解析 json 数组 [英] Parse json arrays using HIVE

查看:104
本文介绍了使用 HIVE 解析 json 数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在一个表 (jt) 中存储了许多 json 数组,如下所示:

I have many json arrays stored in a table (jt) that looks like this:

[{"ts":1403781896,"id":14,"log":"show"},{"ts":1403781896,"id":14,"log":"start"}]
[{"ts":1403781911,"id":14,"log":"press"},{"ts":1403781911,"id":14,"log":"press"}]

每个数组都是一条记录.

Each array is a record.

我想解析这个表以获得一个包含 3 个字段的新表(日志):ts、id、log.我尝试使用 get_json_object 方法,但似乎该方法与 json 数组不兼容,因为我只获取空值.

I would like to parse this table in order to get a new table (logs) with 3 fields: ts, id, log. I tried to use the get_json_object method, but it seems that method is not compatible with json arrays because I only get null values.

这是我测试过的代码:

CREATE TABLE logs AS 
SELECT get_json_object(jt.value, '$.ts') AS ts, 
get_json_object(jt.value, '$.id') AS id,
get_json_object(jt.value, '$.log') AS log
FROM jt;

我尝试使用其他功能,但它们看起来很复杂.谢谢!:)

I tried to use other functions but they seem really complicated. Thank you! :)

更新!我通过执行正则表达式解决了我的问题:

Update! I solved my issue by performing a regexp:

CREATE TABLE jt_reg AS
select regexp_replace(regexp_replace(value,'\\}\\,\\{','\\}\\\n\\{'),'\\[|\\]','') as valuereg  from jt;


CREATE TABLE logs AS 
SELECT get_json_object(jt_reg.valuereg, '$.ts') AS ts, 
get_json_object(jt_reg.valuereg, '$.id') AS id,
get_json_object(jt_reg.valuereg, '$.log') AS log
FROM ams_json_reg;

推荐答案

我刚遇到这个问题,JSON 数组作为字符串存储在 hive 表中.

I just ran into this problem, with the JSON array stored as a string in the hive table.

该解决方案有点笨拙和丑陋,但它有效并且不需要 serdes 或外部 UDF

The solution is a bit hacky and ugly, but it works and doesn't require serdes or external UDFs

SELECT 
       get_json_object(single_json_table.single_json, '$.ts') AS ts,
       get_json_object(single_json_table.single_json, '$.id') AS id,
       get_json_object(single_json_table.single_json, '$.log') AS log
FROM ( SELECT explode (
  split(regexp_replace(substr(json_array_col, 2, length(json_array_col)-2),
            '"}","', '"}",,,,"'), ',,,,')
      ) FROM src_table) single_json_table;

我把行拆开,这样会更容易阅读.我正在使用 substr() 去除第一个和最后一个字符,删除 [ 和 ] .然后我使用 regex_replace 来匹配 json 数组中记录之间的分隔符,并将分隔符添加或更改为独特的东西,然后可以轻松地与 split() 一起使用将字符串转换为 json 对象的 hive 数组,然后如上一个解决方案中所述,与explode() 一起使用.

I broke the lines up so that it would be a little easier to read. I'm using substr() to strip the first and last characters, removing [ and ] . I'm then using regex_replace to match the separator between records in the json array and adding or changing the separator to be something unique that can then be used easily with split() to turn the string into a hive array of json objects which can then be used with explode() as described in the previous solution.

请注意,此处使用的分隔符正则表达式 ( "}"," ) 不适用于原始数据集...正则表达式必须是 ( "},\{" ),然后需要替换是 "},,,,{" 例如..

Note, the separator regex used here ( "}"," ) wouldn't work with the original data set...the regex would have to be ( "},\{" ) and the replacement would then need to be "},,,,{" eg..

  split(regexp_replace(substr(json_array_col, 2, length(json_array_col)-2),
            '"},\\{"', '"},,,,{"'), ',,,,')

这篇关于使用 HIVE 解析 json 数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆