如何在Hive中将JSON字符串数据类型列转换为映射数据类型列? [英] How to convert json string datatype column to map datatype column in hive?

查看:1693
本文介绍了如何在Hive中将JSON字符串数据类型列转换为映射数据类型列?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要从所有行中获取所有唯一键值. 每行具有不同的键和值.请找到该列的上图.

I need to get all the unique key values from all the rows. Each row has different keys and values Please find the above image of the column.

例如:一行看起来像

{"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED":1549002807585}

推荐答案

使用两行数据对此进行了测试,除第二个JSON中有一个另外的NEW_KEYPARSING.NGRAM_MATCHES_CACHED值不同之外,所有key_value对均相同.

Tested this with two rows of data, all key_value pairs are identical except in second JSON there is one additional NEW_KEY and PARSING.NGRAM_MATCHES_CACHED values are different.

with data as
(
select stack(2, --data example
'{"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED":1549002807585}',
'{"NEW_KEY":12345,"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED":154900280758}'
) as str
)

select str_to_map(concat_ws(',',collect_set(key_value)),',',':') --collect set, concatenate and convert to map
from
(
select explode(split(regexp_replace (str,'[{}"]',''),',')) key_value from data --remove JSON delimiters, split and explode pairs
)s;

结果:

OK
{"START_TIME":"1549002807568","PARSING.QUERY_FORMED":"1549002807586","CUBES_WITH_PERMISSIONS":"1549002807568","PARSING.CUBE_MATCH_SELECTED":"1549002807586","POTENTIAL_COMPLETIONS_ADDED":"1549002807587","QUERY_PARSED":"1549002807586","SUGGESTIONS_FORMED":"1549002807606","PARSING.SEQUENCES_GENERATED":"1549002807568","PARSING.NGRAM_MATCHES_CACHED":"154900280758","NEW_KEY":"12345"}
Time taken: 158.414 seconds, Fetched: 1 row(s)

当然,"PARSING.NGRAM_MATCHES_CACHED"在结果中仅存在一次,因为map不允许同一键两次.所有键值都是唯一的. 请阅读代码中的注释.

Of course, "PARSING.NGRAM_MATCHES_CACHED" exists only one time in the result, because map does not allow the same key twice. All key_values are unique. Read comments in the code please.

这篇关于如何在Hive中将JSON字符串数据类型列转换为映射数据类型列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆