如何在Hive中将JSON字符串数据类型列转换为映射数据类型列? [英] How to convert json string datatype column to map datatype column in hive?
问题描述
我需要从所有行中获取所有唯一键值. 每行具有不同的键和值.请找到该列的上图.
I need to get all the unique key values from all the rows. Each row has different keys and values Please find the above image of the column.
例如:一行看起来像
{"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED":1549002807585}
推荐答案
使用两行数据对此进行了测试,除第二个JSON中有一个另外的NEW_KEY
和PARSING.NGRAM_MATCHES_CACHED
值不同之外,所有key_value对均相同.
Tested this with two rows of data, all key_value pairs are identical except in second JSON there is one additional NEW_KEY
and PARSING.NGRAM_MATCHES_CACHED
values are different.
with data as
(
select stack(2, --data example
'{"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED":1549002807585}',
'{"NEW_KEY":12345,"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED":154900280758}'
) as str
)
select str_to_map(concat_ws(',',collect_set(key_value)),',',':') --collect set, concatenate and convert to map
from
(
select explode(split(regexp_replace (str,'[{}"]',''),',')) key_value from data --remove JSON delimiters, split and explode pairs
)s;
结果:
OK
{"START_TIME":"1549002807568","PARSING.QUERY_FORMED":"1549002807586","CUBES_WITH_PERMISSIONS":"1549002807568","PARSING.CUBE_MATCH_SELECTED":"1549002807586","POTENTIAL_COMPLETIONS_ADDED":"1549002807587","QUERY_PARSED":"1549002807586","SUGGESTIONS_FORMED":"1549002807606","PARSING.SEQUENCES_GENERATED":"1549002807568","PARSING.NGRAM_MATCHES_CACHED":"154900280758","NEW_KEY":"12345"}
Time taken: 158.414 seconds, Fetched: 1 row(s)
当然,"PARSING.NGRAM_MATCHES_CACHED"
在结果中仅存在一次,因为map不允许同一键两次.所有键值都是唯一的.
请阅读代码中的注释.
Of course, "PARSING.NGRAM_MATCHES_CACHED"
exists only one time in the result, because map does not allow the same key twice. All key_values are unique.
Read comments in the code please.
这篇关于如何在Hive中将JSON字符串数据类型列转换为映射数据类型列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!