如何从Hive中的json字符串中提取选定的值 [英] How to extract selected values from json string in Hive
问题描述
我正在Hive中运行一个简单的查询,该查询会产生以下输出(以及其他一些附加列.
|------|-----------------------------------------------------------|
| col1 | col2 |
|------|-----------------------------------------------------------|
| A | {"variable1":123,"variable2":456,"variable3":789} |
|------|-----------------------------------------------------------|
| B | {"variable1":222,"variable2":333,"variable3":444} |
--------------------------------------------------------------------
我需要能够解析json字符串并在SELECT语句本身期间提取每个令牌的值,以便我可以合并WHERE语句以仅返回字符串中对我有价值的部分. /p>
所以我的最终输出可能是这样的:
|------------------------------------------|
| col1 |variable1 | variable2 | variable3 |
|------------------------------------------|
| A | 123 | 456 | 789 |
|------------------------------------------|
| B | 222 | 333 | 444 |
--------------------------------------------
我尝试使用各种功能,包括使用esnaples中指定的参数结构来包括SPLIT和GET_JSON_OBJECT,但所有返回错误都诸如此类:
No matching method for class org.apache.hadoop.hive.ql.udf.UDFJson
with (struct<...>, string). Possible choices: _FUNC_(string, string)
有人可以告诉我我想做的事是否可行,或者解释我要去哪里错了?
预先感谢
select col1, get_json_object(col2,'$.variable1') as variable1,
get_json_object(col2,'$.variable2') as variable2,
get_json_object(col2,'$.variable3') as variable3
from json_test
如果将输出放入表中(例如json_test),则可以通过这种方式进行解析.您也可以调整查询以获得这些结果.
输出:
col1 |variable1 |variable2 |variable3 |
-----|----------|----------|----------|
A |123 |456 |789 |
B |222 |333 |444 |
I am running a simple query in Hive that produces the following output (with a few other additional columns.
|------|-----------------------------------------------------------|
| col1 | col2 |
|------|-----------------------------------------------------------|
| A | {"variable1":123,"variable2":456,"variable3":789} |
|------|-----------------------------------------------------------|
| B | {"variable1":222,"variable2":333,"variable3":444} |
--------------------------------------------------------------------
I need to be able to parse the json string and pull out the values for each token during the SELECT statement itself so that I can perhaps incorporate a WHERE statement to return only the parts of the string that are valuable to me.
So my ultimate output might look like this:
|------------------------------------------|
| col1 |variable1 | variable2 | variable3 |
|------------------------------------------|
| A | 123 | 456 | 789 |
|------------------------------------------|
| B | 222 | 333 | 444 |
--------------------------------------------
I have tried using various functions in including SPLIT and GET_JSON_OBJECT using the argument structure specified in the esnaples yet all return errors such as:
No matching method for class org.apache.hadoop.hive.ql.udf.UDFJson
with (struct<...>, string). Possible choices: _FUNC_(string, string)
Could someone please tell if what I am trying to do is feasible, or explain where I am going wrong?
Thanks in advance
select col1, get_json_object(col2,'$.variable1') as variable1,
get_json_object(col2,'$.variable2') as variable2,
get_json_object(col2,'$.variable3') as variable3
from json_test
If you put your output into a table (say json_test), you can parse in this way. You can tweak your query too to obtain these results.
Output:
col1 |variable1 |variable2 |variable3 |
-----|----------|----------|----------|
A |123 |456 |789 |
B |222 |333 |444 |
这篇关于如何从Hive中的json字符串中提取选定的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!