如何使用Hive查询结构数组(get_json_object)? [英] How to query struct array with Hive (get_json_object)?

查看:2620
本文介绍了如何使用Hive查询结构数组(get_json_object)?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我将以下JSON对象存储在Hive表中:

  {
main_id:qwert ,
features:[
{
scope:scope1,
name:foo,
value:ab12345 ,
age:50,
somelist:[abcde,fghij]
},
{
scope:scope2 ,
name:bar,
value:cd67890
},
{
scope:scope3,
name:baz,
value:[
A,
B,
C
]
}b





特性是一个长度可变的数组,即所有对象都是可选的。这些对象具有任意元素,但它们都包含范围,名称和值。

这是我创建的Hive表:



pre $ CREATE TABLE tbl(
main_id STRING,要素数组< struct< scope:STRING,name:STRING,value:array< STRING> ,age:INT,somelist:array< STRING>>>

I需要一个Hive查询,返回main_id和名称为baz的结构体的值,即

pre $ main $ baz_value
qwert [A,B,C]

我的问题是Hive UDF get_json_object 仅支持有限版本的JSONPath。它不支持像 get_json_object(features,'.features [?(@。name ='baz')]')这样的路径。



如何用Hive查询想要的结果?是否可以使用另一个Hive表结构更简单?

解决方案

我找到了一个解决方案:



使用 Hive爆炸UDTF 来展开struct数组,即为数组features中的每个结构创建第二个(临时)表,并为每个结构创建一个记录。

  CREATE TABLE tbl_exploded as 
选择main_id,
f.name作为f_name,
f.value作为f_value
来自tbl
LATERAL VIEW爆炸(特征)exploded_table作为f
- 可选择在此处进行筛选而不是在第二个查询中:
- 其中f.name ='baz';

结果如下:

 














$ code>

现在您可以选择main_id和像这样的值:

  select main_id,f_value from tbl_exploded where f_name ='baz'; 


I store the following JSON objects in a Hive table:

{
  "main_id": "qwert",
  "features": [
    {
      "scope": "scope1",
      "name": "foo",
      "value": "ab12345",
      "age": 50,
      "somelist": ["abcde","fghij"]
    },
    {
      "scope": "scope2",
      "name": "bar",
      "value": "cd67890"
    },
    {
      "scope": "scope3",
      "name": "baz",
      "value": [
        "A",
        "B",
        "C"
      ]
    }
  ]
}

"features" is an array of varying length, i.e. all objects are optional. The objects have arbitrary elements, but all of them contain "scope", "name" and "value".

This is the Hive table I created:

CREATE TABLE tbl(
main_id STRING,features array<struct<scope:STRING,name:STRING,value:array<STRING>,age:INT,somelist:array<STRING>>>
)

I need a Hive query that returns the main_id and the value of the struct with the name "baz", i.e.,

main_id baz_value
qwert ["A","B","C"]

My problem is that the Hive UDF "get_json_object" supports only a limited version of JSONPath. It does not support a path like get_json_object(features, '$.features[?(@.name='baz')]').

How can query the wanted result with Hive? Is it maybe easier with another Hive table structure?

解决方案

I found a solution for this:

Use the Hive explode UDTF to explode the struct array, i.e., create a second (temporary) table with one record for each struct in the array "features".

CREATE TABLE tbl_exploded as
select main_id, 
f.name as f_name,
f.value as f_value
from tbl
LATERAL VIEW explode(features) exploded_table as f
-- optionally filter here instead of in 2nd query:
-- where f.name = 'baz'; 

The result of this is:

qwert, foo, ab12345
qwert, bar, cd67890
qwert, baz, ["A","B","C"]

Now you can select the main_id and value like this:

select main_id, f_value from tbl_exploded where f_name = 'baz';

这篇关于如何使用Hive查询结构数组(get_json_object)?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆