Hive SELECT 语句来创建一个 STRUCTS 数组 [英] Hive SELECT statement to create an ARRAY of STRUCTS

查看:31
本文介绍了Hive SELECT 语句来创建一个 STRUCTS 数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我无法选择 Hive 中的结构数组.

I'm having trouble selecting into an ARRAY of STRUCTS in Hive.

我的源表如下所示:

+-------------+--+
|    field    |
+-------------+--+
| id          |
| fieldid     |
| fieldlabel  |
| fieldtype   |
| answer_id   |
| unitname    |
+-------------+--+

这是调查数据,其中id是调查id,中间的四个字段是响应数据,unitname是调查所属的业务单位.

This is survey data, where the id is the survey id, the four fields in the middle are response data, and the unitname is the the business unit that the survey pertains to.

我需要为每个调查 ID 的所有答案创建一个结构数组.我认为这会起作用,但它不起作用:

I need to create an array of structs for all of the answers for each survey id. I thought this would work, but it doesn't:

select id, 
array( 
    named_struct(
        "field_id",
        fieldid,
        "field_label",
        fieldlabel,
        "field_type",
        fieldtype,
        "answer_id",
        answer_id,)) as answers,
unitname
from new_answers;

返回的是每个调查答案 (field_id) 作为该答案的一个结构的数组,如下所示:

What that returns is each survey answer (field_id) as an array of one struct for that answer, like this:

id | answers | unitname
1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069}] | Location2
2 | [{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070}] | Location2
2 | [{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2

但我需要的是:

id | answers | unitname    
1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},
   {"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},
   {"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2

我进行了搜索和搜索,但我找到的所有答案似乎都与使用 INSERT INTO....VALUES() 查询有关.我已经有了一个表结构;我只是无法让 ARRAY 以它应有的方式进行 ARRAY.

I've searched and searched, but all the answers I'm finding seem to have to do with using INSERT INTO....VALUES() queries. I have a table structure already; I just can't get the ARRAY to ARRAY up the way it should.

任何帮助将不胜感激.

出于复制目的,如果需要:

For reproduction purposes, if need be:

CREATE TABLE `new_answers`( 
`id` bigint,
`fieldid` bigint,
`fieldlabel` string,
`fieldtype` string,
`answer_id` bigint,
`unitname` string)

INSERT INTO new_answers VALUES
(1,175877,"Comment","COMMENT",8990947803,"Location1"),
(2,47824,"Language","MULTIPLE_CHOICE",8990950069,"Location2"),
(2,48187,"Language Type","MULTIPLE_CHOICE",8990950070,"Location2"),
(2,47829,"Trans #","TEXT",8990950071,"Location2");

推荐答案

您似乎正在寻找的功能是将结构收集到一个数组中.Hive 带有两个用于将事物收集到数组中的函数:collect_set 和 collect_list.但是,这些函数只能用于创建基本类型的数组.

The functionality that you seem to be looking for is to collect the structs into an array. Hive comes with two functions for collecting things into arrays: collect_set and collect_list. However, those functions only work to create arrays of basic types.

brickhouse 项目的 jar (https://github.com/klout/brickhouse/wiki/Downloads) 提供了许多功能,包括收集复杂类型的能力.

The jar for the brickhouse project (https://github.com/klout/brickhouse/wiki/Downloads) provides a number of features, including the ability to collect complex types.

add jar hdfs://path/to/your/jars/brickhouse-0.6.0.jar

然后您可以使用您喜欢的任何名称添加 collect 函数:

Then you can add the collect function using whatever name you like:

create temporary function collect_struct as 'brickhouse.udf.collect.CollectUDAF';

以下查询:

select id
     , collect_struct( 
         named_struct(
           "field_id", fieldid,
           "field_label", fieldlabel,
           "field_type", fieldtype,
           "answer_id", answer_id)) as answers
     , unitname
  from new_answers
 group by id, unitname
;

提供以下结果:

id  answers unitname
1   [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] Location1
2   [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}]    Location2

这篇关于Hive SELECT 语句来创建一个 STRUCTS 数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆