配置单元SELECT语句创建STRUCTS的数组 [英] Hive SELECT statement to create an ARRAY of STRUCTS

查看:105
本文介绍了配置单元SELECT语句创建STRUCTS的数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我无法在Hive中选择结构数组.

I'm having trouble selecting into an ARRAY of STRUCTS in Hive.

我的源表如下:

+-------------+--+
|    field    |
+-------------+--+
| id          |
| fieldid     |
| fieldlabel  |
| fieldtype   |
| answer_id   |
| unitname    |
+-------------+--+

这是调查数据,其中id是调查ID,中间的四个字段是响应数据,单位名称是该调查所属的业务单位.

This is survey data, where the id is the survey id, the four fields in the middle are response data, and the unitname is the the business unit that the survey pertains to.

我需要为每个调查ID的所有答案创建一个结构数组.我以为这可以用,但是不能用

I need to create an array of structs for all of the answers for each survey id. I thought this would work, but it doesn't:

select id, 
array( 
    named_struct(
        "field_id",
        fieldid,
        "field_label",
        fieldlabel,
        "field_type",
        fieldtype,
        "answer_id",
        answer_id,)) as answers,
unitname
from new_answers;

返回的结果是每个调查答案(field_id)作为该答案的一个结构的数组,如下所示:

What that returns is each survey answer (field_id) as an array of one struct for that answer, like this:

id | answers | unitname
1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069}] | Location2
2 | [{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070}] | Location2
2 | [{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2

但是我需要做的是:

id | answers | unitname    
1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},
   {"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},
   {"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2

我进行了搜索,但是我发现的所有答案似乎都与使用INSERT INTO .... VALUES()查询有关.我已经有一个表结构了;我只是无法按照应有的方式来安排阵列.

I've searched and searched, but all the answers I'm finding seem to have to do with using INSERT INTO....VALUES() queries. I have a table structure already; I just can't get the ARRAY to ARRAY up the way it should.

任何帮助将不胜感激.

Any help would be most appreciated.

出于复制目的,如果需要的话:

For reproduction purposes, if need be:

CREATE TABLE `new_answers`( 
`id` bigint,
`fieldid` bigint,
`fieldlabel` string,
`fieldtype` string,
`answer_id` bigint,
`unitname` string)

INSERT INTO new_answers VALUES
(1,175877,"Comment","COMMENT",8990947803,"Location1"),
(2,47824,"Language","MULTIPLE_CHOICE",8990950069,"Location2"),
(2,48187,"Language Type","MULTIPLE_CHOICE",8990950070,"Location2"),
(2,47829,"Trans #","TEXT",8990950071,"Location2");

推荐答案

您似乎正在寻找的功能是将结构收集到数组中. Hive带有两个用于将事物收集到数组中的函数:collect_set和collect_list.但是,这些功能只能用于创建基本类型的数组.

The functionality that you seem to be looking for is to collect the structs into an array. Hive comes with two functions for collecting things into arrays: collect_set and collect_list. However, those functions only work to create arrays of basic types.

砖房项目的罐子( https://github.com/klout/brickhouse/wiki/Downloads )提供了许多功能,包括收集复杂类型的功能.

The jar for the brickhouse project (https://github.com/klout/brickhouse/wiki/Downloads) provides a number of features, including the ability to collect complex types.

add jar hdfs://path/to/your/jars/brickhouse-0.6.0.jar

然后,您可以使用任何喜欢的名称添加collect函数:

Then you can add the collect function using whatever name you like:

create temporary function collect_struct as 'brickhouse.udf.collect.CollectUDAF';

以下查询:

select id
     , collect_struct( 
         named_struct(
           "field_id", fieldid,
           "field_label", fieldlabel,
           "field_type", fieldtype,
           "answer_id", answer_id)) as answers
     , unitname
  from new_answers
 group by id, unitname
;

提供以下结果:

id  answers unitname
1   [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] Location1
2   [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}]    Location2

这篇关于配置单元SELECT语句创建STRUCTS的数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆