Is it possible to concat a string field after group by in Hive


Problem description

I am evaluating Hive and need to concatenate a string field after a group by. I found a function named "concat_ws", but it looks like I have to explicitly list all the values to be concatenated. I am wondering whether I can do something like the following with concat_ws in Hive. Here is an example: I have a table named "my_table" with two fields named country and city. I want only one record per country, and each record should have two fields, country and cities:

select country, concat_ws(city, "|") as cities
from my_table
group by country

Is this possible in Hive? I am currently using Hive 0.11 from CDH5.

Solution

In database management an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement such as a set, a bag or a list.

Source: Aggregate function - Wikipedia

Hive's out-of-the-box aggregate functions are listed on the following web page:
Built-in Aggregate Functions (UDAF - user defined aggregation function)

So, the only built-in option (for Hive 0.11; for Hive 0.13 and above you have collect_list) is:
array collect_set(col)

This will do what you ask as long as there are no duplicate city records per country, because collect_set returns a set of objects with duplicate elements eliminated. Otherwise, create your own UDAF or aggregate outside of Hive.
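As a minimal sketch of how collect_set could be combined with concat_ws for the query in the question: this assumes city is a string column and relies on the concat_ws form that accepts an array of strings. Note that concat_ws takes the separator as its first argument, not its last.

```sql
-- Sketch only: table and column names are taken from the question.
-- collect_set(city) gathers the distinct cities of each country into an array<string>;
-- concat_ws('|', ...) then joins the array elements with '|' as the separator.
SELECT
  country,
  concat_ws('|', collect_set(city)) AS cities
FROM my_table
GROUP BY country;
```

On Hive 0.13 and above, swapping collect_set for collect_list in the same query should keep duplicate cities instead of dropping them. In both cases the order of the cities within each group is not guaranteed.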

References for writing a UDAF:
