COLLECT_SET() in Hive, keep duplicates?

Views: 402  Published: 2018/5/31 18:32:40  Tags: java hadoop user-defined-functions hive


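Problem description

Is there a way to keep the duplicates in a collected set in Hive, or to simulate the kind of aggregate collection that Hive provides using some other method? I want to aggregate all of the items in a column that share the same key into an array, with duplicates preserved.

I.e.: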

 hash_id | num_of_cats
 =====================
 ad3jkfk   4
 ad3jkfk   4
 ad3jkfk   2
 fkjh43f   1
 fkjh43f   8
 fkjh43f   8
 rjkhd93   7
 rjkhd93   4
 rjkhd93   7

should return:

 hash_agg | cats_aggregate
 ===========================
 ad3jkfk    Array<int>(4,4,2)
 fkjh43f    Array<int>(1,8,8)
 rjkhd93    Array<int>(7,4,7)

Solution

Use COLLECT_LIST(col), which is available in Hive 0.13.0 and later:

 SELECT
   hash_id, COLLECT_LIST(num_of_cats) AS aggr_set
 FROM
   tablename
 WHERE
   blablabla
 GROUP BY
   hash_id
 ;
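
To make the difference concrete, here is a minimal sketch that runs both aggregates side by side over the sample data from the question. The table name `cat_counts` is hypothetical (the question never names its table), and the WHERE clause is left out because the original filter is only a placeholder. Note that Hive does not guarantee the order of elements within the resulting arrays.

```sql
-- Assumption: a table named cat_counts (hypothetical) holding the
-- hash_id and num_of_cats columns shown in the question.
SELECT
  hash_id,
  COLLECT_SET(num_of_cats)  AS distinct_cats,  -- duplicates removed: 4 and 2 for ad3jkfk
  COLLECT_LIST(num_of_cats) AS all_cats        -- duplicates kept: 4, 4 and 2 for ad3jkfk
FROM cat_counts
GROUP BY hash_id;
```

If the order of the collected values matters, it has to be imposed separately, since neither function promises one.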

