COLLECT_SET() in Hive, keep duplicates?

Views: 43 | Published: 2021/12/15 19:03:10 | Tags: java, hadoop, user-defined-functions, hive

Question

Is there a way to keep the duplicates in a collected set in Hive, or to simulate the kind of aggregate collection that Hive provides using some other method? I want to aggregate all of the items in a column that share the same key into one array, duplicates included.

That is, this input:

hash_id | num_of_cats
=====================
ad3jkfk            4
ad3jkfk            4
ad3jkfk            2
fkjh43f            1
fkjh43f            8
fkjh43f            8
rjkhd93            7
rjkhd93            4
rjkhd93            7

should return:

hash_agg | cats_aggregate
===========================
ad3jkfk   Array<int>(4,4,2)
fkjh43f   Array<int>(1,8,8)
rjkhd93   Array<int>(7,4,7)

Recommended answer

Use COLLECT_LIST(col), which is available as of Hive 0.13.0. Unlike COLLECT_SET, it does not remove duplicate elements.

SELECT
    hash_id, COLLECT_LIST(num_of_cats) AS aggr_set
FROM
    tablename
WHERE
    blablabla  -- placeholder: substitute your own filter condition
GROUP BY
    hash_id
;
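
To make the difference between the two aggregates concrete, here is a minimal Python sketch that models them on the sample rows from the question. It is not Hive code: the function names collect_list and collect_set are only illustrative stand-ins for the Hive aggregates, and the model keeps rows in input order for simplicity, whereas in Hive the order of elements inside each array may depend on how the rows reach the aggregation.

```python
from collections import defaultdict

# Sample rows from the question: (hash_id, num_of_cats)
ROWS = [
    ("ad3jkfk", 4), ("ad3jkfk", 4), ("ad3jkfk", 2),
    ("fkjh43f", 1), ("fkjh43f", 8), ("fkjh43f", 8),
    ("rjkhd93", 7), ("rjkhd93", 4), ("rjkhd93", 7),
]


def collect_list(rows):
    """Model of COLLECT_LIST ... GROUP BY key: keep every value, duplicates included."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


def collect_set(rows):
    """Model of COLLECT_SET ... GROUP BY key: keep each distinct value once."""
    groups = defaultdict(list)
    for key, value in rows:
        # Skip values already collected for this key.
        if value not in groups[key]:
            groups[key].append(value)
    return dict(groups)


if __name__ == "__main__":
    for key, values in collect_list(ROWS).items():
        print(key, values)
```

In this model, collect_list yields the arrays asked for in the question, while collect_set drops the repeated 4, 8 and 7.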
