Optimizing Hive GROUP BY when rows are sorted

Views: 86. Published: 2021/5/13 20:16:58. Tags: sql, hadoop, hive, query-optimization, hiveql

Problem description

I have the following (very simple) Hive query:

select user_id, event_id, min(time) as start, max(time) as end,
       count(*) as total, count(case when interaction = 1 then 1 end) as clicks
from events_all
group by user_id, event_id;

The table has the following structure:

user_id                 event_id                time            interaction 
Ex833Lli36nxTvGTA1Dv    juCUv6EnkVundBHSBzQevw  1430481530295   0
Ex833Lli36nxTvGTA1Dv    juCUv6EnkVundBHSBzQevw  1430481530295   1
n0w4uQhOuXymj5jLaCMQ    G+Oj6J9Q1nI1tuosq2ZM/g  1430512179696   0
n0w4uQhOuXymj5jLaCMQ    G+Oj6J9Q1nI1tuosq2ZM/g  1430512217124   0
n0w4uQhOuXymj5jLaCMQ    mqf38Xd6CAQtuvuKc5NlWQ  1430512179696   1

I know for a fact that rows are sorted first by user_id and then by event_id.

The question is: given that the rows are sorted, is there a way to "hint" the Hive engine to optimize the query? The purpose of the optimization is to avoid keeping all groups in memory, since it is only necessary to keep one group at a time.

Right now this query, running on a 6-node, 16 GB Hadoop cluster with roughly 300 GB of data, takes about 30 minutes and uses most of the RAM, choking the system. I know that each group will be small, no more than 100 rows per (user_id, event_id) tuple, so I think an optimized execution would probably have a very small memory footprint and also be faster (since there is no need to look up group keys).
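
To make the "one group at a time" idea concrete, here is a minimal Python sketch of a streaming aggregation over rows that are already sorted by (user_id, event_id). It is only an illustration of the behaviour being asked for, not Hive code and not a description of how Hive executes GROUP BY. The function name aggregate_sorted is hypothetical, the sample rows are copied from the table above, and clicks is assumed to mean the number of rows with interaction equal to 1.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows):
    """Yield one summary per (user_id, event_id) group.

    Assumes rows with the same key are contiguous (e.g. the input is sorted
    by user_id, then event_id), so only the running totals of the current
    group are held in memory.
    """
    for (user_id, event_id), group in groupby(rows, key=itemgetter(0, 1)):
        start = end = None
        total = clicks = 0
        for _, _, time, interaction in group:
            start = time if start is None else min(start, time)
            end = time if end is None else max(end, time)
            total += 1
            if interaction == 1:  # assumed meaning of "clicks"
                clicks += 1
        yield user_id, event_id, start, end, total, clicks


# Sample rows taken from the table in the question.
rows = [
    ("Ex833Lli36nxTvGTA1Dv", "juCUv6EnkVundBHSBzQevw", 1430481530295, 0),
    ("Ex833Lli36nxTvGTA1Dv", "juCUv6EnkVundBHSBzQevw", 1430481530295, 1),
    ("n0w4uQhOuXymj5jLaCMQ", "G+Oj6J9Q1nI1tuosq2ZM/g", 1430512179696, 0),
    ("n0w4uQhOuXymj5jLaCMQ", "G+Oj6J9Q1nI1tuosq2ZM/g", 1430512217124, 0),
    ("n0w4uQhOuXymj5jLaCMQ", "mqf38Xd6CAQtuvuKc5NlWQ", 1430512179696, 1),
]

result = list(aggregate_sorted(rows))
for row in result:
    print(row)
```

Because no hash table of group keys is built, memory use stays constant regardless of the number of groups; the trade-off is that the result is only correct if equal keys really are adjacent in the input.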
