Reducer Selection in Hive

Problem description

I have the following records to process:

 1000, 1001, 1002 to 1999,
 2000, 2001, 2002 to 2999,
 3000, 3001, 3002 to 3999

I want to process this record set with Hive so that reducer 1 handles 1000 to 1999, reducer 2 handles 2000 to 2999, and reducer 3 handles 3000 to 3999. How can I do this?

Recommended answer

Use DISTRIBUTE BY. Mapper output is partitioned by the DISTRIBUTE BY expression, so all rows for which the expression yields the same value are sent to the same reducer:

select ...
  from ...
distribute by case when col between 1000 and 1999 then 1
                   when col between 2000 and 2999 then 2
                   when col between 3000 and 3999 then 3
               end

Or simply:

distribute by floor(col/1000)
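
For completeness, here is a minimal sketch of a full query built around the shorter form. The table name `records` and the integer column `col` are hypothetical, and the reducer-count setting is an addition that the answer does not mention. DISTRIBUTE BY only guarantees that rows with the same key reach the same reducer; Hive's hash partitioning decides which reducer that is, so the three ranges end up on three separate reducers only if the keys 1, 2, and 3 hash to different partitions. That is an assumption worth verifying on your Hive version.

```sql
-- Assumption: `records` is a hypothetical table with an integer column `col`.
-- Request exactly three reduce tasks; otherwise Hive estimates the count itself.
-- (Older releases use the property name mapred.reduce.tasks.)
set mapreduce.job.reduces=3;

-- floor(col/1000) maps 1000-1999 to 1, 2000-2999 to 2, and 3000-3999 to 3.
-- SORT BY is optional and orders the rows within each reducer.
select col
  from records
distribute by floor(col/1000)
sort by col;
```

If the output is written to a directory or table, each reducer produces its own file, which gives a quick way to check that every file contains a single range.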
