Hive unable to manually set number of reducers


Problem description

I have the following Hive query:

select count(distinct id) as total from mytable;

which automatically spawns:
1408 Mappers
1 Reducer

I need to manually set the number of reducers and I have tried the following:

set mapred.reduce.tasks=50;
set hive.exec.reducers.max=50;

but none of these settings seem to be honored, and the query takes forever to run. Is there a way to manually set the number of reducers, or to rewrite the query so that it uses more reducers? Thanks!

Solution

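Writing a query in Hive like this:

 SELECT COUNT(DISTINCT id) ....

will always result in only one reducer being used: a global aggregate with no GROUP BY is finalized in a single place, so one reducer has to receive and deduplicate every id on its own. You should: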

  1. Use this command to set the desired number of reducers:

    set mapred.reduce.tasks=50;

  2. Rewrite the query as follows:

    SELECT COUNT(*) FROM ( SELECT DISTINCT id FROM ... ) t;

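This results in two MapReduce jobs instead of one, but the performance gain is usually substantial: the inner SELECT DISTINCT is spread across all of the reducers set in step 1, and the outer COUNT(*) only has to count ids that have already been deduplicated.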
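To see why the rewrite helps, here is a minimal Python sketch that models the two query plans in memory instead of running Hive. It rests on a simplified model that is an assumption, not Hive's actual implementation: map-side aggregation removes duplicates only within each input split, and ids are routed to reducers by hashing. The function names `partition`, `single_reducer_plan` and `two_job_plan` are hypothetical, and `zlib.crc32` merely stands in for Hadoop's hash partitioner, which uses the key's `hashCode()`.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    # Stand-in for Hadoop's hash partitioner (which uses the key's hashCode());
    # crc32 is used here only because it is stable across Python runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def single_reducer_plan(splits):
    """Model of SELECT COUNT(DISTINCT id) FROM t: one reducer sees every id."""
    reducer_input = []
    for split in splits:
        # Assumption: map-side aggregation dedups ids within each split only.
        reducer_input.extend(set(split))
    # The lone reducer must deduplicate across all splits by itself.
    return len(set(reducer_input)), [len(reducer_input)]


def two_job_plan(splits, num_reducers):
    """Model of SELECT COUNT(*) FROM (SELECT DISTINCT id FROM t) t."""
    # Job 1 shuffle: every copy of a given id is sent to the same reducer.
    buckets = [[] for _ in range(num_reducers)]
    for split in splits:
        for key in set(split):
            buckets[partition(key, num_reducers)].append(key)
    # Job 1 reduce: each reducer dedups only its own bucket; the resulting
    # id sets are disjoint because the partitioning is done by id.
    distinct_per_reducer = [set(bucket) for bucket in buckets]
    # Job 2: count the deduplicated rows; summing partial counts is cheap.
    total = sum(len(ids) for ids in distinct_per_reducer)
    return total, [len(bucket) for bucket in buckets]


if __name__ == "__main__":
    # Hypothetical input: 10 splits of 5,000 rows drawn from 1,000 distinct ids.
    splits = [[f"id{i % 1000}" for i in range(s, s + 5000)]
              for s in range(0, 50000, 5000)]
    count1, load1 = single_reducer_plan(splits)
    count2, load2 = two_job_plan(splits, num_reducers=50)
    print(count1, max(load1))  # 1000 10000
    print(count2, max(load2))  # same count, far smaller per-reducer load
```

Both plans return the same count, but in the first one a single reducer receives all 10,000 map outputs, while in the second that work is divided among the 50 reducers and the final job only sums small partial counts.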
