Hive unable to manually set number of reducers

Problem description

I have the following Hive query:

select count(distinct id) as total from mytable;

which automatically spawns:
1408 Mappers
1 Reducer

I need to manually set the number of reducers and I have tried the following:

set mapred.reduce.tasks=50 
set hive.exec.reducers.max=50

but neither setting seems to be honored, and the query takes forever to run. Is there a way to manually set the number of reducers, or to rewrite the query so that it uses more reducers? Thanks!

Solution

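Writing a query in Hive like this:

 SELECT COUNT(DISTINCT id) ....

will always result in only one reducer being used: there is no GROUP BY key to partition on, so every id has to reach the same reducer to be deduplicated and counted. You should: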

  1. Use this command to set the desired number of reducers:

    set mapred.reduce.tasks=50;

  2. Rewrite the query as follows:

    SELECT COUNT(*) FROM ( SELECT DISTINCT id FROM ... ) t;

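This results in two MapReduce jobs instead of one, but the performance gain will be substantial: the inner DISTINCT is spread across all the reducers you asked for, and the outer COUNT(*) only has to count the already deduplicated rows.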
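
Putting both steps together for the table in the question, a minimal sketch might look like the following. It assumes the table `mytable` and column `id` from the question, and the `WHERE id IS NOT NULL` filter is an addition that is not part of the original answer: `COUNT(DISTINCT id)` ignores NULLs, whereas `SELECT DISTINCT id` returns NULL as a row of its own, so without the filter the rewritten query could report one more than the original whenever `id` contains NULLs.

```sql
-- Sketch only: assumes the table `mytable` and column `id` from the question.
-- Request 50 reducers for the jobs started in this session.
SET mapred.reduce.tasks=50;

-- Inner query: DISTINCT is partitioned by id, so the deduplication
--              work is shared across the requested reducers.
-- Outer query: a global count over the already deduplicated rows.
SELECT COUNT(*) AS total
FROM (
  SELECT DISTINCT id
  FROM mytable
  WHERE id IS NOT NULL  -- added so the result matches COUNT(DISTINCT id), which skips NULLs
) t;
```

On Hadoop 2 and later, `mapred.reduce.tasks` is a deprecated alias of `mapreduce.job.reduces`. Either name should work in most setups, but it is worth checking which one your cluster honors.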
