What determines the number of map tasks and reduce tasks in Hive?

Problem description

I use Hive to run the query "select * from T1,T2 where T1.a=T2.b", where the schema is T1(a int, b int), T2(a int, b int). When it runs, 6 map tasks and one reduce task are generated. What determines the number of map tasks and reduce tasks? Is it the data volume?

Solution

hive> select * from emp;

This query starts no map or reduce tasks; Hive is only dumping the data.

To see which map and reduce stages start when you run a query, take one such as:

hive> select count(*) from emp group by name;

Adding the explain keyword before the query prints its execution plan, which shows the map and reduce stages that will start:

hive> explain select count(*) from emp group by name;
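
As for the counts themselves, a rough sketch in Python may help. It assumes the commonly documented behavior that each input split gets one map task and that, when the reducer count is not set explicitly, Hive estimates it from the input size using hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max. The function names and default values here are illustrative assumptions, not Hive's actual code; real defaults vary by version, and split combining (which Hive may apply to small files) is ignored.

```python
def estimate_mappers(file_sizes, split_size=128 * 1024 * 1024):
    """Rough map-task count: one task per split, splits never span files.

    split_size is an assumed value (a typical HDFS block size).
    """
    total = 0
    for size in file_sizes:
        if size <= 0:
            continue  # simplification: skip empty files
        total += (size + split_size - 1) // split_size  # ceiling division
    return total


def estimate_reducers(input_bytes,
                      bytes_per_reducer=256_000_000,  # assumed hive.exec.reducers.bytes.per.reducer
                      max_reducers=1009):             # assumed hive.exec.reducers.max
    """Rough reduce-task count when no fixed number of reducers is set."""
    wanted = (input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    return max(1, min(max_reducers, wanted))


if __name__ == "__main__":
    # Six small files give six splits under this simplified model.
    print(estimate_mappers([10_000] * 6))
    # A small total input stays below one reducer's share.
    print(estimate_reducers(6 * 10_000))
```

Under this model the data volume matters on both sides: the map count follows the number and size of the input files, and the reduce count follows the total input size. A fixed reducer count can be requested with "set mapreduce.job.reduces=<number>;" in the Hive session.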
