How does Hive decide when to use MapReduce and when not to?
As a simple example,
select * from tablename;
does NOT kick off a MapReduce job, while
select count(*) from tablename;
DOES. What general principle does Hive use to decide when to run a MapReduce job?
In general, any sort of aggregation, such as min/max/count, requires a MapReduce job. That probably won't explain every case, though.
Hive, like many RDBMSs, has an EXPLAIN keyword that outlines how a Hive query is translated into MapReduce jobs. Try running EXPLAIN on both of your example queries to see what Hive is doing behind the scenes.
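As a rough sketch of that suggestion, the two queries from the question can simply be prefixed with EXPLAIN. The comments describe what one would typically expect on a MapReduce-backed Hive. The exact stage names and plan layout depend on the Hive version and execution engine, so treat them as assumptions rather than guaranteed output.

```sql
-- Plain scan with no aggregation: the plan is expected to show
-- only a fetch stage, i.e. Hive reads the table files directly.
EXPLAIN select * from tablename;

-- Aggregation: the plan is expected to include a map reduce stage
-- that computes the count before the result is fetched.
EXPLAIN select count(*) from tablename;
```

In Hive versions that have the hive.fetch.task.conversion setting, that property influences which simple queries (plain selects, filters, limits) are served by a fetch task instead of a MapReduce job, so the same query may plan differently from one installation to another.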