How does Hive decide when to use MapReduce and when not to?
Question
As a simple example,
select * from tablename;
DOES NOT kick in MapReduce, while
select count(*) from tablename;
DOES. What is the general principle Hive uses to decide when to use MapReduce?
Recommended answer
In general, any sort of aggregation, such as min/max/count, is going to require a MapReduce job. That rule of thumb probably won't explain every case, though.
Hive, like many RDBMSs, has an EXPLAIN keyword that outlines how a Hive query gets translated into MapReduce jobs. Try running EXPLAIN on both of your example queries to see what Hive is doing behind the scenes.
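As a minimal sketch of that suggestion, the two queries from the question can simply be prefixed with EXPLAIN. The table name tablename is the placeholder from the question. The plan output differs between Hive versions and configurations, so none is reproduced here. Broadly, you would expect the first plan to show only a fetch stage and the second to include a map-reduce stage. The final statement assumes a Hive version that has the hive.fetch.task.conversion setting, which governs which simple queries may skip MapReduce; check it against the documentation for your version.

```sql
-- Plan for the plain scan: look for a fetch-only stage.
EXPLAIN select * from tablename;

-- Plan for the aggregation: look for a map-reduce stage.
EXPLAIN select count(*) from tablename;

-- Assumption: this property exists in your Hive version.
-- Without a value, SET prints the current setting.
SET hive.fetch.task.conversion;
```

Comparing the stage lists of the two plans side by side shows which queries Hive answers by reading the table files directly and which it compiles into jobs.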