Why does a Fetch task in Hive run faster than a Map-only task?

Question

Hive can run simple queries as a Fetch task, instead of a Map or MapReduce job, when this is enabled through the hive.fetch.task.conversion parameter.

Why does a Fetch task run so much faster than a Map task, especially for simple work such as select * from table limit 10;? What additional work does a map-only task do in this case? In my case the Fetch task is more than 20 times faster. Both tasks have to read the table data, don't they?

Recommended answer

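A FetchTask fetches the data directly, whereas the MapReduce path launches a MapReduce job. For a query such as select * from table limit 10;, the cost of submitting and starting that job typically dominates the run time, which is why skipping it makes such a large difference.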

<property>
  <name>hive.fetch.task.conversion</name>
  <value>minimal</value>
  <description>
    Some select queries can be converted to single FETCH task
    minimizing latency. Currently the query should be single
    sourced not having any subquery and should not have
    any aggregations or distincts (which incurs RS),
    lateral views and joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
  </description>
</property>
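
The rules in the property description above can be read as a small decision procedure. The sketch below is only a toy model of that text, not Hive's implementation: `QueryShape` and `may_convert_to_fetch` are hypothetical names introduced for illustration, and the model covers only the two modes quoted above.

```python
from dataclasses import dataclass


@dataclass
class QueryShape:
    """Hypothetical summary of a query; this is not a Hive data structure."""
    select_star: bool = True
    # True also when the query has no filter at all.
    filters_on_partition_columns_only: bool = True
    has_subquery: bool = False
    has_aggregation_or_distinct: bool = False
    has_lateral_view: bool = False
    has_join: bool = False


def may_convert_to_fetch(mode: str, q: QueryShape) -> bool:
    """Apply the rules from the property description for 'minimal' and 'more'."""
    if mode not in ("minimal", "more"):
        raise ValueError(f"mode not modeled in this sketch: {mode}")
    # Disqualifiers that the description lists for every mode.
    if (q.has_subquery or q.has_aggregation_or_distinct
            or q.has_lateral_view or q.has_join):
        return False
    if mode == "minimal":
        # SELECT STAR, FILTER on partition columns, LIMIT only.
        return q.select_star and q.filters_on_partition_columns_only
    # "more": any SELECT, FILTER, LIMIT.
    return True
```

Under this model, select * from table limit 10; corresponds to the default `QueryShape()` and qualifies in both modes, while a query that selects specific columns qualifies only under more.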

A second parameter, hive.fetch.task.conversion.threshold, limits the conversion by input size. Its default is -1 in releases before 0.14 and 1 GB (1073741824 bytes) from 0.14 onward. With the 1 GB default, a table larger than 1 GB is read with MapReduce instead of a Fetch task.
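
The size check described above can be sketched as follows. This is a simplified model, not Hive's code: `uses_fetch_task` is a hypothetical helper, and it assumes that a negative threshold (the -1 default) means no size limit is applied.

```python
ONE_GB = 1073741824  # the 1 GB default quoted above, in bytes


def uses_fetch_task(input_bytes: int, threshold: int = ONE_GB) -> bool:
    """Return True if the input is small enough to be read by a Fetch task.

    Assumption: a negative threshold disables the size check entirely.
    """
    if threshold < 0:
        return True
    # Inputs larger than the threshold fall back to MapReduce.
    return input_bytes <= threshold
```

For example, a 200 MB table passes the check with the 1 GB default, while a 5 GB table passes only when the threshold is -1.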
