How to force MR execution when running a simple Hive query?

Published: 2021/5/14 19:08:41 | Tags: hive, mapreduce

Question

I have Hive 2.1.1 running on MapReduce, a table test_table stored as SequenceFile, and the following ad-hoc query:

select t.*
  from test_table t
 where t.test_column = 100

Although Hive can execute this query without launching an MR job (as a fetch task), scanning the HDFS files this way sometimes takes longer than running a single map job would.

When I want to force MR execution, I make the query more complex, e.g. by adding distinct. This workaround has significant drawbacks:

The query results may differ from those of the original query.
It puts a pointless computational load on the cluster.

Is there a recommended way to force MR execution when using Hive-on-MR?

Answer

Hive decides whether to run a query as an MR job or as a fetch task based on the following settings (defaults in parentheses):

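hive.fetch.task.conversion ("more"): the strategy for converting MR tasks into fetch tasks; accepted values are none, minimal and more
hive.fetch.task.conversion.threshold (1 GB): the maximum size of input data, in bytes, that can be fed to a fetch task; a negative value disables the size check
hive.fetch.task.aggr (false): when set to true, the final aggregation of queries without GROUP BY, such as select count(*) from src, is performed in the fetch task instead of in a single reducer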
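Before changing anything, it can help to check which values are actually in effect for your session. In the Hive CLI and Beeline, a SET statement with only a property name (no "=value") prints the current value. A minimal sketch:

```sql
-- Print the value currently in effect for each fetch-task setting.
-- SET without "=value" only displays the setting; it changes nothing.
set hive.fetch.task.conversion;
set hive.fetch.task.conversion.threshold;
set hive.fetch.task.aggr;
```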

This suggests two options:

set hive.fetch.task.conversion.threshold to a lower value, e.g. 512 MB
set hive.fetch.task.conversion to "none"
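Both options can be applied as session-level SET statements. The sketch below assumes you only want the change for the current session; because the threshold is expressed in bytes, 512 MB is written as 536870912 (512 * 1024 * 1024).

```sql
-- Option 1: lower the input-size limit for fetch-task conversion.
-- Inputs larger than this many bytes are not converted to a fetch task.
set hive.fetch.task.conversion.threshold=536870912;

-- Option 2: disable fetch-task conversion entirely, so queries that
-- would otherwise be served by a fetch task run as MR jobs instead.
set hive.fetch.task.conversion=none;
```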

For some reason, lowering the threshold made no difference in my case, so I went with the second option, which seems fine for ad-hoc queries.
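A possible way to apply the second option to just one ad-hoc query is sketched below, using the test_table query from the question. It assumes you want to return to the default afterwards ("more" is the default in Hive 2.x). The EXPLAIN step is optional: if the plan contains a Map Reduce stage, the query will run as an MR job; if it shows only a Fetch Operator stage, it is still being served by a fetch task.

```sql
-- Disable fetch-task conversion for the current session.
set hive.fetch.task.conversion=none;

-- Optional: inspect the plan and look for a Map Reduce stage.
explain
select t.*
  from test_table t
 where t.test_column = 100;

-- Run the ad-hoc query; it should now be executed as an MR job.
select t.*
  from test_table t
 where t.test_column = 100;

-- Restore the Hive 2.x default for later queries in this session.
set hive.fetch.task.conversion=more;
```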

More details on these settings can be found on the Cloudera forum and in the Hive wiki.
