Regarding the Hive commands that do not invoke underlying MapReduce jobs


Question

My understanding is that Hive is an SQL-like language that can perform database-related tasks by invoking underlying MapReduce programs. However, I learned that some Hive commands do not invoke a MapReduce job. I am curious to know what these commands are, and why they do not need to invoke a MapReduce job.

Answer

You are right, Hive uses MR jobs in the background to process the data. When you fire an SQL-like query in Hive, it converts the query into one or more MR jobs in the background and gives you the result.
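To picture what "converting a query into MR jobs" means, here is a toy Python model. It is a hedged sketch, not Hive's actual query compiler: it assumes a hypothetical table t with a dept column, stored as two file splits, and shows how a query such as SELECT dept, COUNT(*) FROM t GROUP BY dept breaks down into a map step, a shuffle, and a reduce step. All names (SPLITS, map_phase, shuffle, reduce_phase, group_by_count) are made up for illustration.

```python
from collections import defaultdict

# Hypothetical table t: each inner list stands in for one file split of (name, dept) rows.
SPLITS = [
    [("alice", "sales"), ("bob", "eng")],
    [("carol", "eng"), ("dave", "sales"), ("erin", "eng")],
]

def map_phase(split):
    """Map step: emit (dept, 1) for every row in one split."""
    return [(dept, 1) for _name, dept in split]

def shuffle(mapped_outputs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: add up the 1s collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def group_by_count(splits):
    # Roughly what "SELECT dept, COUNT(*) FROM t GROUP BY dept" asks for.
    return reduce_phase(shuffle(map_phase(s) for s in splits))

print(group_by_count(SPLITS))  # {'sales': 2, 'eng': 3}
```

In a real cluster each map_phase call would run as a separate task near its split, which is where the parallelism comes from.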

Having said that, there are a few queries that don't need MR jobs, for example:

SELECT * FROM table LIMIT 10;

As you can see, the above query doesn't need any data processing. All it needs is to read a few rows from the table, which Hive can do by reading the table's files directly instead of launching a job.

So the above Hive query doesn't fire an MR job.
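In Hive, this kind of direct read is handled by a fetch task (governed by the hive.fetch.task.conversion setting). The Python below only models the idea and is not Hive code: a hypothetical table of fifteen rows spread over three splits is read sequentially, and the read stops as soon as the LIMIT is satisfied. SPLITS, scan and fetch_with_limit are illustrative names.

```python
from itertools import islice

# Hypothetical table stored as three file splits of five rows each.
SPLITS = [list(range(0, 5)), list(range(5, 10)), list(range(10, 15))]

def scan(splits, counter):
    """Yield rows one by one, split after split, recording each row actually read."""
    for split in splits:
        for row in split:
            counter["rows_read"] += 1
            yield row

def fetch_with_limit(splits, limit):
    """Model of SELECT * ... LIMIT n: one sequential read that stops after n rows."""
    counter = {"rows_read": 0}
    rows = list(islice(scan(splits, counter), limit))
    return rows, counter["rows_read"]

rows, rows_read = fetch_with_limit(SPLITS, 3)
print(rows, rows_read)  # [0, 1, 2] 3
```

Only three of the fifteen rows are ever touched and nothing has to be aggregated, so scheduling a distributed job would be pure overhead for this query.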

But suppose we slightly modify the above query:

SELECT COUNT(*) FROM table;

This query will fire an MR job, because it must read all of the data in the table, and an MR job can do that quickly by processing the data in parallel.
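By contrast, COUNT(*) has to touch every row. A minimal Python sketch of why parallelism helps, assuming threads stand in for map tasks running on different nodes (an assumption for illustration; real MR tasks are separate processes on the cluster): each worker counts the rows of one split, and a final step sums the partial counts, much as a single reducer would. SPLITS, count_split and parallel_count are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table stored as three file splits of different sizes.
SPLITS = [
    ["r1", "r2", "r3"],
    ["r4", "r5"],
    ["r6", "r7", "r8", "r9"],
]

def count_split(split):
    """Map task: count the rows in one split; every row has to be read."""
    return sum(1 for _row in split)

def parallel_count(splits, workers=3):
    """COUNT(*) as per-split counts computed in parallel (map), then summed (reduce)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_split, splits))
    return sum(partial_counts)

print(parallel_count(SPLITS))  # 9
```

Unlike the LIMIT case, no early exit is possible here: the total is only known after every split has been counted.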
