Hive command line: SELECT query "Time taken" is incorrect when no MapReduce job runs in the background


Problem description

I am running a Hive query as shown below:

Select count(*),group_name from table_name group by group_name;

Status: Running (Executing on YARN cluster with App id XXXX)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED     54         54        0        0       0       0
Reducer 2 ......   SUCCEEDED     13         13        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 24.93 s
--------------------------------------------------------------------------------
OK
Result
Time taken: 26.786 seconds, Fetched: 10 row(s)

The above timings look accurate when MapReduce is involved. But when I run a simple query such as the one below:

select group_name from table_name

Time taken: 0.771 seconds, Fetched: 14 row(s)

The time above is not correct.

Also, any ideas on how to measure query time more accurately would be greatly appreciated.

Thanks in advance

Solution

Measure the time from a shell script using the time command.

Call your hive command like this:

time hive -e 'select group_name from table_name;'

The time command reports three values: real, user, and sys. For example:

real        0m0.007s
user        0m0.000s
sys         0m0.005s 

Real is what you probably want to know. It is wall-clock time: the time from start to finish of the call. This is all elapsed time, including time slices used by other processes and time the process spends blocked (for example, waiting for I/O to complete). Because it times the whole hive invocation, it also includes the startup time of the Hive CLI itself, not just the query.

See also this question: How do I get just real time value from 'time' command?
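
As a minimal sketch of extracting only the real value, assuming the shell is bash (where time is a keyword whose output format is controlled by the TIMEFORMAT variable): the function name measure_real below is hypothetical, and the hive call in the usage comment assumes hive is on PATH and table_name exists.

```shell
#!/usr/bin/env bash
# measure_real (hypothetical helper): run a command and print only its
# wall-clock ("real") time in seconds. Requires bash, not plain sh.
measure_real() {
    # %R makes bash's time keyword print just the elapsed real time.
    local TIMEFORMAT=%R
    # The command's own stdout/stderr are discarded. The timing report is
    # written to the stderr of the enclosing group, which is redirected to
    # stdout so the caller can capture it.
    { time "$@" >/dev/null 2>&1; } 2>&1
}

# Example usage (assumes hive is on PATH and table_name exists):
# elapsed=$(measure_real hive -e 'select group_name from table_name;')
# echo "Query took ${elapsed} seconds of wall-clock time"
```

Since the query output is discarded here, this variant is only suited to timing; drop the inner redirection if the rows are needed as well.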
