Hive command line: SELECT query "Time taken" is incorrect when no map reduce job runs in the background
Problem description
I am running a Hive query as below:
Select count(*),group_name from table_name group by group_name;
Status: Running (Executing on YARN cluster with App id XXXX)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 54 54 0 0 0 0
Reducer 2 ...... SUCCEEDED 13 13 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 24.93 s
--------------------------------------------------------------------------------
OK
Result
Time taken: 26.786 seconds, Fetched: 10 row(s)
The timings above look accurate when map reduce is involved. But when I run a simple query such as the one below:
select group_name from table_name
Time taken: 0.771 seconds, Fetched: 14 row(s)
The time above is not correct.
Any ideas on how to measure query time more accurately would be greatly appreciated.
Thanks in advance.
Measure the time from a shell script using the time command. Call your hive command like this:
time hive -e 'select group_name from table_name;'
The time command outputs three times: real, user and sys:
real 0m0.007s
user 0m0.000s
sys 0m0.005s
Real is probably the one you want. Real is wall-clock time: the time from start to finish of the call. This is all elapsed time, including time slices used by other processes and time the process spends blocked (for example, waiting for I/O to complete).
See also this question: How do I get just real time value from 'time' command?
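If only the wall-clock figure is needed, for example to log it from a script, the same measurement can be sketched with a small helper. This is a minimal sketch, not part of the original answer: the function name measure_real is hypothetical, and it assumes GNU date (the %N nanosecond format is not available on every platform) and awk.

```shell
#!/bin/sh
# Hypothetical helper: run any command and print its wall-clock ("real")
# time in seconds. Assumes GNU date, which supports %N (nanoseconds).
measure_real() {
    # Record the start time as seconds.nanoseconds since the epoch.
    start=$(date +%s.%N)
    # Run the command; discard its stdout so only the timing is printed.
    "$@" > /dev/null
    end=$(date +%s.%N)
    # Print the difference with millisecond precision.
    # Note: the function returns awk's exit status, not the command's.
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# Usage with the query from the question:
# measure_real hive -e 'select group_name from table_name;'
```

Like time, this counts everything between launching the command and its exit, so for hive it also includes the client's own startup time, not just the query.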