Hive query performance is slow when using Hive date functions instead of hardcoded date strings?

Views: 263 Published: 2018/5/31 18:54:34 Tags: hadoop hive hiveql


Problem description

I have a transaction table, table_A, that is updated every day. Each day I insert new data into table_A from an external table, table_B, using the file_date field to select the rows to load. However, there is a huge performance difference between using a hardcoded date and using the Hive date functions:
-- Fast version (~20 minutes)
SET date_ingest = '2016-12-07';
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.dynamic.partition = TRUE;

INSERT INTO TABLE table_A PARTITION (FILE_DATE)
SELECT
    id,
    eventtime,
    CONCAT_WS('-', substr(eventtime, 0, 4), SUBSTRING(eventtime, 5, 2), SUBSTRING(eventtime, 7, 2))
FROM table_B
WHERE file_date = ${hiveconf:date_ingest}
;
compared to:
-- Slow version (~9 hours)
SET date_ingest = date_add(to_date(from_unixtime( unix_timestamp( ) )),-1);
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.dynamic.partition = TRUE;

INSERT INTO TABLE table_A PARTITION (FILE_DATE)
SELECT
    id,
    eventtime,
    CONCAT_WS('-', substr(eventtime, 0, 4), SUBSTRING(eventtime, 5, 2), SUBSTRING(eventtime, 7, 2))
FROM table_B
WHERE file_date = ${hiveconf:date_ingest}
;
Has anyone experienced similar issues?  You should assume that I don't have access to the Unix hive command (i.e. can't use --hiveconf options) since we're using a third party UI.
Solution
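Sometimes partition pruning does not work when a function is used in the filter clause. In the slow version, SET stores the expression as plain text, so after variable substitution the WHERE clause contains the function calls themselves rather than a date literal. Since unix_timestamp() with no arguments is non-deterministic, Hive likely cannot reduce the filter to a constant at compile time and ends up scanning every partition of table_B. If you calculate the value in a wrapper shell script and pass it to Hive as a -hiveconf variable, the filter is a plain literal and it will work fine.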
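To make the difference concrete, here is a small shell sketch of what the WHERE clause looks like after substitution in each case. Hive replaces ${hiveconf:...} with the raw text stored by SET; the sketch only imitates that replacement with printf, and render_where is a hypothetical helper, not part of Hive.

```shell
#!/bin/sh
# Illustration only: Hive replaces ${hiveconf:date_ingest} with the raw text
# stored by SET. The %s in the printf format stands in for that placeholder.
# render_where is a hypothetical helper made up for this sketch.
render_where() {
  printf 'WHERE file_date = %s\n' "$1"
}

# Fast version: the stored text is a quoted date literal.
fast_where=$(render_where "'2016-12-07'")

# Slow version: the stored text is the unevaluated expression.
slow_where=$(render_where "date_add(to_date(from_unixtime(unix_timestamp())),-1)")

echo "fast: $fast_where"
echo "slow: $slow_where"
```

The first line printed contains a constant the planner can match against partition values; the second contains function calls that still have to be evaluated.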
Example:
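# Inside the wrapper shell script (date -d requires GNU date)
date_ingest=$(date -d '-1 day' +%Y-%m-%d)
hive -f your_script.hql -hiveconf date_ingest="$date_ingest"
Then use it inside the Hive script as WHERE file_date = '${hiveconf:date_ingest}'. The quotes are needed because the variable now holds a bare date such as 2016-12-07.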
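A slightly more defensive version of the wrapper might look like the sketch below. It assumes GNU date, reuses the your_script.hql name from the answer, and only prints the hive command so it can be inspected first. The helper names (yesterday, is_iso_date, hive_command) are hypothetical and chosen for this example.

```shell
#!/bin/sh
# Sketch of the wrapper approach from the answer. Assumes GNU date (-d).
# All function names below are made up for this example.

# Print yesterday's date as YYYY-MM-DD.
yesterday() {
  date -d '-1 day' +%Y-%m-%d
}

# Succeed only for a plain YYYY-MM-DD string, so nothing else can end up
# inside the quoted literal in the WHERE clause.
is_iso_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the hive invocation for a given date (printed, not executed).
hive_command() {
  printf 'hive -f your_script.hql -hiveconf date_ingest=%s\n' "$1"
}

date_ingest=$(yesterday)
if is_iso_date "$date_ingest"; then
  hive_command "$date_ingest"
else
  echo "unexpected date value: $date_ingest" >&2
fi
```

Checking the shape of the value before passing it on keeps the filter a simple date literal, which is the property the answer relies on for partition pruning.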
