Dropping a range of partitions in HIVE

Question

I have a Hive (ver 0.11.0) table partitioned by column date, of type string. I want to know if there exists a way in Hive by which I can drop partitions for a range of dates (say from 'date1' to 'date2'). I have tried the following (SQL type) queries, but they don't seem to be syntactically correct:

ALTER TABLE myTable DROP IF EXISTS PARTITION
(date>='date1' and date<='date2');

ALTER TABLE myTable DROP IF EXISTS PARTITION
(date>='date1' && date<='date2');

ALTER TABLE myTable DROP IF EXISTS PARTITION
(date between 'date1' and 'date2');

Recommended answer

I don't think there is a valid solution to date. I worked around the issue with shell scripts that generate the statements, for instance:

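# Generate one ALTER TABLE statement per month. IF EXISTS makes combinations
# that do not exist (e.g. day 31 of February) harmless.
for y in {2011..2014}
do
  for m in {01..12}
  do
    # Dummy all-zero partition first, so every real one can start with ", ".
    echo -n "ALTER TABLE reporting.frontend DROP IF EXISTS PARTITION (year=0000,month=00,day=00,hour=00)"
    for d in {01..31}
    do
      for h in {00..23}  # hours run from 00 to 23
      do
        echo -n ", PARTITION (year=$y,month=$m,day=$d,hour=$h)"
      done
    done
    echo ";"
  done
done > drop_partitions_v1.hql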

The resulting .hql file can then be executed with the -f option of hive (or beeline).

Obviously the loops should be able to generate the range you want to drop, which might be nontrivial. In the worst case you will need to use several such shell scripts in order to drop the desired range of dates.

Further, please note that in my case the partitions had four keys (year, month, day, hour). If your dates/partitions are coded as strings (not a good idea in my opinion), you will have to build your target string out of the variables y, m, d and h in the shell script and put that string inside the echo command. By the way, the dummy partition (containing only 0s) is there only to make the whole ALTER TABLE command, which has a special syntax, easy to write with 3-4 loops: every real partition can then be emitted with a leading comma.
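
For the string-typed date partition from the question, the range can be generated by stepping through the dates one day at a time. The following is a minimal sketch rather than a definitive script: it assumes GNU date, partition values formatted as YYYY-MM-DD, and the table and column names (myTable, date) from the question. gen_drop_range and ymd are hypothetical helper names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print one ALTER TABLE statement that drops every daily partition
# between two dates (inclusive). Assumes GNU date and 'YYYY-MM-DD' values.
set -eu

# Turn 2014-02-27 into 20140227 so dates can be compared as integers.
ymd() { printf '%s' "$1" | tr -d '-'; }

# gen_drop_range TABLE START END  (hypothetical helper, not part of Hive)
gen_drop_range() {
  local table="$1" start="$2" end="$3"
  local d="$start" sep=""
  # An empty range would yield an invalid statement, so refuse it.
  [ "$(ymd "$start")" -le "$(ymd "$end")" ] || return 1
  printf 'ALTER TABLE %s DROP IF EXISTS' "$table"
  while [ "$(ymd "$d")" -le "$(ymd "$end")" ]; do
    printf "%s PARTITION (date='%s')" "$sep" "$d"
    sep=","
    d=$(date -u -d "$d + 1 day" +%F) || return 1   # GNU date arithmetic
  done
  printf ';\n'
}

# Example: three partitions, crossing a month boundary.
gen_drop_range myTable 2014-02-27 2014-03-01
```

Redirecting the output into a .hql file and passing it to hive -f works the same way as for the loop-based script. Because the dates come from date arithmetic, only days that actually exist in the calendar are emitted, so no dummy partition is needed here.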
