如何将一个分区的数据插入/复制到配置单元中的多个分区? [英] How to insert/copy one partition's data to multiple partitions in hive?

查看:32
本文介绍了如何将一个分区的数据插入/复制到配置单元中的多个分区?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的 hive 表中有 day='2019-01-01' 的数据,我想将相同的数据复制到 2019 年 1 月的整个月份.(即在'2019-01-02','2019-01-03'...'2019-01-31')

I'm having data of day='2019-01-01' in my hive table, I want to copy same data to whole Jan-2019 month. (i.e. in '2019-01-02', '2019-01-03'...'2019-01-31')

我正在尝试关注,但数据仅插入到2019-01-02"而不是2019-01-03"中.

I'm trying following but data is only inserted in '2019-01-02' and not in '2019-01-03'.

INSERT OVERWRITE TABLE db_t.students PARTITION(dt='2019-01-02', dt='2019-01-03')
SELECT id, name, marks FROM db_t.students WHERE dt='2019-01-01';

推荐答案

将所有数据与所需日期范围的日历日期交叉连接.使用动态分区:

Cross join all your data with calendar dates for required date range. Use dynamic partitioning:

set hivevar:start_date=2019-01-02; 
set hivevar:end_date=2019-01-31; 

set hive.exec.dynamic.partition=true; 
set hive.exec.dynamic.partition.mode=nonstrict;  

with date_range as 
(--this query generates date range
select date_add ('${hivevar:start_date}',s.i) as dt 
  from ( select posexplode(split(space(datediff('${hivevar:end_date}','${hivevar:start_date}')),' ')) as (i,x) ) s
)

INSERT OVERWRITE TABLE db_t.students PARTITION(dt)
SELECT id, name, marks, r.dt --partition column is the last one
  FROM db_t.students s 
       CROSS JOIN date_range r
 WHERE s.dt='2019-01-01'
DISTRIBUTE BY r.dt;

另一种可能的解决方案是使用 hadoop fs -cphadoop distcp 复制分区数据(对每个分区重复或在 shell 中使用循环):

One more possible solution is to copy partition data using hadoop fs -cp or hadoop distcp (repeat for each partition or use loop in the shell ):

hadoop fs -cp '/usr/warehouse/students/dt=2019-01-01' '/usr/warehouse/students/dt=2019-01-02'

还有一种使用 UNION ALL 的解决方案:

And one more solution using UNION ALL:

    set hive.exec.dynamic.partition=true; 
    set hive.exec.dynamic.partition.mode=nonstrict;      

    INSERT OVERWRITE TABLE db_t.students PARTITION(dt)
    SELECT id, name, marks, '2019-01-02' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
    UNION ALL
     SELECT id, name, marks, '2019-01-03' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
    UNION ALL
     SELECT id, name, marks, '2019-01-04' as dt FROM db_t.students s WHERE s.dt='2019-01-01' 
    UNION ALL
    ... 
  ;

这篇关于如何将一个分区的数据插入/复制到配置单元中的多个分区?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆