如何将一个分区的数据插入/复制到Hive中的多个分区? [英] How to insert/copy one partition's data to multiple partitions in hive?
问题描述
我的配置单元表中有 day ='2019-01-01'
的数据,我想将相同的数据复制到整个2019年1月。 (即'2019-01-02'
,'2019-01-03'
... '2019-01-31'
)
I'm having data of day='2019-01-01'
in my hive table, I want to copy same data to whole Jan-2019 month. (i.e. in '2019-01-02'
, '2019-01-03'
...'2019-01-31'
)
我正在尝试关注,但数据仅插入到'2019-01-02 ',而不是在'2019-01-03'中。
I'm trying following but data is only inserted in '2019-01-02' and not in '2019-01-03'.
INSERT OVERWRITE TABLE db_t.students PARTITION(dt='2019-01-02', dt='2019-01-03')
SELECT id, name, marks FROM db_t.students WHERE dt='2019-01-01';
推荐答案
将您的所有数据与日历日期进行交叉联接以获取所需日期范围。使用动态分区:
Cross join all your data with calendar dates for required date range. Use dynamic partitioning:
set hivevar:start_date=2019-01-02;
set hivevar:end_date=2019-01-31;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
with date_range as
(--this query generates date range
select date_add ('${hivevar:start_date}',s.i) as dt
from ( select posexplode(split(space(datediff('${hivevar:end_date}','${hivevar:start_date}')),' ')) as (i,x) ) s
)
INSERT OVERWRITE TABLE db_t.students PARTITION(dt)
SELECT id, name, marks, r.dt --partition column is the last one
FROM db_t.students s
CROSS JOIN date_range r
WHERE s.dt='2019-01-01'
DISTRIBUTE BY r.dt;
另一种可能的解决方案是使用 hadoop fs -cp <复制分区数据/ code>或
hadoop distcp
(为每个分区重复或在shell中使用循环):
One more possible solution is to copy partition data using hadoop fs -cp
or hadoop distcp
(repeat for each partition or use loop in the shell ):
hadoop fs -cp '/usr/warehouse/students/dt=2019-01-01' '/usr/warehouse/students/dt=2019-01-02'
还有一个使用UNION ALL的解决方案:
And one more solution using UNION ALL:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE db_t.students PARTITION(dt)
SELECT id, name, marks, '2019-01-02' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
UNION ALL
SELECT id, name, marks, '2019-01-03' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
UNION ALL
SELECT id, name, marks, '2019-01-04' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
UNION ALL
...
;
这篇关于如何将一个分区的数据插入/复制到Hive中的多个分区?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!