将2个Hive外部表格的数据插入新的外部表格中,并增加一列 [英] Insert data of 2 Hive external tables in new External table with additional column
问题描述
我有2个外部配置单元表,如下所示。我已经使用sqoop从oracle中填充数据。
创建外部表transaction_usa
(
tran_id int,
acct_id int,
tran_date字符串,
数量double,
描述字符串,
branch_code字符串,
tran_state字符串,
tran_city字符串,
speendby字符串,
tran_zip int
)
行格式分隔
存储为文本文件
位置'/ user / stg / bank_stg / tran_usa';
创建外部表transaction_canada
(
tran_id int,
acct_id int,
tran_date字符串,
金额双倍,
说明字符串,
branch_code字符串,
tran_state字符串,
tran_city字符串,
speendby字符串,
tran_zip int
)
行格式分隔
存储为文本文件
位置'/ user / stg / bank_stg / tran_canada';
现在我想合并2个以上的表格数据,因为它在1个外部配置单元表中具有所有相同的字段如上面的两个表格,但是有一个额外的列来标识哪个数据来自哪个表格。新增外部表格,其中列为 source_table
。新外部表如下。
创建外部表transaction_usa_canada
(
tran_id int,
acct_id int,
tran_date字符串,
数量double,
描述字符串,
branch_code字符串,
tran_state字符串,
tran_city字符串,
speendby字符串,
tran_zip int,
source_table字符串
)
分隔行格式
存储为文本文件
位置'/ user / gds / bank_ds / tran_usa_canada ;
我该怎么做。
您从每个表执行 SELECT
,并对这些操作执行 UNION ALL
操作结果并最终将结果插入到第三个表中。
以下是最终的配置单元查询:
INSERT INTO TABLE transaction_usa_canada
SELECT trans_id,acct_id,tran_date,amount,description,branch_code,tran_state,tran_city,speendby,tran_zip,'transaction_usa'AS source_table FROM transaction_usa
UNION ALL
SELECT tran_id,acct_id,tran_date,amount ,description,branch_code,tran_state,tran_city,speendby,tran_zip,'transaction_canada'AS source_table FROM transaction_canada;
希望这可以帮到你!!!
I have 2 external hive tables as follows. I have populated data in them from oracle using sqoop.
create external table transaction_usa
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_usa';
create external table transaction_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_canada';
Now i want to merge above 2 tables data as it is in 1 external hive table with all same fields as in the above 2 tables but with 1 extra column to identify that which data is from which table. The new external table with additional column as source_table
. The new external table is as follows.
create external table transaction_usa_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int,
source_table string
)
row format delimited
stored as textfile
location '/user/gds/bank_ds/tran_usa_canada';
how can I do it.?
You do SELECT
from each table and perform UNION ALL
operation on these results and finally insert the result into your third table.
Below is the final hive query:
INSERT INTO TABLE transaction_usa_canada
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_usa' AS source_table FROM transaction_usa
UNION ALL
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_canada' AS source_table FROM transaction_canada;
Hope this help you!!!
这篇关于将2个Hive外部表格的数据插入新的外部表格中,并增加一列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!