在Hive表格中插入覆盖分区 - 值重复 [英] Insert overwrite partition in Hive table - Values getting duplicated
问题描述
我使用非分区表创建了Hive表,并使用select查询将数据插入到分区 Hive表中。
指定网站 p>
- 通过以上链接,我的分区表包含重复值。以下是设置 :
这是我的Sample员工数据集: link1
忘记了上一次INSERT OVERWRITE中的WHERE子句:
INSERT INTO TABLE Unm_Parti_Trail PARTITION(Department ='A')
SELECT employeeid,firstname,designation,CASE WHEN employeeid = 19
THEN 50000 ELSE salary END AS salary FROM Unm_Parti_Trail
WHERE department ='A';
I created a Hive table with Non-partition table and using select query I inserted data into Partitioned Hive table.
- By following above link my partition table contains duplicate values. Below are the setps
This is my Sample employee dataset:link1
I tried the following queries: link2
But after updating a value in Hive table,
Updating salary of Steven with EmployeeID 19 to 50000.
INSERT OVERWRITE TABLE Unm_Parti_Trail PARTITION (Department = 'A') SELECT employeeid,firstname,designation, CASE WHEN employeeid=19 THEN 50000 ELSE salary END AS salary FROM Unm_Parti_Trail;
the values are getting duplicated.
7 Nirmal Tech 12000 A
7 Nirmal Tech 12000 B
Nirmal is placed in Department A only but it is duplicated to department B.
Am I doing anything wrong?
Please suggest.
It seems like you forgot the WHERE clause in your last INSERT OVERWRITE:
INSERT INTO TABLE Unm_Parti_Trail PARTITION (Department = 'A')
SELECT employeeid,firstname,designation, CASE WHEN employeeid=19
THEN 50000 ELSE salary END AS salary FROM Unm_Parti_Trail
WHERE department = 'A';
这篇关于在Hive表格中插入覆盖分区 - 值重复的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!