Update, SET option in Hive


Question

I know that files in Hadoop cannot be updated in place. In Hive, however, some syntactic sugar makes it possible to merge new values with the old data in a table and then rewrite the table with the merged output. If the new values are in another table, I can achieve the same effect with a left outer join.
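For reference, the join-based merge described above might look like the following sketch. The table temp1_updates and the columns id, name and location are assumptions made for illustration, and temp1 is assumed to be unpartitioned.

```sql
-- Hypothetical schemas (not from the original question):
--   temp1(id INT, name STRING, location STRING)   -- existing data
--   temp1_updates(id INT, location STRING)        -- new values, one row per id
INSERT OVERWRITE TABLE temp1
SELECT
  t.id,
  t.name,
  -- Use the new value when a matching row exists, otherwise keep the old one.
  COALESCE(u.location, t.location) AS location
FROM temp1 t
LEFT OUTER JOIN temp1_updates u
  ON t.id = u.id;
```

Two caveats with this sketch: COALESCE cannot overwrite a value with NULL, and more than one row per id in temp1_updates would duplicate rows in the result.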

The problem in my case is that I have to update the table by setting a column to a given value for the rows that match a WHERE condition, and SET is known not to be supported.

For example, consider the following ordinary SQL query:

UPDATE temp1
SET location='florida'
WHERE id=206;

I tried to convert this to Hive, but I got stuck at the SET part. If anyone can tell me how to do it, that would be a great help.

Solution

INSERT OVERWRITE TABLE _tableName_ PARTITION (_partitionColumn_= _partitionValue_) 
SELECT [other Things], CASE WHEN id=206 THEN 'florida' ELSE location END AS location, [other Other Things] 
FROM _tableName_ WHERE [_whereClause_];
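As a concrete instance of the template above, here is a minimal sketch for the UPDATE from the question. It assumes temp1 is not partitioned and has exactly the columns id, name and location in that order; the name column is hypothetical.

```sql
-- Hypothetical schema: temp1(id INT, name STRING, location STRING), not partitioned.
-- Equivalent in effect to: UPDATE temp1 SET location='florida' WHERE id=206;
INSERT OVERWRITE TABLE temp1
SELECT
  id,
  name,
  -- Only the row with id 206 gets the new value; every other row keeps its own.
  CASE WHEN id = 206 THEN 'florida' ELSE location END AS location
FROM temp1;
```

There is deliberately no WHERE clause here: any row filtered out of the SELECT is not written back and is therefore lost. In the partitioned template, the WHERE clause should select all rows of the partition being overwritten, not only the rows being changed.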

The PARTITION clause takes a comma-separated list: ... PARTITION (_partitionColumn1_= _partitionValue1_, _partitionColumn2_= _partitionValue2_, ...). As far as I know, each entry has to name a different partition column, so the list identifies one partition of a table partitioned on several columns rather than several partitions at once. I haven't done this with multiple partitions, just one at a time, so I'd check the results on a test/dev environment before running it against all partitions. I had other reasons for limiting each OVERWRITE to a single partition as well.
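If several partitions do need to be rewritten in one statement, Hive's dynamic partition insert may be an option. The sketch below is untested against the setup described here; the table events, its columns and the dt values are all hypothetical.

```sql
-- Hypothetical table: events(id INT, location STRING) PARTITIONED BY (dt STRING)
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE events PARTITION (dt)
SELECT
  id,
  CASE WHEN id = 206 THEN 'florida' ELSE location END AS location,
  dt  -- the dynamic partition column must come last in the SELECT list
FROM events
-- Restrict to the partitions that should be rewritten.
WHERE dt IN ('2013-01-01', '2013-01-02');
```

With a dynamic partition insert, only the partitions that receive rows from the SELECT should be overwritten, but this is worth confirming on a test table first.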

This page has a little more on it: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
In general, this site is your best friend when working with HiveQL: https://cwiki.apache.org/confluence/display/Hive/LanguageManual

I've developed something identical to this to migrate some data, and it worked. I haven't tried it against large datasets, only a few GB, but at that size it worked perfectly.

Note: this will OVERWRITE the partition, so the previous files are gone. Create backup and restore scripts/procedures first. [other Things] and [other Other Things] stand for the rest of the table's columns. They need to appear in the same order as in the table definition. This is very important, or else your data will be corrupted.
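A backup and restore step could be as simple as the following sketch. The name temp1_backup is hypothetical, and temp1 is assumed to be unpartitioned; a partitioned table would need a restore that names or dynamically assigns the partitions.

```sql
-- Hypothetical names: temp1 is the table to be rewritten, temp1_backup the safety copy.
-- Take the copy before running INSERT OVERWRITE.
CREATE TABLE temp1_backup AS
SELECT * FROM temp1;

-- Restore if the rewrite goes wrong. The column order matches because the
-- backup was created from SELECT * on the same table.
INSERT OVERWRITE TABLE temp1
SELECT * FROM temp1_backup;
```

Comparing row counts between temp1 and temp1_backup after the rewrite is a cheap check that no rows were dropped.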

Hope this helps. :)
