更新,Hive 中的 SET 选项 [英] Update , SET option in Hive

查看:11
本文介绍了更新,Hive 中的 SET 选项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我知道 Hadoop 中没有文件更新,但在 Hive 中,可以使用语法糖将新值与表中的旧数据合并,然后用合并的输出重写表,但如果我有新的另一个表中的值,然后我可以通过使用左外连接来达到相同的效果.

I know there is no update of file in Hadoop but in Hive it is possible with syntactic sugar to merge the new values with the old data in the table and then to rewrite the table with the merged output but if I have the new values in another table then I can achieve the same effect by using a left outer join.

就我而言,问题是我必须通过为具有某些 where 条件的列设置一个值来更新表.已知不支持SET.

The problem in my case is that I have to update the table by setting one value to a column with some where condition. It is known that SET is not supported.

例如,考虑下面的正常查询:

For example, consider the below normal query:

UPDATE temp1
SET location=florida
WHERE id=206;

我试图在 Hive 中转换相同的内容,但我卡在了 SET 的位置.如果有人告诉我,那对我来说将是一个很大的帮助.

I tried to convert the same in Hive but I got stuck in the place of SET. If anyone let me know then it would be a great help for me.

推荐答案

INSERT OVERWRITE TABLE _tableName_ PARTITION (_partitionColumn_= _partitionValue_) 
SELECT [other Things], CASE WHEN id=206 THEN 'florida' ELSE location END AS location, [other Other Things] 
FROM _tableName_ WHERE [_whereClause_];

您可以通过逗号分隔列出多个分区.... PARTITION (_partitionColumn_= _partitionValue1_, _partitionColumn_= _partitionValue2_, ...).我没有对多个分区这样做,一次只做一个,所以我会在一次执行所有分区之前检查测试/开发环境上的结果.我还有其他原因将每个 OVERWRITE 限制为单个分区.

You can have multiple partitions listed by separating them by commas. ... PARTITION (_partitionColumn_= _partitionValue1_, _partitionColumn_= _partitionValue2_, ...). I haven't done this with multiple partitions, just one at a time, so I'd check the results on a test/dev env before doing all partitions at once. I had other reasons for limiting each OVERWRITE to a single partition as well.

本页 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML还有一点.
本网站https://cwiki.apache.org/confluence/display/Hive/LanguageManual,一般来说, 是您使用 HiveSQL 时最好的朋友.

This page https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML has a little more on it.
This site https://cwiki.apache.org/confluence/display/Hive/LanguageManual, in general, is your best friend when working with HiveSQL.

我开发了与此相同的东西来迁移一些数据,并且它奏效了.我还没有针对大型数据集尝试过它,只有几 GB 并且效果很好.

I've developed something identical to this to migrate some data and it's worked. I haven't tried it against large datasets, only a few GB and it has worked perfectly.

注意 - 这将覆盖分区.它会让以前的文件再见.创建备份和恢复脚本/程序.[other Things][other Other Things] 是表中的其余列.它们需要按正确的顺序排列.这很重要,否则您的数据将被破坏.

To Note - This will OVERWRITE the partition. It will make previous files go bye-bye. Create backup and restore scripts/procedures. The [other Things] and [other Other Things] are the rest of the columns from the table. They need to be in the correct order. This is very important or else your data will be corrupted.

希望这会有所帮助.:)

Hope this helps. :)

这篇关于更新,Hive 中的 SET 选项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆