What happens when a Hive insert fails halfway?


Question

Suppose an insert is expected to load 100 records into Hive, and it fails for some reason after 40 records have been written. Will the transaction roll back completely, undoing the 40 inserted records? Or will we still see 40 records in the Hive table after the insert query has failed?

Recommended answer

The operation is atomic, even for a non-ACID table. When you insert or overwrite data using HiveQL, Hive writes the data to a temporary location, and the files are moved to the table location only if the command succeeds (with INSERT OVERWRITE, the old files are deleted). If the SQL statement fails, the data remains as it was before the statement was executed.
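One way to observe this behavior is to compare row counts around a load. The following is only a sketch: the table names `events` and `events_staging`, the columns, and the partition value are hypothetical and not part of the original answer.

```sql
-- Hypothetical tables: `events` is the target, `events_staging` is the source.
-- Row count in the target partition before the load.
SELECT COUNT(*) FROM events WHERE dt = '2020-01-01';

-- Hive writes the result to a temporary location first and moves the files
-- into the partition directory only if the whole statement succeeds.
INSERT OVERWRITE TABLE events PARTITION (dt = '2020-01-01')
SELECT id, payload
FROM events_staging
WHERE dt = '2020-01-01';

-- If the INSERT failed, this is expected to return the same count as before.
-- If it succeeded, it returns the count of the newly written data.
SELECT COUNT(*) FROM events WHERE dt = '2020-01-01';
```

A partially loaded state (for example, 40 of 100 rows) should not be visible in either case.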

Note about S3 direct writes: the direct-writes-to-S3 feature should be disabled so that Hive writes to a temporary location and rewrites the target folder only if the operation succeeds:

-- Disable AWS S3 direct writes:
set hive.allow.move.on.s3=true; 

See also the Hive documentation for more details on which ACID features are supported in concurrency mode and on their limitations, in the section "What is ACID and why should you use it?":

Up until Hive 0.13, atomicity, consistency, and durability were provided at the partition level. Isolation could be provided by turning on one of the available locking mechanisms (ZooKeeper or in memory). With the addition of transactions in Hive 0.13 it is now possible to provide full ACID semantics at the row level, so that one application can add rows while another reads from the same partition without interfering with each other.

Also read about Hive locks with ACID enabled (transactional and non-transactional tables).

Important addition about S3 eventual consistency. Under the S3 eventual consistency model, files are immediately consistent after creation and eventually consistent after a delete or overwrite. You can greatly reduce the probability of hitting a consistency issue by using timestamp-based partition folders or filenames, or GUID-prefixed filenames. Qubole provides additional configuration parameters for prefixing files with a GUID. This helps to eliminate the eventual consistency issue because each write produces new files with a new GUID prefix, and files with a different GUID are removed:

set hive.qubole.dynpart.use.prefix=true;
set hive.qubole.dynpart.bulk.delete=true;

If you do not use Qubole, you can create partitions whose location contains a timestamp. If you drop the partition in Hive and create a new one with a new timestamped location, you can completely eliminate the eventual consistency problem: you do not rewrite any files, the location is different, and it does not matter when the deletion of the previous location becomes consistent, because that location is no longer mounted in Hive. This requires additional partition manipulation. For small tables you can ignore this issue. Also keep the number of files per partition low, which helps to reduce the time it takes for the data to become consistent.
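The partition swap described above might look roughly like the following sketch. It assumes an existing external table partitioned by `dt`; the names `sales` and `sales_staging`, the columns, the bucket, and the timestamp in the path are all hypothetical.

```sql
-- Assumption: `sales` is an existing EXTERNAL table partitioned by dt,
-- so dropping a partition removes only metadata, not the old files.

-- Unmount the previous location from Hive.
ALTER TABLE sales DROP IF EXISTS PARTITION (dt = '2020-01-01');

-- Mount a fresh location whose path contains the load timestamp,
-- so no existing S3 object is ever overwritten.
ALTER TABLE sales ADD PARTITION (dt = '2020-01-01')
  LOCATION 's3://example-bucket/sales_data/dt=2020-01-01/ts=20200102T030405/';

-- Load the new data into the freshly mounted location.
INSERT OVERWRITE TABLE sales PARTITION (dt = '2020-01-01')
SELECT id, amount
FROM sales_staging
WHERE dt = '2020-01-01';
```

The files under the previous timestamped path can be deleted later by a separate cleanup job, since Hive no longer reads from that location.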

See also these related answers about eventual consistency in S3:

https://stackoverflow.com/a/58706140/2700344

https://stackoverflow.com/a/56192799/2700344

https://stackoverflow.com/a/42677748/2700344
