How to delete and update a record in Hive

Question

I have installed Hadoop, Hive, and the Hive JDBC driver, and they are running fine for me. But I still have a problem: how do I delete or update a single record in Hive? The MySQL-style delete and update commands do not work in Hive.

Thanks

hive> delete from student where id=1;
Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]*
Query returned non-zero code: 1, cause: null


Recommended answer

You should not think of Hive as a regular RDBMS. Hive is better suited to batch processing over very large sets of immutable data.

The following applies to versions prior to Hive 0.14; see the answer by @ashtonium for later versions.
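
For context on the later versions mentioned above: Hive 0.14 introduced row-level UPDATE and DELETE, but only on tables that are declared transactional. The sketch below is a minimal illustration, not a complete setup. The table name `student_acid` and the `name` column are assumptions made for the example, and the exact settings required (including metastore-side compaction settings) depend on the Hive version and cluster configuration.

```sql
-- Session settings commonly required for ACID tables; the metastore
-- typically needs compaction enabled as well (not shown here).
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Hypothetical table: in Hive 0.14 a transactional table must be
-- bucketed and stored as ORC.
CREATE TABLE student_acid (
  id   INT,
  name STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Row-level changes are accepted on the transactional table.
-- The bucketing column (id) itself cannot be updated.
UPDATE student_acid SET name = 'new name' WHERE id = 2;
DELETE FROM student_acid WHERE id = 1;
```

The rest of this answer describes the situation before Hive 0.14, where none of this is available.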

There is no supported operation for deleting or updating a particular record or set of records, and to me needing one is more a sign of a poor schema.
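
In practice, on these older versions a record is usually removed or changed by rewriting the whole table (or one partition) in a batch job. A minimal sketch of that idea, assuming the `student` table from the question has the columns `id` and `name` (the `name` column is an assumption, since the question does not show the schema):

```sql
-- "Delete" the row with id = 1 by rewriting the table without it.
-- Note: rows whose id is NULL are also dropped by this predicate.
INSERT OVERWRITE TABLE student
SELECT * FROM student WHERE id <> 1;

-- "Update" the row with id = 2 by rewriting every row and
-- substituting the new value only where the id matches.
INSERT OVERWRITE TABLE student
SELECT id,
       CASE WHEN id = 2 THEN 'new name' ELSE name END AS name
FROM student;
```

Each statement rewrites all of the table's data, so this only makes sense as an occasional batch operation, not as a substitute for per-row transactions.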

Here is what you can find in the official documentation:

Hadoop is a batch processing system and Hadoop jobs tend to have high latency and
incur substantial overheads in job submission and scheduling. As a result -
latency for Hive queries is generally very high (minutes) even when data sets
involved are very small (say a few hundred megabytes). As a result it cannot be
compared with systems such as Oracle where analyses are conducted on a
significantly smaller amount of data but the analyses proceed much more
iteratively with the response times between iterations being less than a few
minutes. Hive aims to provide acceptable (but not optimal) latency for
interactive data browsing, queries over small data sets or test queries.

Hive is not designed for online transaction processing and does not offer
real-time queries and row level updates. It is best used for batch jobs over
large sets of immutable data (like web logs).

A way to work around this limitation is to use partitions. I don't know what your id corresponds to, but if you're getting different batches of ids separately, you could redesign your table so that it is partitioned by id. You would then be able to easily drop the partitions for the ids you want to get rid of.
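
The partition approach could look roughly like the following. This is a sketch under assumptions: `student_partitioned` is a hypothetical redesigned table, `name` is an assumed data column, and the data is copied from the `student` table in the question.

```sql
-- Hypothetical redesign: id becomes the partition column, so each id
-- value is stored in its own directory.
CREATE TABLE student_partitioned (
  name STRING
)
PARTITIONED BY (id INT);

-- Load the rows for one id into their own partition.
INSERT OVERWRITE TABLE student_partitioned PARTITION (id = 1)
SELECT name FROM student WHERE id = 1;

-- Remove everything for id = 1 by dropping its partition.
ALTER TABLE student_partitioned DROP PARTITION (id = 1);
```

Partitioning works best when each partition holds a sizeable batch of rows. One partition per individual record would create a very large number of small partitions, so this design fits the "batches of ids" case the answer describes.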
