How to delete and update a record in Hive


Question

I have installed Hadoop, Hive, and Hive JDBC, and they are running fine. But I still have a problem: how do I delete or update a single record in Hive? MySQL's DELETE and UPDATE commands do not work in Hive.

Thanks.

hive> delete from student where id=1;
Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]*
Query returned non-zero code: 1, cause: null

Recommended answer

You should not think of Hive as a regular RDBMS. Hive is better suited to batch processing over very large sets of immutable data.

The following applies to versions prior to Hive 0.14; see the answer by ashtonium for later versions.
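For orientation, here is a minimal sketch of what row-level changes look like in Hive 0.14 and later, where ACID transactions were introduced. It is not taken from this answer: the table name student_acid, its columns, and the bucket count are hypothetical, and the exact settings required vary by Hive version and cluster configuration. The rest of this answer covers earlier versions.

```sql
-- Sketch only: assumes Hive 0.14+ with the transaction manager enabled.
-- The metastore also needs the compactor turned on (hive.compactor.initiator.on,
-- hive.compactor.worker.threads), which is set on the server side.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lock.DbTxnManager;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.enforce.bucketing=true;  -- needed on 0.14 through 1.x; not used from 2.0 on

-- Hypothetical table: in these versions, transactional tables must be
-- bucketed and stored as ORC.
CREATE TABLE student_acid (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Row-level statements are accepted on the transactional table.
DELETE FROM student_acid WHERE id = 1;
UPDATE student_acid SET name = 'Alice' WHERE id = 2;  -- bucketing column id cannot be updated
```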

Hive before 0.14 has no operation for deleting or updating a particular record or set of records, and to me needing one is more a sign of a poor schema.

Here is what the official documentation says:

Hadoop is a batch processing system and Hadoop jobs tend to have high latency and
incur substantial overheads in job submission and scheduling. As a result -
latency for Hive queries is generally very high (minutes) even when data sets
involved are very small (say a few hundred megabytes). As a result it cannot be
compared with systems such as Oracle where analyses are conducted on a
significantly smaller amount of data but the analyses proceed much more
iteratively with the response times between iterations being less than a few
minutes. Hive aims to provide acceptable (but not optimal) latency for
interactive data browsing, queries over small data sets or test queries.

Hive is not designed for online transaction processing and does not offer
real-time queries and row level updates. It is best used for batch jobs over
large sets of immutable data (like web logs).

One way to work around this limitation is to use partitions. I don't know what your id corresponds to, but if you receive different batches of ids separately, you could redesign your table so that it is partitioned by id. You could then easily drop the partitions for the ids you want to get rid of.
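The partition approach could be sketched as follows. The table name student_by_id and the name column are assumptions for illustration; only the student table and its id column appear in the question. Note that one partition per id only makes sense when the number of distinct ids is modest, since every partition becomes a directory tracked by the metastore.

```sql
-- Hypothetical redesign: id moves from a regular column to the partition key.
CREATE TABLE student_by_id (name STRING)
PARTITIONED BY (id INT);

-- Copy the existing rows, letting Hive create one partition per id.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE student_by_id PARTITION (id)
SELECT name, id FROM student;   -- the partition column goes last in the SELECT

-- "Delete" the record with id = 1 by dropping its partition.
ALTER TABLE student_by_id DROP IF EXISTS PARTITION (id = 1);

-- "Update" the record with id = 2 by rewriting only that partition.
INSERT OVERWRITE TABLE student_by_id PARTITION (id = 2)
SELECT 'new name' FROM student_by_id WHERE id = 2;
```

If redesigning the table is not an option, a commonly used fallback on an unpartitioned table is to rewrite the whole table without the unwanted rows, for example INSERT OVERWRITE TABLE student SELECT * FROM student WHERE id <> 1; this is a full batch rewrite, not a row-level delete.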
