How does Hive compare to HBase?

Problem description

I'm interested in finding out how the recently released (http://mirror.facebook.com/facebook/hive/hadoop-0.17/) Hive compares to HBase in terms of performance. The SQL-like interface used by Hive is very much preferable to the HBase API we have implemented.

Solution

It's hard to find much about Hive, but I found this snippet on the Hive site that leans heavily in favor of HBase:

Hive is based on Hadoop which is a batch processing system. Accordingly, this system does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting jobs and being notified when the jobs are completed as opposed to real time queries. As a result it should not be compared with systems like Oracle where analysis is done on a significantly smaller amount of data but the analysis proceeds much more iteratively with the response times between iterations being less than a few minutes. For Hive queries response times for even the smallest jobs can be of the order of 5-10 minutes and for larger jobs this may even run into hours.
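To make the batch paradigm in the quote concrete, here is a minimal Python sketch. It is not Hive's execution engine: it assumes a toy in-memory dataset and hypothetical `map_phase`, `reduce_phase`, and `run_batch_query` functions standing in for a MapReduce job. The point is only that the caller submits the whole job and is notified once every row has been processed, instead of getting an interactive answer.

```python
from collections import Counter
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical toy dataset of (page, user) click records.
ROWS = [("home", "a"), ("about", "b"), ("home", "c"), ("home", "a")]

def map_phase(rows):
    # Emit (key, 1) for every record, like a MapReduce mapper.
    return [(page, 1) for page, _user in rows]

def reduce_phase(pairs):
    # Sum the counts per key, like a MapReduce reducer.
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return dict(counts)

def run_batch_query(rows):
    # The job must pass over every row before any result exists.
    return reduce_phase(map_phase(rows))

notifications = []

def notify(fut: Future) -> None:
    # Runs only after the whole job has finished; there are no partial results.
    notifications.append(fut.result())

with ThreadPoolExecutor(max_workers=1) as pool:
    job = pool.submit(run_batch_query, ROWS)  # "submit the job"
    job.add_done_callback(notify)             # "be notified when it completes"
# Leaving the `with` block waits for the job (and its callback) to finish.

print(notifications)
```

On a real cluster the "job" is spread across many machines and its fixed startup cost is what pushes even small queries into the minutes range the quote mentions; the sketch only models the submit-then-notify interaction, not that cost.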

Since HBase and Hypertable are all about performance (both being modeled on Google's BigTable), they sound like they would certainly be much faster than Hive, at the cost of functionality and a steeper learning curve (e.g., they don't have joins or an SQL-like syntax).
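The access pattern that BigTable-style stores are built around can be sketched the same way. The following uses a hypothetical `ToySortedTable` class, not the HBase client API: it keeps rows sorted by row key so that a single row can be fetched directly and a key range can be scanned with binary search. The `family:qualifier` column names only mimic HBase's naming convention. Note that there is no join operation; combining data from two such tables would be left to application code.

```python
import bisect
from typing import Dict, List, Optional, Tuple

class ToySortedTable:
    """Hypothetical in-memory table whose rows are kept sorted by row key."""

    def __init__(self) -> None:
        self._keys: List[str] = []                 # row keys in sorted order
        self._rows: Dict[str, Dict[str, str]] = {} # row key -> {column: value}

    def put(self, row_key: str, column: str, value: str) -> None:
        # Insert the key in sorted position the first time we see it.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key: str) -> Optional[Dict[str, str]]:
        # Point lookup by key: no other rows are touched.
        return self._rows.get(row_key)

    def scan(self, start: str, stop: str) -> List[Tuple[str, Dict[str, str]]]:
        # Range scan over [start, stop) via binary search on the sorted keys.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

t = ToySortedTable()
t.put("user#002", "info:name", "Bea")
t.put("user#001", "info:name", "Al")
t.put("user#003", "info:name", "Cy")
print(t.get("user#001"))
print([k for k, _ in t.scan("user#001", "user#003")])
```

Here `get` returns `{'info:name': 'Al'}` and the scan returns `['user#001', 'user#002']`, since the stop key is exclusive. Fast key-based reads like these are the trade-off the answer describes: the store answers lookups quickly but offers no relational operations on top.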
