Large-scale data processing: HBase vs Cassandra


Question

After researching large-scale data storage solutions, I have nearly settled on Cassandra. But it is generally said that HBase is the better solution for large-scale data processing and analysis.

Both are key/value stores, and both run, or can run (Cassandra only recently), under a Hadoop layer. So what makes HBase the better candidate when processing/analysis is required on large data?

I have also read http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/ but I am still looking for concrete advantages of HBase.

For now I am more convinced by Cassandra because of how simple it is to add nodes, its seamless replication, and its lack of a single point of failure. It also has secondary indexes, which is a good plus.

Recommended answer

Which one is best for you really depends on what you are going to use it for. Each has its advantages, and without more details this becomes more of a religious war. The post you referenced is also more than a year old, and both projects have gone through many changes since then. Please also keep in mind that I am not familiar with the more recent Cassandra developments.

Having said that, I'll paraphrase HBase committer Andrew Purtell and add some of my own experiences:

  • HBase runs in larger production environments (1000 nodes), although that is still in the ballpark of Cassandra's ~400-node installs, so it is really a marginal difference.

  • HBase and Cassandra both support replication between clusters/datacenters. I believe HBase exposes more of it to the user, so it appears more complicated, but you also get more flexibility.

  • If strong consistency is what your application needs, then HBase is likely a better fit. It is designed from the ground up to be consistent. For example, it allows for a simpler implementation of atomic counters (I think Cassandra just got them) as well as check-and-put operations.

  • HBase's write performance is great; from what I understand, that was one of the reasons Facebook went with HBase for their messenger.

  • I'm not sure of the current state of Cassandra's ordered partitioner, but in the past it required manual rebalancing. HBase handles that for you if you want. Ordered partitioning is important for Hadoop-style processing.

  • Cassandra and HBase are both complex; Cassandra just hides it better. HBase exposes it more by using HDFS for its storage, but if you look at the codebase, Cassandra is just as layered. If you compare the Dynamo and Bigtable papers, you can see that Cassandra's theory of operation is actually more complex.

  • HBase has more unit tests, FWIW.

  • All Cassandra RPC is Thrift; HBase has Thrift, REST, and native Java interfaces. Thrift and REST offer only a subset of the total client API, but if you want pure speed, the native Java client is there.

  • There are advantages to both peer-to-peer and master-slave designs. The master-slave setup generally makes debugging easier and removes quite a bit of complexity.

  • HBase is not tied to traditional HDFS only; you can change out your underlying storage depending on your needs. MapR looks quite interesting, and I have heard good things, although I have not used it myself.
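To make the consistency point a bit more concrete, here is a minimal Python sketch of what check-and-put and atomic counter semantics look like from a client's point of view. It is an in-memory stand-in, not the HBase client API: `TinyRowStore`, its method names, and the row/column names are all hypothetical, and a single lock stands in for the per-row atomicity a real server would provide.

```python
import threading


class TinyRowStore:
    """Hypothetical in-memory store; NOT the HBase API, just the semantics."""

    def __init__(self):
        self._cells = {}               # (row, column) -> value
        self._lock = threading.Lock()  # stands in for server-side row atomicity

    def get(self, row, column):
        with self._lock:
            return self._cells.get((row, column))

    def check_and_put(self, row, column, expected, new_value):
        """Write new_value only if the current value equals expected."""
        with self._lock:
            if self._cells.get((row, column)) != expected:
                return False           # someone else changed the cell first
            self._cells[(row, column)] = new_value
            return True

    def increment(self, row, column, amount=1):
        """Read-modify-write as one atomic step; returns the new value."""
        with self._lock:
            value = self._cells.get((row, column), 0) + amount
            self._cells[(row, column)] = value
            return value


store = TinyRowStore()

# Two clients try to claim the same row; only the first check succeeds.
first = store.check_and_put("user:alice", "owner", None, "client-1")
second = store.check_and_put("user:alice", "owner", None, "client-2")


def bump():
    # Each thread bumps the same counter 1000 times.
    for _ in range(1000):
        store.increment("page:home", "hits")


threads = [threading.Thread(target=bump) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

hits = store.get("page:home", "hits")
print(first, second, hits)
```

The point of the sketch is that the compare and the write happen as one step, so the client never has to build its own locking on top of the store.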
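As a rough illustration of why ordered partitioning matters for scan-heavy, Hadoop-style jobs, and why it can need rebalancing, the Python sketch below compares range-based and hash-based key placement. Everything in it is made up for illustration: the node names, the split points, and the keys are assumptions, and MD5 is used only as a convenient stand-in hash, not as a statement about either system's internals.

```python
import bisect
import hashlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c"]

# Hypothetical range boundaries: keys < "h" go to node-a,
# keys < "p" go to node-b, everything else goes to node-c.
SPLIT_POINTS = ["h", "p"]


def ordered_owner(key):
    """Ordered partitioning: each node owns one contiguous key range."""
    return NODES[bisect.bisect_right(SPLIT_POINTS, key)]


def hashed_owner(key):
    """Hash partitioning: placement ignores key order entirely."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


def nodes_for_scan(keys, owner):
    """Which nodes must be contacted to read this set of keys?"""
    return sorted({owner(k) for k in keys})


keys = ["apple", "banana", "cherry", "grape", "kiwi", "mango", "peach", "plum"]

# A range scan over keys in ["a", "d") touches a single contiguous range.
scan = [k for k in keys if "a" <= k < "d"]
ordered_nodes = nodes_for_scan(scan, ordered_owner)
hashed_nodes = nodes_for_scan(scan, hashed_owner)

# The flip side: ordered placement follows the key distribution, so
# skewed keys pile up on one node until the ranges are split or moved.
ordered_load = Counter(ordered_owner(k) for k in keys)

print("ordered scan touches:", ordered_nodes)
print("hashed scan touches: ", hashed_nodes)
print("ordered load:", dict(ordered_load))
```

With ordered placement the scan stays on one node, which is what makes sequential, input-split-style processing cheap. The uneven load it produces here is the kind of skew that rebalancing (manual or automatic) is meant to fix.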
