Cassandra cluster - data density (data size per node) - looking for feedback and advice


Problem description



I am considering the design of a Cassandra cluster.

The use case would be storing large rows of tiny samples for time series data (using KairosDB); the data will be almost immutable (deletes are very rare, no updates). That part is working very well.

However, after several years the data will be quite large (it will reach a maximum size of several hundred terabytes - over one petabyte considering the replication factor).

I am aware of the advice not to put more than 5TB of data on each Cassandra node because of the high I/O load during compactions and repairs (which is apparently already quite high for spinning disks). Since we don't want to build an entire datacenter with hundreds of nodes for this use case, I am investigating whether it would be workable to run high-density servers on spinning disks (e.g. at least 10TB or 20TB per node using spinning disks in RAID10 or JBOD; servers would have good CPU and RAM, so the system would be I/O bound).
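
As a rough sanity check on that trade-off, here is a back-of-the-envelope sketch of how the node count scales with the per-node density target. The raw data size, replication factor, and density figures are illustrative assumptions, not measurements:

```python
import math

def nodes_needed(raw_data_tb: float, replication_factor: int, tb_per_node: float) -> int:
    """Nodes required to hold the fully replicated data set at a given density."""
    total_tb = raw_data_tb * replication_factor
    return math.ceil(total_tb / tb_per_node)

# Hypothetical figures in the spirit of the question: ~400 TB of raw data, RF = 3.
raw_tb, rf = 400, 3
for density in (5, 10, 20):  # TB per node
    print(f"{density:>2} TB/node -> {nodes_needed(raw_tb, rf, density):>3} nodes "
          f"({raw_tb * rf} TB total with replication)")

# Output:
#  5 TB/node -> 240 nodes (1200 TB total with replication)
# 10 TB/node -> 120 nodes (1200 TB total with replication)
# 20 TB/node ->  60 nodes (1200 TB total with replication)
```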

The number of reads/writes per second in Cassandra will be easily manageable by a small cluster. I should also mention that this is not a high-performance transactional system but a datastore for storage, retrieval and some analysis, and the data will be almost immutable - so even if a compaction or a repair/reconstruction takes several days on several servers at the same time, it's probably not going to be an issue at all.

I am wondering whether anyone has experience feedback on high server density using spinning disks, and what configuration you are using (Cassandra version, data size per node, disk size per node, disk config: JBOD/RAID, type of hardware).

Thanks in advance for your feedback.

Best regards.

Solution

The risk of super dense nodes isn't necessarily maxing IO during repair and compaction - it's the inability to reliably resolve a total node failure. In your reply to Jim Meyer, you note that RAID5 is discouraged because the probability of failure during rebuild is too high - that same potential failure is the primary argument against super dense nodes.

In the days pre-vnodes, if you had a 20T node that died, and you had to restore it, you'd have to stream 20T from the neighboring (2-4) nodes, which would max out all of those nodes, increase their likelihood of failure, and it would take (hours/days) to restore the down node. In that time, you're running with reduced redundancy, which is a likely risk if you value your data.
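
To put a rough number on that restore window, here is a hedged back-of-the-envelope estimate; the sustained streaming rates are illustrative assumptions (real streaming throughput depends heavily on the network, compaction pressure, and the version in use):

```python
# Back-of-the-envelope restore time for a dead node that must be re-streamed
# from its neighbours. Throughput figures are assumptions for illustration only.

def restore_hours(node_tb: float, sustained_mb_per_s: float) -> float:
    """Hours to stream node_tb of data at a sustained aggregate rate."""
    total_mb = node_tb * 1024 * 1024  # TB -> MB (binary units, close enough here)
    return total_mb / sustained_mb_per_s / 3600

for node_tb in (4, 20):
    for rate in (200, 500):  # MB/s aggregate streaming rate into the new node
        print(f"{node_tb:>2} TB node at {rate} MB/s -> ~{restore_hours(node_tb, rate):.0f} h")

# Output:
#  4 TB node at 200 MB/s -> ~6 h
#  4 TB node at 500 MB/s -> ~2 h
# 20 TB node at 200 MB/s -> ~29 h
# 20 TB node at 500 MB/s -> ~12 h
```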

One of the reasons vnodes were appreciated by many people is that it distributes load across more neighbors - now, streaming operations to bootstrap your replacement node come from dozens of machines, spreading the load. However, you still have the fundamental problem: you have to get 20T of data onto the node without bootstrap failing. Streaming has long been more fragile than desired, and the odds of streaming 20T without failure on cloud networks are not fantastic (though again, it's getting better and better).
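
For reference, vnodes are enabled through the num_tokens setting in cassandra.yaml; a minimal illustrative excerpt (256 was the common default for this generation of Cassandra, not a recommendation for any particular hardware):

```yaml
# cassandra.yaml (excerpt) - with vnodes, each node owns many small token
# ranges, so a replacement node streams from many peers instead of 2-4.
num_tokens: 256
```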

Can you run 20T nodes? Sure. But what's the point? Why not run five 4T nodes - you get more redundancy, you can scale down the CPU/memory accordingly, and you don't have to worry about re-bootstrapping 20T all at once.

Our "dense" nodes are 4T GP2 EBS volumes with Cassandra 2.1.x (x >= 7 to avoid the OOMs in 2.1.5/6). We use a single volume, because while you suggest "cassandra now supports JBOD quite well", our experience is that relying on Cassandra's balancing algorithms is unlikely to give you quite what you think it will - IO will thundering herd between devices (overwhelm one, then overwhelm the next, and so on), they'll fill asymmetrically. That, to me, is a great argument against lots of small volumes - I'd rather just see consistent usage on a single volume.
