How much data per node in Cassandra cluster?

Problem Description

Where are the boundaries of SSTable compaction (major and minor), and when does it become ineffective?

If I run a major compaction on a couple of 500 GB SSTables and the final SSTable ends up over 1 TB, will it be efficient for one node to "rewrite" such a big dataset?

This can take about a day on an HDD and needs double the disk space, so are there any best practices for this?
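For rough intuition, here is a minimal back-of-the-envelope sketch in Python; the effective HDD throughput of 50 MB/s and the function name are assumptions, not measured values:

    # Rough estimate of major-compaction cost; all inputs are assumed example values.

    def compaction_estimate(total_sstable_gb, effective_mb_s=50):
        """Estimate wall-clock I/O time and temporary disk space for a major compaction.

        A major compaction reads every input SSTable and writes one merged SSTable,
        so both the old and the new data must fit on disk at the same time.
        """
        data_mb = total_sstable_gb * 1024
        # Read everything once and write it once; mixed read/write on one spindle
        # is slower than sequential, so treat the result as a lower bound.
        io_mb = data_mb * 2
        hours = io_mb / effective_mb_s / 3600
        # Worst-case temporary headroom: the merged SSTable can be almost as large
        # as the sum of its inputs (minus purged tombstones and overwrites).
        headroom_gb = total_sstable_gb
        return hours, headroom_gb

    hours, headroom = compaction_estimate(total_sstable_gb=1024)  # ~1 TB of input
    print(f"~{hours:.0f} h of I/O (lower bound), ~{headroom:.0f} GB of free space needed")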

Recommended Answer

1 TB is a reasonable limit on how much data a single node can handle, but in reality, a node is not at all limited by the size of the data, only the rate of operations.

A node might have only 80 GB of data on it, but if you absolutely pound it with random reads and it doesn't have a lot of RAM, it might not even be able to handle that number of requests at a reasonable rate. Similarly, a node might have 10 TB of data, but if you rarely read from it, or you have a small portion of your data that is hot (so that it can be effectively cached), it will do just fine.
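As a sketch of why the hot fraction matters more than the total size, here is a small illustrative model comparing the hot working set to the memory left for the OS page cache; all figures and names are assumptions chosen for illustration:

    # Illustrative only: compares the hot working set to memory available for caching.

    def cache_fit(total_data_gb, hot_fraction, ram_gb, heap_gb):
        """Return the share of the hot set that the OS page cache can hold.

        Cassandra leans on the page cache for SSTable reads, so roughly
        RAM minus the JVM heap is what is left to cache hot data.
        """
        page_cache_gb = max(ram_gb - heap_gb, 0)
        hot_set_gb = total_data_gb * hot_fraction
        return min(page_cache_gb / hot_set_gb, 1.0) if hot_set_gb else 1.0

    # A 10 TB node where only 1% of the data is hot can still be mostly cached...
    print(cache_fit(total_data_gb=10_000, hot_fraction=0.01, ram_gb=64, heap_gb=8))  # ~0.56
    # ...while a small node hammered with uniformly random reads cannot be.
    print(cache_fit(total_data_gb=80, hot_fraction=1.0, ram_gb=8, heap_gb=4))        # 0.05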

Compaction certainly is an issue to be aware of when you have a large amount of data on one node, but there are a few things to keep in mind:

First, the "biggest" compactions, ones where the result is a single huge SSTable, happen rarely, even more so as the amount of data on your node increases. (The number of minor compactions that must occur before a top-level compaction occurs grows exponentially by the number of top-level compactions you've already performed.)
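A minimal sketch of that growth, assuming size-tiered compaction with the default threshold of four similarly sized SSTables per minor compaction (a simplified model that ignores tier overlap):

    # Simplified model of size-tiered compaction: each minor compaction merges
    # THRESHOLD similarly sized SSTables into one SSTable at the next tier.
    THRESHOLD = 4  # Cassandra's default min_threshold for size-tiered compaction

    def compactions_to_reach(tier):
        """Number of compactions needed to produce a single SSTable at `tier`,
        starting from freshly flushed (tier-0) SSTables."""
        if tier == 0:
            return 0
        # One merge at this tier, plus the work to build each of its inputs.
        return 1 + THRESHOLD * compactions_to_reach(tier - 1)

    for tier in range(1, 6):
        size_multiplier = THRESHOLD ** tier  # SSTable size relative to one flush
        print(f"tier {tier}: ~{size_multiplier}x flush size, "
              f"{compactions_to_reach(tier)} compactions to get here")

In this model, reaching an SSTable roughly 1024 times the flush size takes 341 compactions, which is why the largest rewrites are correspondingly rare.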

Second, your node will still be able to handle requests; reads will just be slower.

Third, if your replication factor is above 1 and you aren't reading at consistency level ALL, other replicas will be able to respond quickly to read requests, so you shouldn't see a large difference in latency from a client perspective.
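For example, with the DataStax Python driver the consistency level can be chosen per statement; the contact point, keyspace, and table below are placeholders:

    # Minimal sketch with the DataStax Python driver; contact point, keyspace,
    # and table names are placeholders.
    from cassandra.cluster import Cluster
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")

    # With RF > 1, CL=ONE (or QUORUM) lets a fast replica answer even while another
    # replica is busy compacting; CL=ALL would wait for the slowest replica.
    query = SimpleStatement(
        "SELECT * FROM my_table WHERE id = %s",
        consistency_level=ConsistencyLevel.ONE,
    )
    row = session.execute(query, ("some-key",)).one()
    print(row)

    cluster.shutdown()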

Last, there are plans to improve the compaction strategy that may help with some larger data sets.
