How much data per node in Cassandra cluster?


Question

Where are the boundaries of SSTable compaction (major and minor), and when does it become ineffective?

If a major compaction merges a couple of 500 GB SSTables and the resulting SSTable will be over 1 TB, will it still be effective for one node to "rewrite" such a big dataset?

On an HDD this can take about a day and requires double the disk space, so are there best practices for this?
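As a rough back-of-the-envelope check on what such a major compaction costs, here is a small Python sketch. The throughput figure and the reference to Cassandra's compaction_throughput_mb_per_sec throttle (commonly 16 MB/s by default) are assumptions about a typical setup, not measurements from any particular cluster.

```python
# Rough estimate of the disk headroom and duration of a major compaction that
# merges two ~500 GB SSTables into one ~1 TB SSTable. The throughput value is
# an assumed example (Cassandra throttles compaction via
# compaction_throughput_mb_per_sec, commonly 16 MB/s by default).

input_gb = 2 * 500            # SSTables being merged
output_gb = input_gb          # worst case: little data expires, ~1 TB written
throttle_mb_per_s = 16        # assumed compaction throughput

# Old and new SSTables coexist until the merge finishes, hence "double space".
peak_disk_gb = input_gb + output_gb

# Bytes written at the throttled rate dominate the duration.
hours = output_gb * 1024 / throttle_mb_per_s / 3600

print(f"Peak disk usage during compaction: ~{peak_disk_gb} GB")
print(f"Estimated duration at {throttle_mb_per_s} MB/s: ~{hours:.0f} hours")
```

Raising the throttle shortens the estimate, at the cost of more I/O contention with live reads.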

Answer

1 TB is a reasonable limit on how much data a single node can handle, but in reality, a node is not at all limited by the size of the data, only the rate of operations.

A node might have only 80 GB of data on it, but if you absolutely pound it with random reads and it doesn't have a lot of RAM, it might not even be able to handle that number of requests at a reasonable rate. Similarly, a node might have 10 TB of data, but if you rarely read from it, or you have a small portion of your data that is hot (so that it can be effectively cached), it will do just fine.
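To make that point concrete, here is a tiny illustrative calculation; all of the sizes and fractions below are made-up example numbers, not measurements.

```python
# Illustrative arithmetic for the point above: what matters is whether the
# *hot* portion of the data fits in memory, not the total size on disk.
# All figures here are assumed example numbers.

def hot_set_fits_in_cache(total_data_gb, hot_fraction, cache_gb):
    """Return whether the frequently-read subset fits in available cache
    (page cache plus Cassandra's key/row caches, roughly)."""
    hot_gb = total_data_gb * hot_fraction
    return hot_gb <= cache_gb, hot_gb

# Node A: small data set, but reads are uniformly random -> everything is "hot".
print(hot_set_fits_in_cache(total_data_gb=80, hot_fraction=1.0, cache_gb=16))
# (False, 80.0): most reads miss cache and hit disk.

# Node B: huge data set, but only ~0.1% of it is read frequently.
print(hot_set_fits_in_cache(total_data_gb=10_000, hot_fraction=0.001, cache_gb=16))
# (True, 10.0): the hot 10 GB is served mostly from memory.
```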

Compaction certainly is an issue to be aware of when you have a large amount of data on one node, but there are a few things to keep in mind:

首先,最大"的压缩,即结果是单个巨大的 SSTable 的压缩很少发生,随着节点上数据量的增加,这种压缩更是如此.(在顶级压缩发生之前必须发生的次要压缩的数量随着您已经执行的顶级压缩的数量呈指数增长.)

First, the "biggest" compactions, ones where the result is a single huge SSTable, happen rarely, even more so as the amount of data on your node increases. (The number of minor compactions that must occur before a top-level compaction occurs grows exponentially by the number of top-level compactions you've already performed.)

Second, your node will still be able to handle requests, reads will just be slower.

Third, if your replication factor is above 1 and you aren't reading at consistency level ALL, other replicas will be able to respond quickly to read requests, so you shouldn't see a large difference in latency from a client perspective.
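For example, with the DataStax Python driver you choose the consistency level per statement; the sketch below uses hypothetical contact points, keyspace, and table names.

```python
# Sketch of reading at a consistency level below ALL with the DataStax
# Python driver (pip install cassandra-driver). The keyspace, table, and
# contact points below are hypothetical placeholders.

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1", "10.0.0.2"])   # placeholder contact points
session = cluster.connect("my_keyspace")      # hypothetical keyspace

# With RF=3 and LOCAL_QUORUM, only 2 of the 3 replicas need to answer,
# so one replica that is busy compacting does not stall the read.
query = SimpleStatement(
    "SELECT * FROM my_table WHERE id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
row = session.execute(query, ("some-key",)).one()
cluster.shutdown()
```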

Last, there are plans to improve the compaction strategy that may help with some larger data sets.
