Why is it so bad to have large partitions in Cassandra?


Question

I have seen this warning everywhere but cannot find any detailed explanation on this topic.

Answer

For starters:

The maximum number of cells (rows × columns) in a single partition is 2 billion.

If you allow a partition to grow unbounded, you will eventually hit this limitation.
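A back-of-the-envelope calculation shows how quickly an unbounded partition can approach this hard limit. The write rate and column count below are illustrative assumptions, not figures from the answer:

```python
# Sketch: how long a constantly-appended partition takes to reach the
# 2-billion-cell hard limit. Rates and column counts are assumptions.

CELL_LIMIT = 2_000_000_000  # max cells (rows x columns) per partition

def rows_until_limit(columns_per_row: int) -> int:
    """Rows a partition can hold before rows * columns exceeds the limit."""
    return CELL_LIMIT // columns_per_row

def days_until_limit(columns_per_row: int, rows_per_second: float) -> float:
    """Days of constant appends before an unbounded partition hits the limit."""
    return rows_until_limit(columns_per_row) / rows_per_second / 86_400

# A hypothetical table with 10 columns per row, appending 100 rows/s to a
# single partition, would hit the hard limit in roughly 23 days.
print(f"{days_until_limit(10, 100):.0f} days")  # 23 days
```

In practice the practical limits below bite long before this theoretical ceiling does.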

Beyond that theoretical limit, there are practical limitations tied to the impact large partitions have on the JVM and on read times. These practical limits keep increasing from version to version, but they are not fixed: they vary with the data model, query patterns, heap size, and configuration, which makes it hard to give a straight answer on what is too large.

As of 2.1 and the early 3.0 releases, the primary cost on reads and compactions comes from deserializing the index, which marks a row every column_index_size_in_kb. You can increase key_cache_size_in_mb to prevent unnecessary deserialization on reads, but that reduces heap space and fills the old generation. You can increase the column index size, but that increases worst-case IO costs on reads. There are also many different settings for CMS and G1 to tune the impact of the huge spike in object allocations when reading these big partitions. There are active efforts to improve this, so in the future it may no longer be the bottleneck.
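The index cost described above scales with partition size: one index entry is written per column_index_size_in_kb of partition data, and a read of the partition may deserialize all of them. A rough model, using the default 64 KB interval (the partition sizes are illustrative):

```python
# Rough model of the row-index cost: Cassandra marks a row every
# column_index_size_in_kb, so a read of a large partition may deserialize
# partition_size / column_index_size_in_kb index entries.

def index_entries(partition_size_mb: float, column_index_size_in_kb: int = 64) -> int:
    """Approximate row-index entries for a partition (default 64 KB marks)."""
    return int(partition_size_mb * 1024 // column_index_size_in_kb)

# A 100 MB partition at the default interval carries 1,600 index entries;
# a 4 GB partition carries 65,536, which is why reading it produces a
# large spike of short-lived objects on the heap.
print(index_entries(100), index_entries(4 * 1024))  # 1600 65536
```

This is also why raising column_index_size_in_kb trades fewer entries (less heap churn) for coarser seeks (worse worst-case IO).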

Repairs also only go down to (in the best-case scenario) the partition level. So if, say, you are constantly appending to a partition, and the hashes of that partition on two nodes are compared at not exactly the same time (a distributed system essentially guarantees this), the entire partition must be streamed over to ensure consistency. Incremental repairs can reduce the impact of this, but you are still streaming massive amounts of data and churning the disk significantly, and the streamed data will then need to be compacted together unnecessarily.
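The over-streaming follows from the granularity of comparison: the unit repair can compare is a hash covering the whole partition, so any appended row makes the hashes differ and the entire partition is streamed, even though only one row is missing. A minimal sketch of that effect (the row data is made up; the point is the hash granularity):

```python
import hashlib

# Sketch: repair compares a hash over all rows of a partition, so a
# single appended row makes the partitions look entirely different.

def partition_digest(rows: list[bytes]) -> str:
    """Digest covering every row in the partition (the unit repair compares)."""
    h = hashlib.md5()
    for row in rows:
        h.update(row)
    return h.hexdigest()

node_a = [b"row1", b"row2", b"row3"]
node_b = node_a + [b"row4"]  # one extra append seen only by node B

# The digests differ, but repair cannot tell that only row4 is missing,
# so the whole partition must be streamed to reconcile the two nodes.
print(partition_digest(node_a) != partition_digest(node_b))  # True
```

The larger the partition, the more redundant data each such mismatch streams.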

You can probably keep adding corner cases and scenarios that have issues. Many times large partitions are possible to read, but the tuning and corner cases involved are not really worth it; it is better to design the data model to be friendly to how Cassandra expects it. I would recommend targeting 100 MB, but you can go far beyond that comfortably. Into the GBs, and you will need to start considering tuning for it (depending on data model, use case, etc.).
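One common data-model pattern for keeping partitions near that target is to add a time bucket to the partition key, so appends roll over to a fresh partition each day instead of growing one forever. The table layout implied here is an assumption for illustration, e.g. `PRIMARY KEY ((sensor_id, day_bucket), event_time)`:

```python
from datetime import datetime, timezone

# Sketch of time-bucketing: derive a day bucket from each event's
# timestamp and include it in the partition key, so no single partition
# grows without bound. The schema this feeds is hypothetical.

def day_bucket(ts: datetime) -> str:
    """Partition-key bucket: all events from one UTC day share a partition."""
    return ts.strftime("%Y-%m-%d")

t1 = datetime(2016, 5, 1, 9, 30, tzinfo=timezone.utc)
t2 = datetime(2016, 5, 2, 0, 5, tzinfo=timezone.utc)

# Events on different days land in different partitions.
print(day_bucket(t1), day_bucket(t2))  # 2016-05-01 2016-05-02
```

The bucket width (hour, day, month) is chosen from the expected write rate so each bucket stays around the 100 MB target.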

