How does Cassandra split keyspace data when multiple directories are configured?

Question

I have configured multiple data directories in the cassandra.yaml file, as shown below:


data_file_directories:
    - E:/Cassandra/data/var/lib/cassandra/data
    - K:/Cassandra/data/var/lib/cassandra/data

When I create a keyspace and insert data, the keyspace is created in both directories and the data is scattered across them. How does Cassandra split the data between multiple directories, and what is the rule behind this?

Recommended answer

When you add multiple entries under data_file_directories, you are using Cassandra's JBOD feature. Data is spread over the configured drives in proportion to their available space. For example, a drive with roughly twice the free space of another should receive roughly twice as much new data.

This also lets you take advantage of the disk_failure_policy setting. You can read about the details here: http://www.datastax.com/dev/blog/handling-disk-failures-in-cassandra-1-2

In short, you can configure Cassandra to keep going and do what it can if a disk becomes full or fails completely. This has an advantage over RAID0 (where you would effectively have the same capacity as JBOD): you do not have to restore the whole data set from backup (or run a full repair), only run a repair for the missing data. On the other hand, RAID0 provides higher throughput, depending on how well you tune the RAID array to match the filesystem and drive geometry.
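
As a rough illustration, a cassandra.yaml fragment combining the two settings might look like the sketch below. The directory paths are the ones from the question. The choice of best_effort is only an assumption for the example, and the value descriptions paraphrase the Cassandra 1.2 behavior covered in the linked post. Later versions add further values, so check the cassandra.yaml shipped with your release.

```yaml
# cassandra.yaml (fragment, illustrative only)

# Multiple entries here enable JBOD-style storage across drives.
data_file_directories:
    - E:/Cassandra/data/var/lib/cassandra/data
    - K:/Cassandra/data/var/lib/cassandra/data

# Policy for data disk failures (values as of Cassandra 1.2):
#   stop        - shut down gossip and Thrift; the node is effectively
#                 dead but can still be inspected via JMX
#   best_effort - stop using the failed disk and answer requests from
#                 the remaining sstables (may return stale data at CL.ONE)
#   ignore      - ignore fatal errors and let requests fail (pre-1.2 behavior)
disk_failure_policy: best_effort
```

With best_effort the node keeps serving from the surviving drive, which is the "keep going" behavior described above. With stop you trade availability for a guarantee that the node never answers from incomplete data.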

If you have the resources for a fault-tolerant, higher-performance RAID setup (RAID10, for example), you may want to just use a single directory for simplicity. That said, most deployments are starting to lean toward the density route, using JBOD rather than system-level fault tolerance.

You can read about the thought process behind the development of this feature here: https://issues.apache.org/jira/browse/CASSANDRA-4292
