How to migrate data from a Cassandra cluster of size N to a different cluster of size N+/-M
Problem description
I'm trying to figure out how to migrate data from one Cassandra cluster to another Cassandra cluster of a different ring size, say from a 5-node cluster to a 7-node cluster.
I started looking at sstable2json, since it creates a JSON file for the SSTable on that specific Cassandra node. My thought was to do this for a column family on each node in the ring. So on a 5-node ring, this would give me 5 JSON files, one per node, each holding the part of the column family's data that resides on that node.
Then I'd merge the JSON files into one file and use json2sstable to import it into a new cluster of size, let's say, 7. I was hoping that Cassandra would then replicate/balance the data evenly across the nodes in the ring, but I just read that SSTables are immutable once written. So if I did what I just described, I'd end up with a ring that has all the data in my column family on one node.
So can anyone help me figure out the process for migrating data from one cluster to another cluster with a different ring size?
Recommended answer
Better: use bin/sstableloader on the sstables from the old ring to stream them to the new one.
Normally sstableloader is used in a sequence like this:
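- Use SSTableWriter to create sstables locally
- Use sstableloader to stream the sstable data to the right nodes (bin/sstableloader path-to-directory-full-of-sstables). The directory name is assumed to be the keyspace, which it will be if you point it at an existing Cassandra data directory.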
Since you're looking to stream data from an existing cluster A to a new cluster B, you can skip straight to running sstableloader against the data on each node in cluster A.
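As a rough illustration of "running sstableloader against the data on each node", here is a minimal Python sketch that builds one sstableloader command per table directory on a node. Several things in it are assumptions rather than part of the answer above: the helper names `build_loader_commands` and `stream_all`, the `<data_dir>/<keyspace>/<table>/` layout (older releases keep sstables directly under the keyspace directory), the choice to skip keyspaces whose names start with `system`, and the `-d` option for naming target hosts, which newer sstableloader versions expect but the command shown above does not use. The schema is also assumed to exist already on the new cluster.

```python
import subprocess
from pathlib import Path


def build_loader_commands(data_dir, target_hosts, loader="bin/sstableloader"):
    """Return one sstableloader argv per table directory under data_dir.

    Assumes a <data_dir>/<keyspace>/<table>/ layout; adjust for releases
    that store sstables directly in <data_dir>/<keyspace>/.
    """
    commands = []
    for keyspace_dir in sorted(Path(data_dir).iterdir()):
        # Skip stray files and (by assumption) Cassandra's own system keyspaces.
        if not keyspace_dir.is_dir() or keyspace_dir.name.startswith("system"):
            continue
        for table_dir in sorted(keyspace_dir.iterdir()):
            if table_dir.is_dir():
                # -d takes a comma-separated list of nodes in the target cluster.
                commands.append(
                    [loader, "-d", ",".join(target_hosts), str(table_dir)]
                )
    return commands


def stream_all(data_dir, target_hosts, dry_run=True):
    """Print each command; actually run it only when dry_run is False."""
    for argv in build_loader_commands(data_dir, target_hosts):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

You would run something like `stream_all("/var/lib/cassandra/data", ["10.0.0.1"])` on each node of cluster A (the path and address are placeholders), check the printed commands, and then repeat with `dry_run=False`. Because sstableloader streams each row to whichever nodes own it in the target ring, the old and new ring sizes do not need to match.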
More details on using sstableloader are in this blog post.