Cassandra: how to move whole tables to another keyspace
Question
Version info of my Cassandra:
[cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]
I am trying to move some huge tables (several million rows) to another keyspace. Besides "COPY to csv, and COPY from csv", is there any better solution?
Answer
Ok, I managed to get this to work on a single-node cluster running 2.2.8.
I experimented by moving the holidays table from my presentation keyspace to my stackoverflow keyspace.
Here are the steps I took:
Create the table inside the new keyspace.
This step is important, because each table has a UUID for a unique identifier, stored in the cf_id column of the system.schema_columnfamilies table. This id is attached to the directory names that hold the data. By copy/pasting the schema from one keyspace to another, you'll ensure that the same column names are used, but that a new unique identifier is generated.
Note: In 3.x, the identifier is stored in the system_schema.tables table.
aploetz@cqlsh:stackoverflow> CREATE TABLE holidays (
type text,
eventtime timestamp,
beginend text,
name text,
PRIMARY KEY (type, eventtime, beginend)
) WITH CLUSTERING ORDER BY (eventtime DESC, beginend DESC);
aploetz@cqlsh:stackoverflow> SELECT * FROM stackoverflow.holidays ;
type | eventtime | beginend | name
------+-----------+----------+------
(0 rows)
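As the schema note above describes, the cf_id is appended (with its dashes removed) to the table's data-directory name. A minimal sketch of that mapping, assuming the dashed UUID reconstructed from the directory name that appears later in this answer:

```shell
# The on-disk directory for a table is "<table>-<cf_id with dashes removed>".
# This cf_id is a reconstruction from the directory listing shown below; on a
# real node you would read it from system.schema_columnfamilies (2.2) or
# system_schema.tables (3.x).
table="holidays"
cf_id="77a767e0-a5f1-11e6-a2be-bd9d201c4c8f"
dir_name="${table}-$(printf '%s' "$cf_id" | tr -d '-')"
echo "$dir_name"   # holidays-77a767e0a5f111e6a2bebd9d201c4c8f
```

This is why creating the table in the new keyspace first matters: the new table gets its own cf_id, and therefore its own directory, which you then copy the old files into.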
Stop the node properly (DISABLEGOSSIP, DRAIN, kill/stop, etc...).
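As a concrete sketch, that stop sequence might look like the following; the last command is an assumption, since how you stop the process depends on how Cassandra was installed:

```shell
nodetool disablegossip        # stop gossiping with other nodes
nodetool drain                # flush memtables and stop accepting writes
sudo service cassandra stop   # or kill the Cassandra process, per your install
```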
Now, find the locations of the old and new tables on disk, and copy/move the files from the old location to the new one:
$ ls -al /var/lib/cassandra/data22/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f/
total 12
drwxrwxr-x 3 aploetz aploetz 4096 Nov 8 14:25 .
drwxrwxr-x 17 aploetz aploetz 4096 Nov 8 14:25 ..
drwxrwxr-x 2 aploetz aploetz 4096 Nov 8 14:25 backups
$ cp /var/lib/cassandra/data22/presentation/holidays-74bcfde0139011e6a67c2575e6398503/la* /var/lib/cassandra/data22/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f/
$ ls -al /var/lib/cassandra/data22/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f/
drwxrwxr-x 3 aploetz aploetz 4096 Nov 8 14:26 .
drwxrwxr-x 17 aploetz aploetz 4096 Nov 8 14:25 ..
drwxrwxr-x 2 aploetz aploetz 4096 Nov 8 14:25 backups
-rw-rw-r-- 1 aploetz aploetz 43 Nov 8 14:26 la-1-big-CompressionInfo.db
-rw-rw-r-- 1 aploetz aploetz 628 Nov 8 14:26 la-1-big-Data.db
-rw-rw-r-- 1 aploetz aploetz 9 Nov 8 14:26 la-1-big-Digest.adler32
-rw-rw-r-- 1 aploetz aploetz 16 Nov 8 14:26 la-1-big-Filter.db
-rw-rw-r-- 1 aploetz aploetz 57 Nov 8 14:26 la-1-big-Index.db
-rw-rw-r-- 1 aploetz aploetz 4468 Nov 8 14:26 la-1-big-Statistics.db
-rw-rw-r-- 1 aploetz aploetz 94 Nov 8 14:26 la-1-big-Summary.db
-rw-rw-r-- 1 aploetz aploetz 94 Nov 8 14:26 la-1-big-TOC.txt
-rw-rw-r-- 1 aploetz aploetz 43 Nov 8 14:26 la-2-big-CompressionInfo.db
-rw-rw-r-- 1 aploetz aploetz 164 Nov 8 14:26 la-2-big-Data.db
-rw-rw-r-- 1 aploetz aploetz 10 Nov 8 14:26 la-2-big-Digest.adler32
-rw-rw-r-- 1 aploetz aploetz 16 Nov 8 14:26 la-2-big-Filter.db
-rw-rw-r-- 1 aploetz aploetz 26 Nov 8 14:26 la-2-big-Index.db
-rw-rw-r-- 1 aploetz aploetz 4460 Nov 8 14:26 la-2-big-Statistics.db
-rw-rw-r-- 1 aploetz aploetz 108 Nov 8 14:26 la-2-big-Summary.db
-rw-rw-r-- 1 aploetz aploetz 94 Nov 8 14:26 la-2-big-TOC.txt
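The copy step above can be sketched generically. The paths and file names below are stand-ins created in a temp directory purely to demonstrate the move; on a real node you would use the old and new table directories under your data_file_directories setting, as in the listing above:

```shell
# Simulate the old and new table directories (stand-in paths, not real data dirs)
base=$(mktemp -d)
src="$base/presentation/holidays-74bcfde0139011e6a67c2575e6398503"
dst="$base/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f"
mkdir -p "$src" "$dst"

# Stand-in SSTable components; a real table also has Filter, Statistics, etc.
touch "$src/la-1-big-Data.db" "$src/la-1-big-Index.db"

# Copy every SSTable component (files sharing the 'la-' generation prefix)
cp "$src"/la-* "$dst"/
ls "$dst"
```

The key point is that every component of each SSTable generation must move together; copying only the Data.db file without its siblings would leave the table unreadable.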
Restart your node.
Query via cqlsh to verify:
Connected to SnakesAndArrows at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.8 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
aploetz@cqlsh> SELECT * FROM stackoverflow.holidays ;
type | eventtime | beginend | name
--------------+--------------------------+----------+------------------------
Religious | 2016-12-26 05:59:59+0000 | E | Christmas
Religious | 2016-12-25 06:00:00+0000 | B | Christmas
Religious | 2016-03-28 04:59:59+0000 | E | Easter
Religious | 2016-03-27 05:00:00+0000 | B | Easter
presentation | 2016-05-06 20:40:08+0000 | B | my presentation
presentation | 2016-05-06 20:40:03+0000 | B | my presentation
presentation | 2016-05-06 20:39:15+0000 | B | my presentation
presentation | 2016-05-06 20:38:10+0000 | B | my presentation
US | 2016-07-05 04:59:59+0000 | E | 4th of July
US | 2016-07-04 05:00:00+0000 | B | 4th of July
US | 2016-05-09 04:59:59+0000 | E | Mothers Day
US | 2016-05-08 05:00:00+0000 | B | Mothers Day
Nerd | 2016-12-22 05:59:59+0000 | E | 2112 Day
Nerd | 2016-12-21 06:00:00+0000 | B | 2112 Day
Nerd | 2016-09-26 04:59:59+0000 | E | Hobbit Day
Nerd | 2016-09-25 05:00:00+0000 | B | Hobbit Day
Nerd | 2016-09-20 04:59:59+0000 | E | Talk Like a Pirate Day
Nerd | 2016-09-19 05:00:00+0000 | B | Talk Like a Pirate Day
Nerd | 2016-05-07 04:59:59+0000 | E | Star Wars Week
Nerd | 2016-05-04 05:00:00+0000 | B | Star Wars Week
Nerd | 2016-03-14 05:00:00+0000 | E | Pi Day
Nerd | 2016-03-14 05:00:00+0000 | B | Pi Day
(22 rows)
The problem with this approach is that you will need to stop the cluster and move files around on each node, whereas cqlsh COPY allows you to import and export on a single node with the cluster still running.
And I know that COPY has a reputation that limits it to smaller datasets. But 2.2.x has options that help throttle COPY to keep it from timing out on large datasets. I recently got it to export/import 370 million rows without a timeout.