Cassandra: how to move whole tables to another keyspace


Question

Version info of my Cassandra:


[cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]

I am trying to move some huge tables (several million rows) to another keyspace. Besides "COPY to csv, and COPY from csv", is there any better solution?

Answer

Ok, I managed to get this to work on a single-node cluster running 2.2.8.

I experimented by moving the holidays table from my presentation keyspace to my stackoverflow keyspace.

Here are the steps I took:

Create the table inside the new keyspace.

This step is important, because each table has a UUID for a unique identifier, stored in the system.schema_columnfamilies table in the cf_id column. This id is attached to the directory names that hold the data. By copy/pasting the schema from one keyspace to another, you'll ensure that the same column names are used, but that a new unique identifier is generated.

Note: In 3.x, the identifier is stored in the system_schema.tables table.
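To illustrate the naming convention: the directory holding a table's SSTables is named with the table name plus the cf_id with its dashes stripped. A minimal sketch (the cf_id value below is a hypothetical one chosen to match the directory listing further down; on your cluster, read the real value from system.schema_columnfamilies):

```shell
# Derive a table's data directory from its cf_id.
# Assumed values; query system.schema_columnfamilies (2.x) or
# system_schema.tables (3.x) for the real cf_id of your table.
KEYSPACE="stackoverflow"
TABLE="holidays"
CF_ID="77a767e0-a5f1-11e6-a2be-bd9d201c4c8f"

# Cassandra appends the cf_id, dashes removed, to the table name
DIR="/var/lib/cassandra/data/${KEYSPACE}/${TABLE}-$(printf '%s' "$CF_ID" | tr -d '-')"
echo "$DIR"
# prints /var/lib/cassandra/data/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f
```

Note that the listings below use a non-default data directory (data22); substitute whatever your data_file_directories setting points at.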

aploetz@cqlsh:stackoverflow> CREATE TABLE holidays (
 type text,
 eventtime timestamp,
 beginend text,
 name text,
 PRIMARY KEY (type, eventtime, beginend)
) WITH CLUSTERING ORDER BY (eventtime DESC, beginend DESC);

aploetz@cqlsh:stackoverflow> SELECT * FROM stackoverflow.holidays ;

 type | eventtime | beginend | name
------+-----------+----------+------

(0 rows)

Stop the node properly (DISABLEGOSSIP, DRAIN, kill/stop, etc...).

Now, find the location on disk of both the old and the new table, and copy/move the files to the new location (from the old location):

$ ls -al /var/lib/cassandra/data22/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f/
total 12
drwxrwxr-x  3 aploetz aploetz 4096 Nov  8 14:25 .
drwxrwxr-x 17 aploetz aploetz 4096 Nov  8 14:25 ..
drwxrwxr-x  2 aploetz aploetz 4096 Nov  8 14:25 backups
$ cp /var/lib/cassandra/data22/presentation/holidays-74bcfde0139011e6a67c2575e6398503/la* /var/lib/cassandra/data22/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f/

$ ls -al /var/lib/cassandra/data22/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f/
drwxrwxr-x  3 aploetz aploetz 4096 Nov  8 14:26 .
drwxrwxr-x 17 aploetz aploetz 4096 Nov  8 14:25 ..
drwxrwxr-x  2 aploetz aploetz 4096 Nov  8 14:25 backups
-rw-rw-r--  1 aploetz aploetz   43 Nov  8 14:26 la-1-big-CompressionInfo.db
-rw-rw-r--  1 aploetz aploetz  628 Nov  8 14:26 la-1-big-Data.db
-rw-rw-r--  1 aploetz aploetz    9 Nov  8 14:26 la-1-big-Digest.adler32
-rw-rw-r--  1 aploetz aploetz   16 Nov  8 14:26 la-1-big-Filter.db
-rw-rw-r--  1 aploetz aploetz   57 Nov  8 14:26 la-1-big-Index.db
-rw-rw-r--  1 aploetz aploetz 4468 Nov  8 14:26 la-1-big-Statistics.db
-rw-rw-r--  1 aploetz aploetz   94 Nov  8 14:26 la-1-big-Summary.db
-rw-rw-r--  1 aploetz aploetz   94 Nov  8 14:26 la-1-big-TOC.txt
-rw-rw-r--  1 aploetz aploetz   43 Nov  8 14:26 la-2-big-CompressionInfo.db
-rw-rw-r--  1 aploetz aploetz  164 Nov  8 14:26 la-2-big-Data.db
-rw-rw-r--  1 aploetz aploetz   10 Nov  8 14:26 la-2-big-Digest.adler32
-rw-rw-r--  1 aploetz aploetz   16 Nov  8 14:26 la-2-big-Filter.db
-rw-rw-r--  1 aploetz aploetz   26 Nov  8 14:26 la-2-big-Index.db
-rw-rw-r--  1 aploetz aploetz 4460 Nov  8 14:26 la-2-big-Statistics.db
-rw-rw-r--  1 aploetz aploetz  108 Nov  8 14:26 la-2-big-Summary.db
-rw-rw-r--  1 aploetz aploetz   94 Nov  8 14:26 la-2-big-TOC.txt

Restart your node.

Query it via cqlsh:

Connected to SnakesAndArrows at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.8 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
aploetz@cqlsh> SELECT * FROM stackoverflow.holidays ;

 type         | eventtime                | beginend | name
--------------+--------------------------+----------+------------------------
    Religious | 2016-12-26 05:59:59+0000 |        E |              Christmas
    Religious | 2016-12-25 06:00:00+0000 |        B |              Christmas
    Religious | 2016-03-28 04:59:59+0000 |        E |                 Easter
    Religious | 2016-03-27 05:00:00+0000 |        B |                 Easter
 presentation | 2016-05-06 20:40:08+0000 |        B |        my presentation
 presentation | 2016-05-06 20:40:03+0000 |        B |        my presentation
 presentation | 2016-05-06 20:39:15+0000 |        B |        my presentation
 presentation | 2016-05-06 20:38:10+0000 |        B |        my presentation
           US | 2016-07-05 04:59:59+0000 |        E |            4th of July
           US | 2016-07-04 05:00:00+0000 |        B |            4th of July
           US | 2016-05-09 04:59:59+0000 |        E |            Mothers Day
           US | 2016-05-08 05:00:00+0000 |        B |            Mothers Day
         Nerd | 2016-12-22 05:59:59+0000 |        E |               2112 Day
         Nerd | 2016-12-21 06:00:00+0000 |        B |               2112 Day
         Nerd | 2016-09-26 04:59:59+0000 |        E |             Hobbit Day
         Nerd | 2016-09-25 05:00:00+0000 |        B |             Hobbit Day
         Nerd | 2016-09-20 04:59:59+0000 |        E | Talk Like a Pirate Day
         Nerd | 2016-09-19 05:00:00+0000 |        B | Talk Like a Pirate Day
         Nerd | 2016-05-07 04:59:59+0000 |        E |         Star Wars Week
         Nerd | 2016-05-04 05:00:00+0000 |        B |         Star Wars Week
         Nerd | 2016-03-14 05:00:00+0000 |        E |                 Pi Day
         Nerd | 2016-03-14 05:00:00+0000 |        B |                 Pi Day

(22 rows)

The problem with this approach is that you will need to stop the cluster and move files around on each node, whereas cqlsh COPY would allow you to import and export on a single node with the cluster still running.

And I know that COPY has a reputation that limits it to smaller datasets. But 2.2.x has options that help throttle COPY to keep it from timing out on large datasets. I recently got it to export/import 370 million rows without a timeout.
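The throttling options are supplied directly on the COPY statement in cqlsh. A hedged sketch (the option names below exist in 2.2.x cqlsh, but the values are illustrative, not tuned recommendations, and the csv path is hypothetical):

```sql
aploetz@cqlsh> COPY stackoverflow.holidays TO '/home/aploetz/holidays.csv'
               WITH PAGESIZE=1000 AND PAGETIMEOUT=60;
aploetz@cqlsh> COPY stackoverflow.holidays FROM '/home/aploetz/holidays.csv'
               WITH CHUNKSIZE=50 AND INGESTRATE=5000;
```

PAGESIZE/PAGETIMEOUT control how the export pages through the table, while CHUNKSIZE and INGESTRATE limit how fast the import feeds rows to the cluster; lowering them trades speed for fewer timeouts.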

