Copy data from one table to another in Cassandra using Java

Question

I am trying to move all the data from one column family (table) to another. Because the two tables have different schemas, I have to read every row from table-1, build a new object for table-2, and then do a bulk async insert. Table-1 has millions of records, so I cannot load everything into an in-memory data structure and process it there. I am looking for a way to do this with Spring Data Cassandra and Java.

I initially planned to move all the data to a temporary table first, then create some composite key relations, and then query back my master table. However, that does not seem like a good approach to me. Can anyone suggest a good strategy for this? Any leads would be appreciated. Thanks!

Recommended answer

My table-1 has millions of records so I cannot get all the data directly in my data structure and work that out.

With the DataStax Java driver you can read the whole table one token range at a time and process the rows of each range separately. For example:

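import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.TokenRange;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// `session` is an open Session; `keyspace.table` and `pk` are placeholders for the real names.
Set<TokenRange> tokenRanges = session.getCluster().getMetadata().getTokenRanges();

for (TokenRange tr : tokenRanges) {
    List<Row> rows = new ArrayList<>();
    // unwrap() splits a range that wraps around the end of the ring into non-wrapping parts
    for (TokenRange sub : tr.unwrap()) {
        String query = "SELECT * FROM keyspace.table WHERE token(pk) > ? AND token(pk) <= ?";
        SimpleStatement st = new SimpleStatement(query, sub.getStart(), sub.getEnd());
        rows.addAll(session.execute(st).all());
    }
    transformAndWriteToNewTable(rows); // caller-supplied: map each row to the new schema and insert it
}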
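
The answer does not show transformAndWriteToNewTable. One possible shape, given that the question asks for bulk async inserts, is to cap the number of writes in flight so that a large range does not flood the cluster. The sketch below uses only the JDK; the class name ThrottledWriter, the method transformAndWrite, and the asyncInsert callback are hypothetical stand-ins and not part of any driver API. With a real driver, asyncInsert would wrap the driver's asynchronous execute call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Sketch: transform rows and write them asynchronously with a cap on in-flight writes. */
class ThrottledWriter {

    /**
     * Applies transform to every row and hands the result to asyncInsert,
     * never allowing more than maxInFlight inserts to be pending at once.
     * Returns the number of rows written; throws if any insert failed.
     */
    static <S, T> int transformAndWrite(
            List<S> rows,
            Function<S, T> transform,
            Function<T, CompletableFuture<Void>> asyncInsert, // hypothetical async writer
            int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (S row : rows) {
            permits.acquireUninterruptibly(); // blocks while maxInFlight writes are pending
            CompletableFuture<Void> write;
            try {
                write = asyncInsert.apply(transform.apply(row));
            } catch (RuntimeException e) {
                permits.release(); // the write never started, so give the permit back
                throw e;
            }
            // Release the permit when the write finishes, whether it succeeded or failed.
            pending.add(write.whenComplete((ok, err) -> permits.release()));
        }
        // Wait for the tail of the batch; join() rethrows the first failure, if any.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
        return pending.size();
    }

    public static void main(String[] args) {
        List<Integer> oldRows = Arrays.asList(1, 2, 3, 4, 5);
        int written = transformAndWrite(
                oldRows,
                (Integer id) -> "row-" + id, // stand-in for building the table-2 object
                (String newRow) -> CompletableFuture.runAsync(
                        () -> System.out.println("insert " + newRow)),
                2);
        System.out.println("written: " + written);
    }
}
```

The semaphore is the throttle: the loop only issues a new insert once an earlier one has completed. A suitable value for maxInFlight depends on the cluster and has to be found by measurement.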

Each token range contains only a portion of the data and can be handled by one physical machine. Because the ranges are independent of each other, you can process them in parallel or asynchronously for better throughput.
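
As a rough illustration of processing ranges in parallel, the sketch below runs one task per range on a fixed thread pool. To stay self-contained it does not use the driver: instead of getTokenRanges() it cuts the token ring into equal slices by hand, which is a different technique from the answer's and assumes the cluster uses Murmur3Partitioner (tokens are signed 64-bit values). TokenRingSplitter, Slice, SliceHandler, split and copyAll are hypothetical names; a real SliceHandler would run the SELECT for its slice and write the transformed rows to the new table.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: cut the Murmur3 token ring into slices and process them in parallel. */
class TokenRingSplitter {

    /** Slice (start, end] of the ring, matching token(pk) > start AND token(pk) <= end. */
    static final class Slice {
        final long start;
        final long end;
        Slice(long start, long end) { this.start = start; this.end = end; }
    }

    /** Hypothetical callback: read one slice, transform, write; returns rows copied. */
    interface SliceHandler {
        long copy(Slice slice) throws Exception;
    }

    /** Splits (Long.MIN_VALUE, Long.MAX_VALUE] into `parts` contiguous slices. */
    static List<Slice> split(int parts) {
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        BigInteger span = BigInteger.valueOf(Long.MAX_VALUE).subtract(min); // 2^64 - 1
        List<Slice> slices = new ArrayList<>();
        long start = Long.MIN_VALUE;
        for (int i = 1; i <= parts; i++) {
            long end = (i == parts)
                    ? Long.MAX_VALUE
                    : min.add(span.multiply(BigInteger.valueOf(i))
                                  .divide(BigInteger.valueOf(parts))).longValueExact();
            slices.add(new Slice(start, end));
            start = end; // next slice begins where this one ended (exclusive)
        }
        return slices;
    }

    /** Runs the handler for every slice on `threads` workers and sums the results. */
    static long copyAll(List<Slice> slices, SliceHandler handler, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (Slice s : slices) {
                Callable<Long> task = () -> handler.copy(s);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                try {
                    total += f.get();
                } catch (ExecutionException e) {
                    throw new IllegalStateException("slice failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            return total;
        } finally {
            pool.shutdownNow(); // no-op after success; cancels queued slices after a failure
        }
    }

    public static void main(String[] args) {
        // Stand-in handler that pretends each slice held one row.
        long copied = copyAll(split(8), (Slice s) -> 1L, 4);
        System.out.println("rows copied: " + copied);
    }
}
```

Because each slice is read and written independently, a failed slice can be retried on its own without redoing the others. The number of slices and threads here is arbitrary and would need tuning against the actual cluster.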
