Copy data from one table to another in Cassandra using Java


Question

I am trying to move all my data from one column family (table) to another. Because the two tables have different schemas, I have to read every row from table-1, build a new object for table-2, and then do a bulk async insert. My table-1 has millions of records, so I cannot load all of the data into my data structure at once and work on it there. I am looking for a way to do this easily using Spring Data Cassandra with Java.

I initially planned to move all the data to a temporary table first, then create some composite key relations, and then query back against my master table. However, that does not seem favorable to me. Can anyone suggest a good strategy for this? Any leads would be appreciated. Thanks!

Recommended answer


"My table-1 has millions of records, so I cannot load all of the data into my data structure at once and work on it there."

With the DataStax Java driver you can read the data one token range at a time and process the rows of each range separately. For example:

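import com.datastax.driver.core.Row;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.TokenRange;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// session is an open com.datastax.driver.core.Session
Set<TokenRange> tokenRanges = session.getCluster().getMetadata().getTokenRanges();

for (TokenRange tr : tokenRanges) {
    List<Row> rows = new ArrayList<>();
    // CQL range queries cannot wrap around the ring, so split wrapping ranges first
    for (TokenRange sub : tr.unwrap()) {
        // keyspace.table and pk are placeholders for the real table and partition key
        String query = "SELECT * FROM keyspace.table WHERE token(pk) > ? AND token(pk) <= ?";
        SimpleStatement st = new SimpleStatement(query, sub.getStart(), sub.getEnd());
        rows.addAll(session.execute(st).all());
    }
    transformAndWriteToNewTable(rows);
}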
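The transformAndWriteToNewTable helper is not defined in the answer. One possible shape for it, given that the question asks for bulk async inserts, is to map each row to the new table's form and issue asynchronous writes while capping how many are in flight at once. The sketch below shows only that throttling pattern in plain Java so it runs without a cluster. The class name ThrottledWriter, the transform function, and asyncInsert are all hypothetical; asyncInsert stands in for whatever asynchronous write call the driver or Spring Data Cassandra provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class ThrottledWriter {

    /**
     * Transforms every source row and writes it through asyncInsert,
     * allowing at most maxInFlight pending writes at any time.
     * Returns once every write has completed.
     */
    static <S, T> void transformAndWrite(List<S> rows,
                                         Function<S, T> transform,
                                         Function<T, CompletableFuture<Void>> asyncInsert,
                                         int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (S row : rows) {
            // Blocks here while maxInFlight writes are still pending.
            permits.acquireUninterruptibly();
            CompletableFuture<Void> write = asyncInsert.apply(transform.apply(row))
                    .whenComplete((ok, err) -> permits.release());
            pending.add(write);
        }
        // Wait for the tail of the batch; a failed write surfaces here as an exception.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    }

    public static void main(String[] args) {
        AtomicInteger written = new AtomicInteger();
        // Hypothetical stand-in for an async insert: just counts the rows on another thread.
        Function<String, CompletableFuture<Void>> fakeInsert =
                newRow -> CompletableFuture.runAsync(() -> written.incrementAndGet());

        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) rows.add(i);

        ThrottledWriter.<Integer, String>transformAndWrite(rows, i -> "row-" + i, fakeInsert, 32);
        System.out.println("inserted " + written.get() + " rows");
    }
}
```

The cap of 32 is an arbitrary value for illustration; a suitable limit depends on the cluster and would need to be tuned.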

Each token range contains only a piece of the data and can be handled by one physical machine. You can process the token ranges independently (in parallel or asynchronously) to get better performance.
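The idea of handling ranges independently can be sketched without a cluster. Assuming a partitioner whose tokens span the signed 64-bit range (as Murmur3Partitioner does), the example below slices the ring into contiguous (start, end] ranges and hands each one to a fixed thread pool. The class ParallelRangeCopy and the copyRange function are hypothetical; copyRange stands for the per-range SELECT plus write. In a real job the ranges would come from getTokenRanges() as in the answer's code rather than being computed by hand.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

class ParallelRangeCopy {

    /** Splits the signed 64-bit token ring into `parts` contiguous (start, end] ranges. */
    static List<long[]> splitRing(int parts) {
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        BigInteger width = BigInteger.valueOf(Long.MAX_VALUE).subtract(min);
        BigInteger n = BigInteger.valueOf(parts);
        List<long[]> ranges = new ArrayList<>();
        long start = Long.MIN_VALUE;
        for (int i = 1; i <= parts; i++) {
            // BigInteger avoids overflow when scaling the 2^64-wide ring.
            long end = (i == parts)
                    ? Long.MAX_VALUE
                    : min.add(width.multiply(BigInteger.valueOf(i)).divide(n)).longValueExact();
            ranges.add(new long[] {start, end});
            start = end; // the next range begins where this one ended
        }
        return ranges;
    }

    /** Runs copyRange on every range with `threads` workers and returns the total row count. */
    static long copyAll(List<long[]> ranges, Function<long[], Long> copyRange, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Long>> results = new ArrayList<>();
            for (long[] range : ranges) {
                results.add(CompletableFuture.supplyAsync(() -> copyRange.apply(range), pool));
            }
            long total = 0;
            for (CompletableFuture<Long> result : results) {
                total += result.join(); // rethrows if a range failed
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<long[]> ranges = splitRing(16);
        // Hypothetical per-range work: pretend every range held 100 rows.
        long copied = copyAll(ranges, range -> 100L, 4);
        System.out.println("ranges=" + ranges.size() + ", rows copied=" + copied);
    }
}
```

Because no two ranges overlap, a failed range can be retried on its own without touching the others.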
