Cassandra lookup by list of primary keys in Java
I am implementing a feature that requires looking up rows in Cassandra by a list of primary keys.
Below is some example data, where id is the primary key:
mytable
id column1
1 423
2 542
3 678
4 45534
5 435634
6 2435
7 678
8 4564
9 546
Most of my queries are lookups by id, but for some special cases I would like to fetch data for a list of ids. This is how I currently do it:
// existing single-key lookup
public Object fetchFromCassandraForId(int id);
int[] ids = {1, 3, 5, 7, 9};
List<Object> results = new ArrayList<>();
for (int id : ids) {
    // one network round trip per id
    results.add(fetchFromCassandraForId(id));
}
This issues multiple network calls to Cassandra. Is it possible to batch them somehow? In other words, does Cassandra support a fast lookup by a list of ids, such as:
select column1 from mytable where id in (1, 3, 5, 7, 9);
Any help or pointers would be appreciated.
If id is the full primary key, then Cassandra supports this, although it's not recommended from a performance point of view:
- the request is sent to a coordinator node
- the coordinator node finds a replica for each id and sends an individual request to each of them
- the coordinator waits for results from every node, collects them into a result set, and sends it back
As a result:
- all your sub-queries need to wait for the slowest of the replicas
- you have an additional network hop from the coordinator to the replica
- you put more pressure on the coordinator node, as it needs to keep the results in memory
If you instead send many parallel, asynchronous requests from the application, one for each id value, then:
- you avoid the additional hop: if you're using prepared statements with token-aware load balancing, each query is sent directly to a replica
- you can start processing results as they arrive, instead of waiting for everything
So sending parallel asynchronous requests could be faster than sending a single request with IN.
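To make the parallel approach concrete, here is a minimal sketch of the fan-out pattern using only java.util.concurrent. The class ParallelLookup, the method fetchAll, and the fetchAsync parameter are hypothetical names introduced for illustration. fetchAsync stands in for whatever asynchronous single-key query your driver offers (for example, a prepared statement executed through the driver's async API and adapted to a CompletableFuture). The in-memory map in main only simulates the table from the question.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class ParallelLookup {

    // Starts one asynchronous lookup per id and gathers the values keyed by id.
    // fetchAsync is a hypothetical stand-in for an async single-key query.
    // Assumption: every id resolves to a non-null value; ids with no row
    // would need extra handling, since ConcurrentHashMap rejects nulls.
    static <V> Map<Integer, V> fetchAll(List<Integer> ids,
                                        Function<Integer, CompletableFuture<V>> fetchAsync) {
        Map<Integer, V> results = new ConcurrentHashMap<>();
        CompletableFuture<?>[] pending = ids.stream()
                .map(id -> fetchAsync.apply(id)
                        // Each value is handled as soon as its own request completes.
                        .thenAccept(value -> results.put(id, value)))
                .toArray(CompletableFuture[]::new);
        // Block only once, after all requests are already in flight.
        CompletableFuture.allOf(pending).join();
        return results;
    }

    public static void main(String[] args) {
        // Simulated table contents, taken from the example data in the question.
        Map<Integer, Integer> table = Map.of(1, 423, 3, 678, 5, 435634, 7, 678, 9, 546);
        Map<Integer, Integer> rows = fetchAll(List.of(1, 3, 5, 7, 9),
                id -> CompletableFuture.supplyAsync(() -> table.get(id)));
        System.out.println(rows.get(5)); // prints 435634
    }
}
```

With a real driver, you would probably also want to cap the number of in-flight requests (for example with a semaphore) so that a very long id list does not overload the cluster. That is left out here to keep the sketch short.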