Cassandra lookup by list of primary keys in Java


Problem description

I am implementing a feature which requires looking up Cassandra by a list of primary keys.

Below is some example data, where id is the primary key:

mytable
id          column1
1           423
2           542
3           678
4           45534
5           435634
6           2435
7           678
8           4564
9           546

Most of my queries are lookups by id, but for some special cases I would like to get the data for a list of ids. This is how I am currently doing it:


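// Existing single-row lookup by primary key.
public Object fetchFromCassandraForId(int id);

int[] ids = {1, 3, 5, 7, 9};
List<Object> results = new ArrayList<>();  // must be initialized before use
for (int id : ids) {
  // One blocking network round trip per id.
  results.add(fetchFromCassandraForId(id));
}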

This results in multiple network calls to Cassandra. Is it possible to batch them somehow? I would like to know whether Cassandra supports a fast lookup by a list of ids, for example:

select column1 from mytable where id in (1, 3, 5, 7, 9);

Any help or pointers would be appreciated.

Solution

If id is the full primary key, then Cassandra supports this, although it is not recommended from a performance point of view. Such a query is processed as follows:

  • the request is sent to a coordinator node
  • the coordinator node finds a replica for each of the ids and sends an individual request to each of them
  • the coordinator waits for the results from every node, collects them into a result set, and sends it back

As a result:

  • all your sub-queries need to wait for the slowest of the replicas
  • you have an additional network hop, from the coordinator to the replica
  • you put more pressure on the coordinator node, as it needs to keep the results in memory

If you instead issue parallel, asynchronous requests from the application, one for each id value, then you:

  • avoid the additional hop: if you're using prepared statements with token-aware load balancing, each query is sent directly to a replica
  • can start processing results as you get them, instead of waiting for everything

So sending parallel asynchronous requests could be faster than sending one request with IN.
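
The parallel approach can be sketched in plain Java as follows. This is a minimal illustration, not the driver's API: the class ParallelLookup and the fetchAsync parameter are hypothetical names, and fetchAsync stands in for whatever asynchronous call your driver offers (with the DataStax Java driver, that would typically be an asynchronous execution of a bound prepared statement). The lookup in main is simulated with an in-memory map holding the sample rows from the question.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Sketch: one asynchronous lookup per id instead of a single IN query. */
final class ParallelLookup {

    /**
     * Starts one lookup per id and returns a future that completes with all
     * results, in the same order as the ids.
     * fetchAsync is a hypothetical stand-in for an asynchronous driver call.
     */
    static <T> CompletableFuture<List<T>> fetchAll(
            List<Integer> ids,
            Function<Integer, CompletableFuture<T>> fetchAsync) {
        // Start every request first, so they are all in flight together.
        List<CompletableFuture<T>> inFlight = ids.stream()
                .map(fetchAsync)
                .collect(Collectors.toList());

        // Complete once every request has finished, then gather the values.
        return CompletableFuture
                .allOf(inFlight.toArray(new CompletableFuture<?>[0]))
                .thenApply(done -> inFlight.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        // Simulated table: id -> column1, taken from the question's sample data.
        Map<Integer, Integer> sample =
                Map.of(1, 423, 3, 678, 5, 435634, 7, 678, 9, 546);

        CompletableFuture<List<Integer>> all = fetchAll(
                List.of(1, 3, 5, 7, 9),
                id -> CompletableFuture.supplyAsync(() -> sample.get(id)));

        System.out.println(all.join()); // prints [423, 678, 435634, 678, 546]
    }
}
```

To process each row as soon as it arrives rather than waiting for the whole list, a callback such as thenAccept could be attached to each individual future inside fetchAsync. For large id lists it is probably also worth capping the number of requests in flight, which this sketch does not do.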
