Optimize IN clause queries in Cassandra?


Question

I have a table like this in ScyllaDB. For clarity I have removed many columns from the definition below; the full table has about 25 columns.

CREATE TABLE testks.client (
    client_id int,
    lmd timestamp,
    cola list<text>,
    colb list<text>,
    colc boolean,
    cold int,
    cole int,
    colf text,
    colg set<frozen<colg>>,
    colh text,
    PRIMARY KEY (client_id, lmd)
) WITH CLUSTERING ORDER BY (lmd DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 172800
    AND max_index_interval = 1024
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Our query pattern looks like this. The IN clause can contain more than 50 client IDs.

select * FROM testks.client WHERE client_id IN ? PER PARTITION LIMIT 1

A few questions:

  • From what I have read online, an IN clause on the partition key performs poorly. Is there any way to optimize my table for this query pattern, or is Cassandra/ScyllaDB simply not a good fit for this use case?
  • We use the C# driver to execute the query above, and we are seeing performance issues with our data model and query pattern. Is it better to execute one asynchronous query per client ID, or should I keep issuing a single IN query containing all the client IDs?

We are running a 6-node cluster in a single DC with RF 3. We read and write at LOCAL_QUORUM.

Recommended answer

When you issue an IN on the partition key, the request is sent to a coordinator node (I don't remember exactly, but I think in this case it can be an arbitrary node). The coordinator decomposes the IN into queries for the individual partitions, sends those queries to the replicas that own them, collects the results, and returns them to the caller. All of this adds round trips between the coordinator and the replicas and puts extra load on the coordinator.

Usually, the better solution is to issue N asynchronous queries, one for each partition in the IN list, and collect the data on the client side. When you use a prepared statement, the driver can apply token-aware load balancing and send each query directly to a replica that holds the given partition, so you avoid the extra network round trips between the coordinator and the replicas.
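
The fan-out described above can be sketched in a few lines. This is only an illustration of the pattern, written in Python rather than with the C# driver, and it does not talk to a real cluster: `fetch_latest_rows`, `fake_fetch_one`, and the `max_in_flight` cap are hypothetical names introduced here, and `fake_fetch_one` stands in for a driver call that would execute the prepared per-partition statement. Because each query targets a single partition, a plain `LIMIT 1` is assumed to return the same row as `PER PARTITION LIMIT 1` did in the IN query.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Optional

# Per-partition statement that would replace the single IN query.
# Prepared once, it lets a token-aware driver route each execution
# to a replica that owns the given client_id.
PER_PARTITION_CQL = "SELECT * FROM testks.client WHERE client_id = ? LIMIT 1"

Row = dict
FetchOne = Callable[[int], Awaitable[Optional[Row]]]


async def fetch_latest_rows(
    client_ids: Iterable[int],
    fetch_one: FetchOne,
    max_in_flight: int = 16,  # assumed cap; tune for the cluster
) -> Dict[int, Optional[Row]]:
    """Run one query per client_id concurrently and merge results client-side."""
    semaphore = asyncio.Semaphore(max_in_flight)  # bounds concurrent requests

    async def guarded(client_id: int):
        async with semaphore:
            return client_id, await fetch_one(client_id)

    # dict.fromkeys drops duplicate ids while keeping their order
    unique_ids = list(dict.fromkeys(client_ids))
    pairs = await asyncio.gather(*(guarded(cid) for cid in unique_ids))
    return dict(pairs)


async def fake_fetch_one(client_id: int) -> Optional[Row]:
    """Hypothetical stand-in for a driver call executing PER_PARTITION_CQL."""
    await asyncio.sleep(0)  # yield control, as a network call would
    if client_id < 0:
        return None  # simulate a partition with no rows
    return {"client_id": client_id, "colf": f"value-{client_id}"}


if __name__ == "__main__":
    rows = asyncio.run(fetch_latest_rows(range(1, 51), fake_fetch_one))
    print(len(rows))
```

Capping the number of in-flight requests is a design choice of this sketch, not something the answer prescribes; it keeps a list of 50 or more IDs from hitting the cluster all at once. With a real driver, `fetch_one` would bind the client ID to the prepared statement and await the driver's asynchronous execute call.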
