Cassandra batch query performance on tables having different partition keys


Problem description

I have a test case in which I receive 150k requests per second from a client.

My test case requires sending an UNLOGGED batch that writes to multiple tables with different partition keys:

BEGIN UNLOGGED BATCH
update kspace.count_table set counter=counter+1 where source_id= 1 and name='source_name' and pname='Country' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('US');
update kspace.count_table set counter=counter+1 where source_id= 1 and name='source_name' and pname='City' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Dallas');
update kspace.count_table set counter=counter+1 where source_id= 1 and name='source_name' and pname='State' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Texas');
update kspace.count_table set counter=counter+1 where source_id= 1 and name='source_name' and pname='SSN' and ptype='text' and date='2017-03-20' and pvalue=decimalAsBlob(000000000);
update kspace.count_table set counter=counter+1 where source_id= 1 and name='source_name' and pname='Gender' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Female');
APPLY BATCH;

Is there a better approach than the one I'm currently following?

Currently, I am batch-inserting into multiple tables whose data may be located on different nodes of the cluster because they have different partition keys, and as far as I know, batching queries against different tables with different partition keys carries an extra cost.

Solution

First, it is important to understand what batches are for.

Batches are often mistakenly used in an attempt to optimize performance.

Batches are used to maintain data consistency among multiple tables. If atomicity is needed, use a logged batch. In your case, this is a counter table; if the counts do not need to be consistent across tables, do not use a batch. If your cluster is healthy, Cassandra ensures that all writes succeed.

Unlogged batches require the coordinator to manage the inserts, which can place a heavy load on the coordinator node. If other nodes own the partition keys, the coordinator has to forward those writes over an extra network hop, which makes delivery inefficient. Use unlogged batches only when making updates to the same partition key.
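To make the "same partition key" rule concrete, here is a minimal client-side sketch in Python. It groups pending statements by the partition they target and only wraps a group in an unlogged batch when several statements hit the same partition; everything else is sent as an individual statement. The names `counter_update`, `group_by_partition` and `plan_writes` are hypothetical helpers, and the partition key used here, `(source_id, pname)`, is only an assumption, since the schema of `kspace.count_table` is not shown in the question.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical representation of one pending write: (partition_key, cql_statement).
Update = Tuple[Hashable, str]


def counter_update(source_id: int, pname: str, value: str) -> Update:
    """Build one counter update; the (source_id, pname) partition key is an assumption."""
    cql = (
        "UPDATE kspace.count_table SET counter = counter + 1 "
        f"WHERE source_id = {source_id} AND name = 'source_name' "
        f"AND pname = '{pname}' AND ptype = 'text' "
        f"AND date = '2017-03-20' AND pvalue = textAsBlob('{value}')"
    )
    return (source_id, pname), cql


def group_by_partition(updates: Iterable[Update]) -> Dict[Hashable, List[str]]:
    """Group CQL statements by the partition key they target."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for partition_key, statement in updates:
        groups[partition_key].append(statement)
    return dict(groups)


def plan_writes(updates: Iterable[Update]) -> List[str]:
    """Return the CQL to send: a lone statement stays unbatched, while several
    statements for the same partition become one unlogged batch."""
    planned: List[str] = []
    for statements in group_by_partition(updates).values():
        if len(statements) == 1:
            planned.append(statements[0] + ";")
        else:
            body = "\n".join(s + ";" for s in statements)
            planned.append(f"BEGIN UNLOGGED BATCH\n{body}\nAPPLY BATCH;")
    return planned


if __name__ == "__main__":
    pending = [
        counter_update(1, "Country", "US"),
        counter_update(1, "Country", "CA"),   # same assumed partition as the line above
        counter_update(1, "City", "Dallas"),  # different partition: sent on its own
    ]
    for cql in plan_writes(pending):
        print(cql, end="\n\n")
```

The resulting statements would then be handed to whatever driver you use, ideally asynchronously, so that each write can go straight to a replica that owns its partition instead of being fanned out by a single coordinator.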

Please see the following articles:

https://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html

https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.npmx2cnsq
