Cassandra batch query performance on tables having different partition keys


Question

I have a test case in which I receive 150k requests per second from a client.

My test case requires inserting an UNLOGGED batch into multiple tables that have different partition keys:

BEGIN UNLOGGED BATCH
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='Country' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('US');
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='City' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Dallas');
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='State' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Texas');
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='SSN' and ptype='text' and date='2017-03-20' and pvalue=decimalAsBlob(000000000);
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='Gender' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Female');
APPLY BATCH;

Is there a better way than my current approach?

Currently I am batch inserting into multiple tables whose rows may reside on different nodes of the cluster, because they have different partition keys. As far as I know, batching queries across tables with different partition keys carries an extra cost.

Recommended answer

First, it is important to understand what batches are meant for.



Batches are often mistakenly used in an attempt to optimize performance.


Batches are used to maintain data consistency across multiple tables. If atomicity is needed, use a logged batch. In your case this is a counter table, so if the counts across tables do not need to be consistent with each other, do not use a batch. If your cluster is healthy, Cassandra ensures that all the writes succeed.
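As a rough illustration of dropping the batch, the sketch below builds one independent counter update per property instead of wrapping them in BEGIN ... APPLY BATCH. The helper name counter_updates and the UPDATE_CQL template are hypothetical, the %s placeholders assume the DataStax Python driver's simple-statement style, and the SSN row is left out because it uses decimalAsBlob rather than textAsBlob.

```python
from typing import Iterable, List, Tuple

# Hypothetical template mirroring the UPDATE statements in the question.
# The %s placeholders assume the DataStax Python driver's simple-statement style.
UPDATE_CQL = (
    "UPDATE kspace.count_table SET counter = counter + 1 "
    "WHERE source_id = %s AND name = %s AND pname = %s "
    "AND ptype = %s AND date = %s AND pvalue = textAsBlob(%s)"
)


def counter_updates(
    source_id: int,
    name: str,
    date: str,
    properties: Iterable[Tuple[str, str]],
) -> List[Tuple[str, tuple]]:
    """Build one independent (statement, parameters) pair per property."""
    return [
        (UPDATE_CQL, (source_id, name, pname, "text", date, pvalue))
        for pname, pvalue in properties
    ]


# Same values as in the question, minus the SSN row.
statements = counter_updates(
    1,
    "source_name",
    "2017-03-20",
    [("Country", "US"), ("City", "Dallas"), ("State", "Texas"), ("Gender", "Female")],
)
```

With the DataStax Python driver, a list like this could be passed to cassandra.concurrent.execute_concurrent(session, statements) or sent one statement at a time with session.execute_async. That step needs a live cluster, so it is not shown here.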



Unlogged batches require the coordinator to manage the inserts, which can place a heavy load on the coordinator node. If other nodes own the partition keys, the coordinator has to make an extra network hop to reach them, resulting in inefficient delivery. Use unlogged batches only when making updates to the same partition key.
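One way to apply the same-partition rule on the client side is to group pending rows by their partition key, so that only rows sharing a key would ever go into one unlogged batch. This is a minimal sketch: the helper group_by_partition is hypothetical, and because the question does not show the schema of count_table, the choice of source_id as the partition key is only an assumption for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, Any]


def group_by_partition(
    rows: Iterable[Row], partition_columns: Sequence[str]
) -> Dict[Tuple[Any, ...], List[Row]]:
    """Group rows by the values of their partition key columns."""
    groups: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in partition_columns)
        groups[key].append(row)
    return dict(groups)


rows = [
    {"source_id": 1, "pname": "Country", "pvalue": "US"},
    {"source_id": 1, "pname": "City", "pvalue": "Dallas"},
    {"source_id": 2, "pname": "Country", "pvalue": "US"},
]

# Assumption: source_id alone is the partition key (the real schema is not shown).
groups = group_by_partition(rows, ["source_id"])
```

Each resulting group is a candidate for a single unlogged batch. Rows that end up alone in a group are better sent as plain individual statements.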


Refer to the following articles:

https://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html

https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.npmx2cnsq
