Cassandra batch query vs single insert performance
Problem description
I use the Cassandra Java driver.
I receive 150k requests per second, which I insert into 8 tables that have different partition keys.
My question is which is the better approach:
- batch inserting to these tables
- inserting one by one.
I am asking because, given my request volume (150k per second), batching sounds like the better option, but since all the tables have different partition keys, a batch appears expensive.
Please see my answer to this related question:
Cassandra batch query performance on tables having different partition keys
Batches are not for improving performance. They are used for ensuring atomicity and isolation.
Batching can be effective for single partition write operations. But batches are often mistakenly used in an attempt to optimize performance. Depending on the batch operation, the performance may actually worsen.
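To make the single-partition point concrete, here is a minimal sketch that groups pending writes by table and partition key, so that only writes landing on the same partition would ever be considered for a batch. `PendingWrite` and `PartitionGrouping` are hypothetical names introduced for illustration; they are not part of the Cassandra Java driver, and the sketch does not talk to a cluster.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value type (an assumption for this sketch): one pending write,
// identified by its target table and partition key.
final class PendingWrite {
    final String table;
    final String partitionKey;
    final String payload;

    PendingWrite(String table, String partitionKey, String payload) {
        this.table = table;
        this.partitionKey = partitionKey;
        this.payload = payload;
    }
}

final class PartitionGrouping {
    private PartitionGrouping() {}

    // Groups writes by "table/partitionKey", preserving first-seen order.
    // Only a group with more than one element is a single-partition batch candidate.
    static Map<String, List<PendingWrite>> groupByPartition(List<PendingWrite> writes) {
        Map<String, List<PendingWrite>> groups = new LinkedHashMap<>();
        for (PendingWrite w : writes) {
            // Table names cannot contain '/', so this composite key is unambiguous
            // as long as the table name comes first.
            String key = w.table + "/" + w.partitionKey;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(w);
        }
        return groups;
    }
}
```

In the scenario from the question, each request writes to 8 tables with different partition keys, so every group would contain a single write and there would be nothing worth batching.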
https://docs.datastax.com/en/cql/3.3/cql/cql_using/useBatch.html
If you do not need data consistency among those tables, use single inserts. Single requests are distributed across the nodes according to the load balancing policy. If you are concerned about request handling and use a batch instead, the batch puts a lot of extra work on the coordinator node, which I guess will not be efficient :)
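At 150k requests per second, single inserts are usually issued asynchronously with a cap on the number of requests in flight. The following is a rough sketch of that idea, not the driver's own API: `ThrottledWriter` is a hypothetical helper, and the `executeAsync` function passed to it is a stand-in for whatever asynchronous execute call your driver version provides (assumed here to return a `CompletionStage`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical helper (an assumption for this sketch): sends statements one by one
// through an async execute function while bounding the number of in-flight requests.
final class ThrottledWriter<S> {
    private final Function<S, CompletionStage<?>> executeAsync; // stand-in for the driver call
    private final Semaphore inFlight;

    ThrottledWriter(Function<S, CompletionStage<?>> executeAsync, int maxInFlight) {
        this.executeAsync = executeAsync;
        this.inFlight = new Semaphore(maxInFlight);
    }

    // Submits every statement individually; blocks the caller while the
    // in-flight limit is reached. The returned future completes when all writes finish.
    CompletableFuture<Void> writeAll(List<S> statements) {
        List<CompletableFuture<?>> pending = new ArrayList<>();
        for (S statement : statements) {
            inFlight.acquireUninterruptibly();
            CompletableFuture<?> future;
            try {
                future = executeAsync.apply(statement).toCompletableFuture();
            } catch (RuntimeException e) {
                inFlight.release(); // submission failed, free the permit
                throw e;
            }
            // Release the permit whether the write succeeded or failed.
            pending.add(future.whenComplete((result, error) -> inFlight.release()));
        }
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0]));
    }
}
```

Each statement still goes out as its own request, so the load balancing policy can route it independently; the semaphore only keeps the client from flooding the cluster. The right limit depends on your cluster and is something to measure rather than assume.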