How to ensure data consistency across different tables in Cassandra?


Question


I'm new to Cassandra, and I've read that Cassandra encourages denormalization and duplication of data. This leaves me a little confused. Let us imagine the following scenario:

I have a keyspace with four tables: A, B, C and D.

CREATE TABLE A (
  tableID int,
  column1 int,
  column2 varchar,
  column3 varchar,
  column4 varchar,
  column5 varchar,
  PRIMARY KEY (column1, tableID)
);

Let us imagine that the other tables (B, C, D) have the same structure and the same data as table A, only with a different primary key, so that each one can answer a different query.
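The question does not say what the other primary keys are. As a hypothetical sketch (the key choices below are assumptions, not taken from the question), B, C and D could each partition by a different column so that every table serves a different lookup:

```cql
-- Hypothetical layout: same columns as A, different primary keys.
-- B serves lookups by column2, C by column3, D by column4.
CREATE TABLE B (
  tableID int,
  column1 int,
  column2 varchar,
  column3 varchar,
  column4 varchar,
  column5 varchar,
  PRIMARY KEY (column2, tableID)
);

CREATE TABLE C (
  tableID int,
  column1 int,
  column2 varchar,
  column3 varchar,
  column4 varchar,
  column5 varchar,
  PRIMARY KEY (column3, tableID)
);

CREATE TABLE D (
  tableID int,
  column1 int,
  column2 varchar,
  column3 varchar,
  column4 varchar,
  column5 varchar,
  PRIMARY KEY (column4, tableID)
);
```

With a layout like this, one logical row is stored four times, each copy addressed by a different key, which is why a single change has to be written to all four tables.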

If I update a row in table A, how can I ensure that the other tables holding the same data stay consistent?

Solution

Cassandra provides BATCH for this purpose. From the documentation:

A BATCH statement combines multiple data modification language (DML) statements (INSERT, UPDATE, DELETE) into a single logical operation, and sets a client-supplied timestamp for all columns written by the statements in the batch. Batching multiple statements can save network exchanges between the client/server and server coordinator/replicas. However, because of the distributed nature of Cassandra, spread requests across nearby nodes as much as possible to optimize performance. Using batches to optimize performance is usually not successful, as described in Using and misusing batches section. For information about the fastest way to load data, see "Cassandra: Batch loading without the Batch keyword."

Batches are atomic by default. In the context of a Cassandra batch operation, atomic means that if any of the batch succeeds, all of it will. To achieve atomicity, Cassandra first writes the serialized batch to the batchlog system table that consumes the serialized batch as blob data. When the rows in the batch have been successfully written and persisted (or hinted) the batchlog data is removed. There is a performance penalty for atomicity. If you do not want to incur this penalty, prevent Cassandra from writing to the batchlog system by using the UNLOGGED option: BEGIN UNLOGGED BATCH

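For keeping denormalized tables in sync, UNLOGGED BATCH is almost always undesirable: it skips the batchlog, so a failure partway through can leave some tables updated and others not. Normal (logged) batches provide the functionality you want.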
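A minimal sketch of such a logged batch for this scenario, assuming hypothetical primary keys B (column2, tableID), C (column3, tableID) and D (column4, tableID) and made-up values. Each statement must name the full primary key of the row in its own table. The second batch covers the case where the changed column is part of another table's key; because a primary key column cannot be updated in place, that copy has to be deleted and re-inserted:

```cql
-- Hypothetical keys: A (column1, tableID), B (column2, tableID),
-- C (column3, tableID), D (column4, tableID). All values are made up.

-- Case 1: change a non-key column (column5) in every copy of the row.
BEGIN BATCH
  UPDATE A SET column5 = 'new' WHERE column1 = 10 AND tableID = 1;
  UPDATE B SET column5 = 'new' WHERE column2 = 'x' AND tableID = 1;
  UPDATE C SET column5 = 'new' WHERE column3 = 'y' AND tableID = 1;
  UPDATE D SET column5 = 'new' WHERE column4 = 'z' AND tableID = 1;
APPLY BATCH;

-- Case 2: change column2, which is part of B's primary key.
-- B's old row is deleted and a new row inserted; the client must
-- already know every other column value to rebuild the row.
BEGIN BATCH
  UPDATE A SET column2 = 'x2' WHERE column1 = 10 AND tableID = 1;
  DELETE FROM B WHERE column2 = 'x' AND tableID = 1;
  INSERT INTO B (tableID, column1, column2, column3, column4, column5)
    VALUES (1, 10, 'x2', 'y', 'z', 'new');
  UPDATE C SET column2 = 'x2' WHERE column3 = 'y' AND tableID = 1;
  UPDATE D SET column2 = 'x2' WHERE column4 = 'z' AND tableID = 1;
APPLY BATCH;
```

The batchlog guarantees that all statements are eventually applied, but it does not provide isolation: a client reading B and D at the same moment may briefly see the old value in one table and the new value in the other.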
