How to ensure data consistency in Cassandra on different tables?


Question

I'm new to Cassandra and I've read that Cassandra encourages denormalization and duplication of data. This leaves me a little confused. Imagine the following scenario:

I have a keyspace with four tables: A, B, C and D.

CREATE TABLE A (
  tableID int,
  column1 int,
  column2 varchar,
  column3 varchar,
  column4 varchar,
  column5 varchar,
  PRIMARY KEY (column1, tableID)
);

Suppose the other tables (B, C, D) have the same structure and the same data as table A, only with a different primary key, so that they can answer other queries.

If I update a row in table A, how can I ensure that the other tables holding the same data stay consistent?

Recommended answer

Cassandra provides BATCH for this purpose. From the documentation:

A BATCH statement combines multiple data modification language (DML) statements (INSERT, UPDATE, DELETE) into a single logical operation, and sets a client-supplied timestamp for all columns written by the statements in the batch. Batching multiple statements can save network exchanges between the client/server and server coordinator/replicas. However, because of the distributed nature of Cassandra, spread requests across nearby nodes as much as possible to optimize performance. Using batches to optimize performance is usually not successful, as described in the "Using and misusing batches" section. For information about the fastest way to load data, see "Cassandra: Batch loading without the Batch keyword."

Batches are atomic by default. In the context of a Cassandra batch operation, atomic means that if any part of the batch succeeds, all of it will. To achieve atomicity, Cassandra first writes the serialized batch to the batchlog system table, which stores the serialized batch as blob data. When the rows in the batch have been successfully written and persisted (or hinted), the batchlog data is removed. There is a performance penalty for atomicity. If you do not want to incur this penalty, prevent Cassandra from writing to the batchlog system table by using the UNLOGGED option: BEGIN UNLOGGED BATCH
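As a rough illustration of the two forms the documentation describes, the syntax differs only in the UNLOGGED keyword. The statements below target table A from the question; the values are made up for the example, and the two rows deliberately live in different partitions.

```cql
-- Logged batch (the default): the batch is recorded in the batchlog first,
-- so either all statements are eventually applied or none are.
BEGIN BATCH
  INSERT INTO A (column1, tableID, column2) VALUES (1, 42, 'x');
  INSERT INTO A (column1, tableID, column2) VALUES (2, 43, 'y');
APPLY BATCH;

-- Unlogged batch: skips the batchlog, so a failure part-way through
-- can leave only some of the statements applied.
BEGIN UNLOGGED BATCH
  INSERT INTO A (column1, tableID, column2) VALUES (1, 42, 'x');
  INSERT INTO A (column1, tableID, column2) VALUES (2, 43, 'y');
APPLY BATCH;
```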

UNLOGGED BATCH is almost always undesirable, and I believe it will be removed in future versions. Normal (logged) batches provide the functionality you want.
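For the scenario in the question, a minimal sketch might look like the following. The question does not give the primary keys of B, C and D, so the sketch assumes, purely for illustration, that they are (column2, tableID), (column3, tableID) and (column4, tableID) respectively; the key values are also made up.

```cql
-- Update column5 for the same logical row in all four tables.
-- Assumed (hypothetical) primary keys:
--   A: (column1, tableID)   B: (column2, tableID)
--   C: (column3, tableID)   D: (column4, tableID)
BEGIN BATCH
  UPDATE A SET column5 = 'new value' WHERE column1 = 1       AND tableID = 42;
  UPDATE B SET column5 = 'new value' WHERE column2 = 'b-key' AND tableID = 42;
  UPDATE C SET column5 = 'new value' WHERE column3 = 'c-key' AND tableID = 42;
  UPDATE D SET column5 = 'new value' WHERE column4 = 'd-key' AND tableID = 42;
APPLY BATCH;
```

Two caveats apply to this sketch. A logged batch guarantees that all four updates are eventually applied, but it does not provide isolation, so a reader may briefly see the new value in one table and the old value in another. Also, a column that is part of a table's primary key cannot be changed with UPDATE; in that table the old row would have to be deleted and a new one inserted, which can be done inside the same batch.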
