How to ensure data consistency in Cassandra on different tables?

Question

I'm new to Cassandra and I've read that Cassandra encourages denormalization and duplication of data. This leaves me a little confused. Let us imagine the following scenario:

I have a keyspace with four tables: A, B, C and D.

CREATE TABLE A (
  tableID int,
  column1 int,
  column2 varchar,
  column3 varchar,
  column4 varchar,
  column5 varchar,
  PRIMARY KEY (column1, tableID)
);

Let us imagine that the other tables (B, C, D) have the same structure and the same data as table A, only with a different primary key, in order to serve other queries.
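To make the scenario concrete, here is a minimal sketch of what one of the other tables might look like. The choice of `column2` as the partition key is an assumption made purely for illustration; the question does not say which keys B, C and D actually use.

```cql
-- Hypothetical definition of table B: same columns as table A,
-- but keyed by column2 so that rows can be looked up by that value.
CREATE TABLE B (
  tableID int,
  column1 int,
  column2 varchar,
  column3 varchar,
  column4 varchar,
  column5 varchar,
  PRIMARY KEY (column2, tableID)  -- assumed key, not given in the question
);
```

With this layout, a query such as `SELECT * FROM B WHERE column2 = 'foo';` can be answered from a single partition, which table A cannot do efficiently.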

If I update a row in table A, how can I ensure the consistency of the data in the other tables that hold the same data?

Recommended answer

Cassandra provides BATCH for this purpose. From the documentation:

A BATCH statement combines multiple data modification language (DML) statements (INSERT, UPDATE, DELETE) into a single logical operation, and sets a client-supplied timestamp for all columns written by the statements in the batch. Batching multiple statements can save network exchanges between the client/server and server coordinator/replicas. However, because of the distributed nature of Cassandra, spread requests across nearby nodes as much as possible to optimize performance. Using batches to optimize performance is usually not successful, as described in Using and misusing batches section. For information about the fastest way to load data, see "Cassandra: Batch loading without the Batch keyword."

Batches are atomic by default. In the context of a Cassandra batch operation, atomic means that if any of the batch succeeds, all of it will. To achieve atomicity, Cassandra first writes the serialized batch to the batchlog system table that consumes the serialized batch as blob data. When the rows in the batch have been successfully written and persisted (or hinted) the batchlog data is removed. There is a performance penalty for atomicity. If you do not want to incur this penalty, prevent Cassandra from writing to the batchlog system by using the UNLOGGED option: BEGIN UNLOGGED BATCH

UNLOGGED BATCH is almost always undesirable, and I believe it will be removed in future versions. Normal (logged) batches provide the functionality you want.
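As a rough sketch of how this applies to the question, a logged batch can group the update to table A with the matching update to a duplicate table. The example assumes that table B uses `PRIMARY KEY (column2, tableID)` and that the row's current `column2` value (`'foo'` here) is known to the client; both are illustrative assumptions, as are the literal values.

```cql
-- Logged batch (the default): if any statement is applied,
-- all of them will eventually be applied.
BEGIN BATCH
  -- Table A is keyed by (column1, tableID), as in the question.
  UPDATE A SET column3 = 'new value'
    WHERE column1 = 1 AND tableID = 10;

  -- Table B is assumed to be keyed by (column2, tableID).
  UPDATE B SET column3 = 'new value'
    WHERE column2 = 'foo' AND tableID = 10;

  -- Tables C and D would follow the same pattern with their own keys.
APPLY BATCH;
```

Two caveats are worth keeping in mind. A logged batch gives atomicity but not isolation, so a reader may briefly see table A updated before table B. Also, primary key columns cannot be changed with UPDATE, so if the modified column is part of a duplicate table's key, that table needs a DELETE of the old row plus an INSERT of the new one inside the same batch.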
