How do atomic batches work in Cassandra?

Question

How can atomic batches guarantee that either all statements in a single batch will be executed or none?

Answer

To understand how batches work under the hood, it's helpful to look at the individual stages of batch execution.

Client

Batches are supported in CQL3 and in modern Cassandra client APIs. In each case you can specify a list of statements to execute as part of the batch, a consistency level to be used for all statements, and an optional timestamp. You can batch INSERT, DELETE and UPDATE statements. If you do not provide a timestamp, the current time is used automatically and associated with the batch.
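As a rough illustration of the pieces a client supplies, here is a minimal sketch that assembles a CQL batch string with one shared timestamp. The helper `build_batch` and the table and column names are hypothetical, and the consistency level is assumed to be set on the client request rather than in the CQL text.

```python
import time


def build_batch(statements, timestamp_micros=None):
    """Wrap CQL statements in BEGIN BATCH ... APPLY BATCH with one timestamp."""
    if timestamp_micros is None:
        # Same fallback the text describes: use the current time (microseconds).
        timestamp_micros = int(time.time() * 1_000_000)
    # Normalise every statement to end with exactly one semicolon.
    body = "\n".join(f"  {s.rstrip(';')};" for s in statements)
    return f"BEGIN BATCH USING TIMESTAMP {timestamp_micros}\n{body}\nAPPLY BATCH;"


# Hypothetical tables, used only for illustration.
batch_cql = build_batch(
    [
        "INSERT INTO users (id, name) VALUES (1, 'alice')",
        "UPDATE user_stats SET logins = 1 WHERE id = 1",
        "DELETE FROM pending_users WHERE id = 1",
    ],
    timestamp_micros=1_400_000_000_000_000,
)
print(batch_cql)
```

Passing an explicit timestamp means every statement in the batch carries the same write time, which the replay discussion further down relies on.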

The client has to handle two exceptions in case the batch cannot be executed successfully.

  • UnavailableException - there are not enough nodes alive to fulfill any of the updates with the specified batch CL
  • WriteTimeoutException - a timeout occurred while either writing the batchlog or applying any of the updates within the batch. Which one it was can be checked by reading the writeType value of the exception (either BATCH_LOG or BATCH).
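The writeType check can be sketched as a small decision function. This is a toy model, not driver code: `WriteType`, `WriteTimeout` and `batch_outcome` below are hypothetical stand-ins that only mirror the two values named above.

```python
from enum import Enum


class WriteType(Enum):
    # Only the two values named in the text; a real driver defines more.
    BATCH_LOG = "BATCH_LOG"
    BATCH = "BATCH"


class WriteTimeout(Exception):
    """Hypothetical stand-in for a driver's write timeout exception."""

    def __init__(self, write_type):
        super().__init__(f"write timeout during {write_type.value}")
        self.write_type = write_type


def batch_outcome(error):
    """Classify a timeout by the stage in which it happened."""
    if error.write_type is WriteType.BATCH_LOG:
        # The batchlog was not confirmed, so nothing guarantees completion.
        return "not_guaranteed"
    # The batchlog exists, so a batchlog node can still replay the batch.
    return "will_complete"


try:
    raise WriteTimeout(WriteType.BATCH)
except WriteTimeout as exc:
    outcome = batch_outcome(exc)
print(outcome)
```

The point of the split is that a timeout after the batchlog was written does not have to be treated as a failed batch, while a timeout during the batchlog write does.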

Failed writes during the batchlog stage will be retried once automatically by the DefaultRetryPolicy in the Java driver. Batchlog creation is critical to ensure that a batch will always be completed if the coordinator fails mid-operation. Read on to find out why.

Coordinator

All batches sent by the client are executed by the coordinator, just as with any write operation. What's different from normal write operations is that Cassandra also makes use of a dedicated log that contains all pending batches currently being executed (called the batchlog). This log is stored in the local system keyspace and is managed by each node individually. Each batch execution starts by creating a log entry with the complete batch on preferably two nodes other than the coordinator. Once the coordinator has created the batchlog on the other nodes, it starts to execute the actual statements in the batch.

Each statement in the batch is written to the replicas using the CL and timestamp of the whole batch. Apart from that, there's nothing special about the writes happening at this point. Writes may also be hinted or throw a WriteTimeoutException, which can be handled by the client (see above).

After the batch has been executed, all created batchlogs can be safely removed. Therefore the coordinator sends a batchlog delete message, upon successful execution, to the nodes that received the batchlog earlier. This happens in the background and will go unnoticed if it fails.

Let's wrap up what the coordinator does during batch execution:

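  • sends the batchlog to two other nodes (preferably in different racks)
  • executes all statements in the batch
  • deletes the batchlog from those nodes again after successful batch execution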
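The three coordinator steps can be sketched with a toy in-memory model. Everything here (`Node`, `execute_batch`, the `crash_after_log` flag) is hypothetical and ignores networking, consistency levels and hints; it only shows the ordering of the steps.

```python
import uuid


class Node:
    """Toy node: a local batchlog plus a key/value data store."""

    def __init__(self, name):
        self.name = name
        self.batchlog = {}  # batch id -> list of (key, value) mutations
        self.data = {}      # key -> value


def execute_batch(batchlog_nodes, replicas, mutations, crash_after_log=False):
    batch_id = uuid.uuid4()
    # Step 1: store the complete batch on the batchlog nodes first.
    for node in batchlog_nodes:
        node.batchlog[batch_id] = list(mutations)
    if crash_after_log:
        # Simulated coordinator failure: the batchlog entries stay behind.
        return batch_id
    # Step 2: apply every statement of the batch to the replicas.
    for key, value in mutations:
        for replica in replicas:
            replica.data[key] = value
    # Step 3: the batch is done, so the batchlog entries can be deleted.
    for node in batchlog_nodes:
        node.batchlog.pop(batch_id, None)
    return batch_id


log_a, log_b, replica = Node("log-a"), Node("log-b"), Node("replica")
execute_batch([log_a, log_b], [replica], [("k1", "v1"), ("k2", "v2")])
print(replica.data, len(log_a.batchlog))

# A coordinator that dies after step 1 leaves a pending batch in the log.
pending_id = execute_batch([log_a, log_b], [replica], [("k3", "v3")], crash_after_log=True)
print(pending_id in log_a.batchlog, "k3" in replica.data)
```

In the second call the mutation never reaches the replica, but both batchlog nodes still hold the full batch, which is what makes a later replay possible.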

Batchlog replica nodes

As described above, the batchlog is replicated across two other nodes (if the cluster size allows it) before batch execution. The idea is that either of these nodes can pick up pending batches if the coordinator goes down before finishing all statements in the batch.

What makes things a bit complicated is that those nodes won't notice that the coordinator is no longer alive. The only point at which the batchlog nodes are updated with the current status of the batch execution is when the coordinator issues a delete message indicating that the batch has been executed successfully. If no such message arrives, the batchlog nodes assume the batch hasn't been executed for some reason and replay the batch from the log.

Batchlog replay potentially takes place every minute, i.e. that is the interval at which a node checks whether its local batchlog contains any pending batches that haven't been deleted by the (possibly killed) coordinator. To give the coordinator some time between batchlog creation and the actual execution, a fixed grace period is used (write_request_timeout_in_ms * 2, default 4 sec). If the batchlog entry still exists after 4 sec, it will be replayed.
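The grace period arithmetic is simple enough to write down. In this sketch the function names are hypothetical, and the default of 2000 ms for write_request_timeout_in_ms is assumed because it matches the 4 second grace period quoted above.

```python
# Assumed default, chosen so that the grace period is 4 s as stated in the text.
DEFAULT_WRITE_REQUEST_TIMEOUT_MS = 2000


def replay_grace_ms(write_request_timeout_in_ms=DEFAULT_WRITE_REQUEST_TIMEOUT_MS):
    """Grace period before a batchlog entry becomes eligible for replay."""
    return write_request_timeout_in_ms * 2


def is_replay_due(written_at_ms, now_ms,
                  write_request_timeout_in_ms=DEFAULT_WRITE_REQUEST_TIMEOUT_MS):
    """True once the entry has outlived the grace period."""
    return now_ms - written_at_ms > replay_grace_ms(write_request_timeout_in_ms)


print(replay_grace_ms())
print(is_replay_due(written_at_ms=0, now_ms=3_000))
print(is_replay_due(written_at_ms=0, now_ms=5_000))
```

Since the check itself only runs about once a minute, an entry that has passed the grace period may still wait until the next check before it is actually replayed.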

Just as with any write operation in Cassandra, timeouts may occur. In this case the node falls back to writing hints for the timed-out operations. When the timed-out replicas are up again, the writes can be resumed from the hints. This behavior doesn't seem to be affected by whether hinted_handoff_enabled is enabled or not. There's also a TTL value associated with each hint, which causes the hint to be discarded after a longer period of time (the smallest GCGraceSeconds of any involved CF).

Now you might be wondering whether it isn't potentially dangerous to replay a batch on two nodes at the same time, which may happen as we replicate the batchlog on two nodes. What's important to keep in mind here is that each batch execution is idempotent, due to the limited kinds of supported operations (updates and deletes) and the fixed timestamp associated with the batch. There won't be any conflicts even if both nodes and the coordinator retry executing the batch at the same time.
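Why a fixed timestamp makes replay harmless can be shown with a toy last-write-wins store. This is a deliberately simplified model: `apply_batch` is hypothetical, and the way real Cassandra resolves writes with equal timestamps is more nuanced than the strict comparison used here.

```python
def apply_batch(store, mutations, timestamp):
    """Apply (key, value) mutations; a value of None stands for a delete."""
    for key, value in mutations:
        current = store.get(key)
        # Last write wins: only a strictly newer timestamp replaces a cell.
        if current is None or timestamp > current[0]:
            store[key] = (timestamp, value)
    return store


mutations = [("user:1:name", "alice"), ("pending:1", None)]

store = {}
apply_batch(store, mutations, timestamp=100)
once = dict(store)

# Replays by both batchlog nodes carry the same batch and the same timestamp.
apply_batch(store, mutations, timestamp=100)
apply_batch(store, mutations, timestamp=100)
replayed = dict(store)
print(once == replayed)

# A later write is not clobbered by a late replay of the older batch.
store["user:1:name"] = (200, "bob")
apply_batch(store, mutations, timestamp=100)
print(store["user:1:name"])
```

Because the replayed mutations carry the original timestamp, applying them any number of times converges to the same state, and they cannot overwrite data written later.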

Atomicity guarantees

Let's get back to the atomicity aspect of "atomic batches" and review what exactly is meant by atomic (source):

"(Note that we mean "atomic" in the database sense that if any part of the batch succeeds, all of it will. No other guarantees are implied; in particular, there is no isolation; other clients will be able to read the first updated rows from the batch, while others are in progress."

So in a sense we get "all or nothing" guarantees. In most cases the coordinator will just write all the statements in the batch to the cluster. However, in case of a write timeout, we must check at which point the timeout occurred by reading the writeType value. The batch must have been written to the batchlog in order to be sure that those guarantees still apply. Also, at this point other clients may read partially executed results from the batch.

Getting back to the question: how can Cassandra guarantee that either all or none of the statements in a batch will be executed? Atomic batches basically depend on successful replication and idempotent statements. It's not a 100% guaranteed solution, as in theory there might be scenarios that still cause inconsistencies. But for a lot of use cases in Cassandra it's a very useful tool if you're aware of how it works.
