Cassandra commit log clarification

Problem description

I have read over several documents regarding the Cassandra commit log and, to me, there is conflicting information regarding this "structure(s)". The diagram shows that when a write occurs, Cassandra writes to the memtable and commit log. The confusing part is where this commit log resides.

The diagram that I've seen over-and-over shows the commit log on disk. However, if you do some more reading, they also talk about a commit log buffer in memory - and that piece of memory is flushed to disk every 10 seconds.

DataStax Documentation states: "When a write occurs, Cassandra stores the data in a memory structure called memtable, and to provide configurable durability, it also appends writes to the commit log buffer in memory. This buffer is flushed to disk every 10 seconds".

Nowhere in their diagram do they show a memory structure called a commit log buffer. They only show the commit log residing on disk.

It also states: "When a write occurs, Cassandra stores the data in a structure in memory, the memtable, and also appends writes to the commit log on disk."

So I'm confused by the above. Is it written to the commit log memory buffer, which is eventually flushed to disk (which I would assume is also called the "commit log"), or is it written to the memtable and commit log on disk?

Apache's documentation states this: "Instead, like other modern systems, Cassandra provides durability by appending writes to a commitlog first. This means that only the commitlog needs to be fsync'd, which, if the commitlog is on its own volume, obviates the need for seeking since the commitlog is append-only. Implementation details are in ArchitectureCommitLog.

Cassandra's default configuration sets the commitlog_sync mode to periodic, causing the commitlog to be synced every commitlog_sync_period_in_ms milliseconds, so you can potentially lose up to that much data if all replicas crash within that window of time."

What I have inferred from the Apache statement is that ONLY because of the asynchronous nature of writes (acknowledgement of a cache write) could you lose data (it even states you can lose data if all replicas crash before it is flushed/sync'd).

I'm not sure what I can infer from the DataStax documentation and diagram as they've mentioned two different statements regarding the commit log - one in memory, one on disk.

Can anyone clarify what I consider a poorly worded and conflicting set of documentation?

I'll assume there is a commit log buffer, as they both reference it (yet DataStax doesn't show it in the diagram). How and when this is managed is, I think, key to understanding it.

Recommended answer

Generally, when the write path is explained, the commit log is characterized as a file, and it is true that the commit log is the on-disk storage mechanism that provides durability. The confusion arises when the explanation goes deeper and brings in the buffer cache and the need to issue fsyncs. The reference to a "commit log buffer in memory" is talking about the OS buffer cache, not a memory structure in Cassandra. You can see in the code that there is no separate in-memory structure for the commit log; rather, the mutation is serialized and written to a file-backed buffer.
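To make the distinction between "written to the file" and "fsynced to disk" concrete, here is a minimal Python sketch. It is not Cassandra's code: the function names and the length-prefixed record format are assumptions made up for illustration. It only shows that a plain write hands bytes to the operating system (which typically keeps them in its buffer cache), while a separate fsync call is what asks the OS to push them to stable storage.

```python
import os
import struct
import tempfile


def append_mutation(fd, payload):
    """Append one length-prefixed record and return the bytes written.

    os.write hands the bytes to the operating system, which normally
    keeps them in its buffer cache. Nothing here forces them to disk.
    (A production writer would also loop on short writes.)
    """
    record = struct.pack(">I", len(payload)) + payload
    return os.write(fd, record)


def sync_to_disk(fd):
    """Ask the OS to flush its cached data for this file to stable storage."""
    os.fsync(fd)


def read_mutations(path):
    """Replay the log by decoding every length-prefixed record in the file."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (length,) = struct.unpack(">I", header)
            records.append(f.read(length))
    return records


if __name__ == "__main__":
    # Hypothetical file name, created in a throwaway temp directory.
    path = os.path.join(tempfile.mkdtemp(), "commitlog-sketch.log")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        append_mutation(fd, b"INSERT k1")
        append_mutation(fd, b"INSERT k2")
        sync_to_disk(fd)  # only after this call are the records durable
    finally:
        os.close(fd)
    print(read_mutations(path))
```

If the machine lost power between append_mutation and sync_to_disk, the appended records could be lost even though the write call had already returned. That gap is the window the fsync strategies are about.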

Cassandra comes with two strategies for managing fsync on the commit log.

commitlog_sync 
    (Default: periodic) The method that Cassandra uses to acknowledge writes in milliseconds:
    periodic: (Default: 10000 milliseconds [10 seconds])
    Used with commitlog_sync_period_in_ms to control how often the commit log is synchronized to disk. Periodic syncs are acknowledged immediately.

    batch: (Default: disabled)
    Used with commitlog_sync_batch_window_in_ms (Default: 2 ms) to control how long Cassandra waits for other writes before performing a sync. When using this method, writes are not acknowledged until fsynced to disk.

The periodic setting offers better performance at the cost of a small increase in the chance that data can be lost. The batch setting guarantees durability at the cost of latency.
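The trade-off can be illustrated with a toy Python model. This is a sketch under stated assumptions, not Cassandra's implementation: the class and method names are hypothetical, the background timer of periodic mode is replaced by an explicit periodic_sync call, and the batch window is left out, so batch mode simply fsyncs before every acknowledgment. The counter tracks how many acknowledged writes have not yet been fsynced and would therefore be at risk in a crash.

```python
import os
import tempfile


class CommitLogSketch:
    """Toy model of the two commitlog_sync modes (hypothetical, simplified)."""

    def __init__(self, path, mode="periodic"):
        if mode not in ("periodic", "batch"):
            raise ValueError("mode must be 'periodic' or 'batch'")
        self.mode = mode
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.at_risk = 0  # writes acknowledged but not yet fsynced

    def write(self, payload):
        """Append a write, acknowledge it, and return the at-risk count."""
        os.write(self.fd, payload + b"\n")
        if self.mode == "batch":
            # batch: the write is not acknowledged until it is fsynced
            os.fsync(self.fd)
        else:
            # periodic: acknowledged immediately, fsynced some time later
            self.at_risk += 1
        return self.at_risk

    def periodic_sync(self):
        """Stand-in for the timer that fires every sync period."""
        os.fsync(self.fd)
        self.at_risk = 0

    def close(self):
        os.close(self.fd)


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    periodic = CommitLogSketch(os.path.join(tmp, "periodic.log"), "periodic")
    batch = CommitLogSketch(os.path.join(tmp, "batch.log"), "batch")
    for i in range(3):
        periodic.write(b"row %d" % i)
        batch.write(b"row %d" % i)
    print("periodic at risk:", periodic.at_risk)
    print("batch at risk:", batch.at_risk)
    periodic.periodic_sync()
    print("periodic at risk after sync:", periodic.at_risk)
    periodic.close()
    batch.close()
```

In this model every batch write pays for an fsync before returning, which is where the extra latency comes from, while periodic writes return immediately and accumulate until the next sync.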
