Cassandra commit log clarification

Question

I have read over several documents regarding the Cassandra commit log and, to me, there is conflicting information regarding this "structure(s)". The diagram shows that when a write occurs, Cassandra writes to the memtable and commit log. The confusing part is where this commit log resides.

The diagram that I've seen over-and-over shows the commit log on disk. However, if you do some more reading, they also talk about a commit log buffer in memory - and that piece of memory is flushed to disk every 10 seconds.

DataStax Documentation states: "When a write occurs, Cassandra stores the data in a memory structure called memtable, and to provide configurable durability, it also appends writes to the commit log buffer in memory. This buffer is flushed to disk every 10 seconds".

Nowhere in their diagram do they show a memory structure called a commit log buffer. They only show the commit log residing on disk.

It also states: "When a write occurs, Cassandra stores the data in a structure in memory, the memtable, and also appends writes to the commit log on disk."

So I'm confused by the above. Is it written to the commit log memory buffer, which is eventually flushed to disk (which I would assume is also called the "commit log"), or is it written to the memtable and commit log on disk?

Apache's documentation states this: "Instead, like other modern systems, Cassandra provides durability by appending writes to a commitlog first. This means that only the commitlog needs to be fsync'd, which, if the commitlog is on its own volume, obviates the need for seeking since the commitlog is append-only. Implementation details are in ArchitectureCommitLog.

Cassandra's default configuration sets the commitlog_sync mode to periodic, causing the commitlog to be synced every commitlog_sync_period_in_ms milliseconds, so you can potentially lose up to that much data if all replicas crash within that window of time."

What I have inferred from the Apache statement is that ONLY because of the asynchronous nature of writes (acknowledgement of a cache write) could you lose data (it even states you can lose data if all replicas crash before it is flushed/sync'd).

I'm not sure what I can infer from the DataStax documentation and diagram as they've mentioned two different statements regarding the commit log - one in memory, one on disk.

Can anyone clarify what I consider a poorly worded and conflicting set of documentation?

I'll assume there is a commit log buffer, as they both reference it (yet DataStax doesn't show it in the diagram). How and when this buffer is managed is, I think, key to understanding the write path.

Recommended answer

Generally, when explaining the write path, the commit log is characterized as a file, and it's true that the commit log is the on-disk storage mechanism that provides durability. The confusion arises when an explanation goes deeper and brings in the buffer cache and the need to issue fsyncs. The reference to a "commit log buffer in memory" is talking about the OS buffer cache, not a memory structure in Cassandra. You can see in the code that there's no separate in-memory structure for the commit log; rather, the mutation is serialized and written to a file-backed buffer.
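The gap between "written to the file" and "fsynced to disk" can be sketched in a few lines of Python. This is only an illustration of the general OS behavior described above, not Cassandra's implementation: the record format, the file name, and the helper names `append_record`, `sync`, and `demo` are all made up for the example.

```python
import os
import tempfile


def append_record(fd: int, payload: bytes) -> int:
    """Append a length-prefixed record (hypothetical format).

    os.write hands the bytes to the OS; they normally land in the
    buffer cache first and are not guaranteed to be on disk yet.
    """
    record = len(payload).to_bytes(4, "big") + payload
    return os.write(fd, record)


def sync(fd: int) -> None:
    """Ask the OS to push this file's cached bytes to stable storage."""
    os.fsync(fd)


def demo() -> int:
    """Append two records, fsync once, and return the file size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "commitlog-segment.log")  # made-up name
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        try:
            append_record(fd, b"INSERT k=1 v=a")
            append_record(fd, b"INSERT k=2 v=b")
            # Both records are now visible in the file, but a power loss
            # at this point could still drop them: no fsync has happened.
            sync(fd)
        finally:
            os.close(fd)
        return os.path.getsize(path)


if __name__ == "__main__":
    print(demo(), "bytes appended and synced")
```

The window between the two `append_record` calls and `sync` is the window in which an acknowledged write could be lost on a crash; how wide that window is allowed to be is what the sync strategy controls.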

Cassandra comes with two strategies for managing fsync on the commit log.

commitlog_sync 
    (Default: periodic) The method that Cassandra uses to acknowledge writes in milliseconds:
    periodic: (Default: 10000 milliseconds [10 seconds])
    Used with commitlog_sync_period_in_ms to control how often the commit log is synchronized to disk. Periodic syncs are acknowledged immediately.

    batch: (Default: disabled)
    Used with commitlog_sync_batch_window_in_ms (Default: 2 ms) to control how long Cassandra waits for other writes before performing a sync. When using this method, writes are not acknowledged until fsynced to disk.

The periodic setting offers better performance at the cost of a small increase in the chance that data can be lost. The batch setting guarantees durability at the cost of latency.
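To make the trade-off concrete, here is a toy single-node model of the two acknowledgement policies. It is a sketch under simplifying assumptions, not Cassandra code: `CommitLogModel` and its methods are hypothetical names, timing is reduced to explicit `sync()` calls, and replication is ignored (with replicas, an acknowledged write is lost only if every replica crashes inside the unsynced window).

```python
from dataclasses import dataclass, field


@dataclass
class CommitLogModel:
    """Toy model: tracks which writes are buffered, durable, and acked."""

    mode: str  # "periodic" or "batch"
    buffered: list = field(default_factory=list)  # written, not yet fsynced
    durable: list = field(default_factory=list)   # fsynced to disk
    acked: list = field(default_factory=list)     # acknowledged to the client

    def write(self, mutation: str) -> None:
        self.buffered.append(mutation)
        if self.mode == "periodic":
            # Periodic mode acknowledges immediately, before any fsync.
            self.acked.append(mutation)

    def sync(self) -> None:
        """Stand-in for the fsync that ends a period or batch window."""
        self.durable.extend(self.buffered)
        if self.mode == "batch":
            # Batch mode acknowledges only once the data is fsynced.
            self.acked.extend(self.buffered)
        self.buffered.clear()

    def crash(self) -> list:
        """Drop unsynced data; return acknowledged writes that were lost."""
        self.buffered.clear()
        return [m for m in self.acked if m not in self.durable]


def run(mode: str) -> list:
    """Write 'a', sync, write 'b', then crash before the next sync."""
    log = CommitLogModel(mode)
    log.write("a")
    log.sync()
    log.write("b")
    return log.crash()


if __name__ == "__main__":
    print("periodic lost:", run("periodic"))
    print("batch lost:", run("batch"))
```

In this model, periodic mode loses the acknowledged write `b`, while batch mode loses nothing that was acknowledged: `b` was simply never acknowledged, which is where the extra latency comes from.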
