What's the nature of Cassandra "write timeout"?

Views: 118 | Published: 2020/9/29 20:25:05 | Tags: cassandra, timeout

Problem description

I am running a write-heavy program (10 threads, peaking at 25K writes/sec) on a 24-node Cassandra 3.5 cluster on AWS EC2 (each host is a c4.2xlarge: 8 vCPUs and 15 GB of RAM).

Every once in a while my Java client, which uses DataStax driver 3.0.2, gets a write timeout:

com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency TWO (2 replica were required but only 1 acknowledged the write)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:73)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:26)
    at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:64)
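
To correlate these failures with server-side metrics, it can help to log the consistency level and acknowledgement counts from each timeout. Notably, the message above reports consistency TWO, while the session is described as running at ONE. The following is a minimal sketch that assumes only the message text is available (for example, in application logs) and that it follows the format shown above. The names `WriteTimeoutMessage`, `Parsed`, and `parse` are hypothetical and not part of the DataStax driver; where the exception object itself is in hand, the driver's own accessors are likely a better source.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the DataStax driver): extracts the
// consistency level and acknowledgement counts from a write-timeout message.
final class WriteTimeoutMessage {

    // Assumes the message format shown in the stack trace above.
    private static final Pattern PATTERN = Pattern.compile(
        "at consistency (\\w+) \\((\\d+) replica were required but only (\\d+) acknowledged");

    // Plain value holder for the three fields of interest.
    static final class Parsed {
        final String consistency; // e.g. "TWO"
        final int required;       // replicas that had to acknowledge
        final int received;       // replicas that actually acknowledged

        Parsed(String consistency, int required, int received) {
            this.consistency = consistency;
            this.required = required;
            this.received = received;
        }
    }

    // Returns an empty Optional when the message is null or does not match.
    static Optional<Parsed> parse(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher m = PATTERN.matcher(message);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Parsed(
            m.group(1),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))));
    }
}
```

Calling `WriteTimeoutMessage.parse(e.getMessage())` inside the `catch` block and logging the three fields with a timestamp would make it easier to line up each client-side failure with the Graphite timeout counters.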

The error happens infrequently and unpredictably. So far, I have not been able to link the failures to anything specific (e.g. program running time, data size on disk, time of day, or indicators of system load such as CPU, memory, and network metrics). Nonetheless, it is really disrupting our operations.

I am trying to find the root cause of the issue. Looking online for options, I am a bit overwhelmed by all the leads out there, such as

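Changing write_request_timeout_in_ms in cassandra.yaml (already changed to 5 seconds)

Using a proper RetryPolicy to keep the session going (already using DowngradingConsistencyRetryPolicy with a session-level consistency level of ONE)

Changing cache size, heap size, etc. I have never tried these because there are good reasons to discount them as the root cause.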

One thing that really confuses me is that I am getting this error from a fully replicated cluster with very few ClientRequest.timeout.write events:

I have a fully replicated 24-node cluster that spans 5 AWS regions. Each region has at least 2 copies of the data.
My program runs at consistency level ONE at the Session level (set via QueryOptions on the Cluster builder).
When the error happened, our Graphite chart registered no more than three (3) host hiccups, i.e. hosts reporting Cassandra.ClientRequest.Write.Timeouts.Count values.
I already set the write timeout (write_request_timeout_in_ms) to 5 seconds. The network is fast (verified with iperf3) and stable.

On paper, the situation should be well within Cassandra's failsafe range. So why does my program still fail? Are the numbers not what they appear to be?
