DSE 4.6 to 4.7: 1 MUTATION messages dropped in last 5000ms


Problem description

After upgrading our cluster (4 DCs, Ubuntu 14.04 x64, cpp-driver 2.0.1 as the client in our app) from 4.6 to 4.7, we got the message "MessagingService.java:888 - 1 MUTATION messages dropped in last 5000ms" in the logs on a few lightly loaded nodes, along with 1 pending HintedHandoff task in the thread pool dump.

What I tried:

  • Running "nodetool truncatehints" on each running node in the cluster
  • Changing OpenJDK to Oracle JDK (1.7.0_76-b13)
  • Decommissioning a node and rejoining it

How can I find this mutation/hint and drop it?

Side notes:

  • We did not increase the load (version 4.6 worked fine under this load).
  • We did not decrease the node count.
  • We have SSD-backed storage.

Fixed in https://issues.apache.org/jira/browse/CASSANDRA-9129

Recommended answer

Dropped mutations usually mean that your disks are not able to keep up with your ingest. At this point, you may want to find out whether any thread pools are backing up (usually the flush writers if this is an I/O issue). This is why Cassandra logs the thread pool status at that moment.
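To see how often messages are being dropped, it can help to total the counts from the log. The following is a minimal Python sketch, assuming log lines contain the same wording as the message quoted in the question; the `count_dropped` helper and the regular expression are illustrative and not part of Cassandra or DSE.

```python
import re
from collections import Counter

# Matches the wording quoted in the question, e.g.
#   "1 MUTATION messages dropped in last 5000ms"
# Assumption: other message types use the same phrasing.
DROPPED = re.compile(r"(\d+) ([A-Z_]+) messages dropped in last (\d+)ms")


def count_dropped(lines):
    """Sum dropped-message counts per message type across log lines."""
    totals = Counter()
    for line in lines:
        match = DROPPED.search(line)
        if match:
            count, message_type, _window_ms = match.groups()
            totals[message_type] += int(count)
    return totals


sample = [
    "MessagingService.java:888 - 1 MUTATION messages dropped in last 5000ms",
    "an unrelated log line",
]
print(dict(count_dropped(sample)))
```

In practice the lines would come from the node's system log rather than the `sample` list used here.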

Cassandra is built on a SEDA (staged event-driven architecture) design with multiple thread pools, each of which can handle up to a certain number of parallel tasks. Pending thread pool tasks pile up when there are more tasks than the pool can handle concurrently. They will eventually be processed once the system has the resources to do so, or be dropped under extreme circumstances.
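The pile-up-then-drop behaviour can be illustrated with a toy model. This is a sketch of a single stage with a fixed number of workers and a wait timeout, written for illustration only; the function name, the tick-based timing, and the drop rule are assumptions and do not reproduce Cassandra's actual implementation.

```python
from collections import deque


def simulate_stage(arrivals, workers, service_ticks, timeout_ticks):
    """Simulate one stage; arrivals[t] tasks arrive at tick t.

    Returns (completed, dropped, max_pending).
    """
    pending = deque()  # enqueue tick of each waiting task
    active = []        # remaining ticks for each running task
    completed = dropped = max_pending = 0
    tick = 0
    while tick < len(arrivals) or pending or active:
        # New tasks join the pending queue.
        if tick < len(arrivals):
            pending.extend([tick] * arrivals[tick])
        # Tasks that waited longer than the timeout are dropped.
        while pending and tick - pending[0] > timeout_ticks:
            pending.popleft()
            dropped += 1
        # Free workers pick up pending tasks.
        while pending and len(active) < workers:
            pending.popleft()
            active.append(service_ticks)
        max_pending = max(max_pending, len(pending))
        # Advance running tasks by one tick and retire finished ones.
        active = [remaining - 1 for remaining in active]
        completed += sum(1 for remaining in active if remaining == 0)
        active = [remaining for remaining in active if remaining > 0]
        tick += 1
    return completed, dropped, max_pending


# Light load: every task is served, nothing queues up.
print(simulate_stage([1, 1, 1], workers=2, service_ticks=1, timeout_ticks=2))
# Burst on a single worker: tasks queue up and some time out.
print(simulate_stage([4], workers=1, service_ticks=2, timeout_ticks=2))
```

With the light load all three tasks complete with no queueing. With the burst, two of the four tasks complete and two are dropped after waiting past the timeout, which mirrors the "pending, then dropped under extreme circumstances" pattern described above.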

To see the current status of your thread pools, use nodetool tpstats. Most likely your hints task has already been processed.
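If you want to spot backed-up pools quickly, the text output can be filtered with a few lines of Python. This is a rough sketch that assumes each pool row is a name followed by five integer columns (Active, Pending, Completed, Blocked, All time blocked); the `SAMPLE` text is hand-written for illustration, not captured from a real node, and the layout may differ between versions.

```python
def pools_with_pending(tpstats_text):
    """Return {pool_name: pending} for rows whose Pending column is > 0.

    Assumes rows look like: <name> <active> <pending> <completed>
    <blocked> <all time blocked>. Other lines are ignored.
    """
    backed_up = {}
    for line in tpstats_text.splitlines():
        tokens = line.split()
        if len(tokens) != 6 or not all(t.isdigit() for t in tokens[1:]):
            continue  # header, blank line, or a different section
        pending = int(tokens[2])
        if pending > 0:
            backed_up[tokens[0]] = pending
    return backed_up


# Hand-written illustrative input (assumed layout, not real output).
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0            100         0                 0
HintedHandoff                     0         1              4         0                 0
"""
print(pools_with_pending(SAMPLE))
```

The real input would be whatever `nodetool tpstats` prints on the node in question, pasted or piped into the function.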

The fact that you were accumulating hints implies that some of your nodes were down, and the hints are being replayed for consistency now that the node has come back up.

Your core issue is the dropped mutations. Consider one of the following actions if you continue to see them:


  • Add nodes
  • Get better storage (don't use shared storage, e.g. Amazon EBS; SSDs are faster than spinning disks)
  • Decrease your workload
  • Make sure you are loading with best practices (a good data model that spreads out the load, a DataStax driver with load balancing, etc.)
