Cannot record QUEUE latency of n minutes - DSE

Problem description

One of the nodes in our 3-node cluster is down, and on checking the log file, it shows the following messages:

INFO  [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:32,891  AbstractMetrics.java:114 - Cannot record QUEUE latency of 11 minutes because higher than 10 minutes.
INFO  [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:33,233  AbstractMetrics.java:114 - Cannot record QUEUE latency of 10 minutes because higher than 10 minutes.
WARN  [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:33,398  Worker.java:99 - Interrupt/timeout detected.
java.util.concurrent.BrokenBarrierException: null
at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:200) ~[na:1.7.0_79]
at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:355) ~[na:1.7.0_79]
at com.datastax.bdp.concurrent.FlushTask.bulkSync(FlushTask.java:76) ~[dse-core-4.8.3.jar:4.8.3]
at com.datastax.bdp.concurrent.Worker.run(Worker.java:94) ~[dse-core-4.8.3.jar:4.8.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
WARN  [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:33,398  Worker.java:99 - Interrupt/timeout detected.
java.util.concurrent.BrokenBarrierException: null
at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:200) ~[na:1.7.0_79]
at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:355) ~[na:1.7.0_79]
at com.datastax.bdp.concurrent.FlushTask.bulkSync(FlushTask.java:76) ~[dse-core-4.8.3.jar:4.8.3]
at com.datastax.bdp.concurrent.Worker.run(Worker.java:94) ~[dse-core-4.8.3.jar:4.8.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
INFO  [keyspace.core Index WorkPool work thread-4] 2016-09-14 14:05:33,720  AbstractMetrics.java:114 - Cannot record QUEUE latency of 13 minutes because higher than 10 minutes.
INFO  [keyspace.core Index WorkPool work thread-4] 2016-09-14 14:05:33,721  AbstractMetrics.java:114 - Cannot record QUEUE latency of 13 minutes because higher than 10 minutes.

Each node has 8 CPUs, 32 GB RAM and 500 GB of disk space. What could be the reasons for only this one particular node going down?
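
To see how far behind the indexing queue got across a whole log rather than a short excerpt, the AbstractMetrics lines can be pulled out mechanically. This is a minimal Python sketch, assuming the line layout shown above (thread name in brackets, then the timestamp); the pattern and function names are hypothetical and not part of DSE. Judging by the message text, samples above the 10-minute cap are not recorded in the metric, so the log may be the only place these values show up.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical pattern for the AbstractMetrics INFO lines in the log excerpt above, e.g.
# INFO  [thread] 2016-09-14 14:05:32,891  AbstractMetrics.java:114 - Cannot record QUEUE latency of 11 minutes ...
QUEUE_LATENCY_RE = re.compile(
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r".*Cannot record QUEUE latency of (?P<minutes>\d+) minutes"
)


def dropped_queue_latencies(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (timestamp, thread, minutes) for each QUEUE latency the log says it could not record."""
    hits = []
    for line in lines:
        match = QUEUE_LATENCY_RE.search(line)
        if match:
            hits.append((match.group("ts"), match.group("thread"), int(match.group("minutes"))))
    return hits
```

For example, passing an open handle to the node's system log (the path depends on your install) and taking the largest minutes value shows the worst backlog reported; for the excerpt above that is 13 minutes.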

Recommended answer

So I'm going to answer with some general information here; your case might be more complex. 32 GB RAM might not be large enough for a Solr node, and using the G1 collector on Java 1.8 has proved better for Solr with heap sizes above 26 GB.

I'm also not sure what heap size, JVM settings and how many Solr cores you have here. However, I've seen similar errors when a node is busy indexing and is struggling to keep up. In my experience, one of the most common problems on Solr nodes is leaving max_solr_concurrency_per_core at its default (commented out) in dse.yaml. This will typically set the number of indexing threads to the number of CPU cores, and to further compound the problem, you might see 8 cores, but if you have hyper-threading (HT) enabled, you most likely have only 4 physical cores.
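
Since the setting should be based on physical rather than logical cores, it helps to check what the hardware actually has. The sketch below assumes a Linux host and counts distinct (physical id, core id) pairs in the text of /proc/cpuinfo; Python's os.cpu_count(), by contrast, reports logical CPUs, so on a hyper-threaded machine the two numbers differ. The function name is hypothetical, and some VMs or non-x86 kernels omit these fields, in which case it returns None.

```python
from typing import Optional


def physical_core_count(cpuinfo_text: str) -> Optional[int]:
    """Count distinct (physical id, core id) pairs in Linux /proc/cpuinfo text.

    Hyper-threaded siblings share the same pair, so each physical core is counted once.
    Returns None when the fields are missing (e.g. some VMs or non-x86 kernels).
    """
    cores = set()
    physical_id = core_id = None
    # A trailing empty line guarantees the last processor block is flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one "processor" block.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores) or None
```

On the node itself you would read /proc/cpuinfo into a string and pass it in, then compare the result with os.cpu_count().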

Check your dse.yaml and make sure you set it to (number of physical CPU cores) / (number of Solr cores), with a minimum of 2. Indexing might be slower, but it should take the pressure off your node.
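
The sizing rule above can be written as a small helper. This is a sketch assuming integer (floor) division and that you already know the node's physical core count and number of Solr cores; the function names are hypothetical, and the second one only formats the line you would then set in dse.yaml.

```python
def recommended_solr_concurrency(physical_cores: int, solr_cores: int, minimum: int = 2) -> int:
    """Rule of thumb from the answer: physical CPU cores / Solr cores, but never below the minimum."""
    if physical_cores < 1 or solr_cores < 1:
        raise ValueError("core counts must be positive")
    # Floor division is an assumption; the answer does not say how to round.
    return max(minimum, physical_cores // solr_cores)


def dse_yaml_line(physical_cores: int, solr_cores: int) -> str:
    """Format the recommended value as a dse.yaml setting line."""
    return f"max_solr_concurrency_per_core: {recommended_solr_concurrency(physical_cores, solr_cores)}"
```

For the node in the question, 8 logical cores with hyper-threading would mean 4 physical cores: with one Solr core that gives 4, with two Solr cores it gives 2, and with three it still gives the minimum of 2.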

I'd recommend this blog post as a good starting point for tuning DSE Solr:

http://www.datastax.com/dev/blog/tuning-dse-search

There is also documentation on the topic:

https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchTune.html
