Cannot record QUEUE latency of n minutes - DSE
Problem description
One of the nodes in our 3-node cluster is down, and on checking its log file we see the messages below:
INFO [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:32,891 AbstractMetrics.java:114 - Cannot record QUEUE latency of 11 minutes because higher than 10 minutes.
INFO [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:33,233 AbstractMetrics.java:114 - Cannot record QUEUE latency of 10 minutes because higher than 10 minutes.
WARN [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:33,398 Worker.java:99 - Interrupt/timeout detected.
java.util.concurrent.BrokenBarrierException: null
at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:200) ~[na:1.7.0_79]
at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:355) ~[na:1.7.0_79]
at com.datastax.bdp.concurrent.FlushTask.bulkSync(FlushTask.java:76) ~[dse-core-4.8.3.jar:4.8.3]
at com.datastax.bdp.concurrent.Worker.run(Worker.java:94) ~[dse-core-4.8.3.jar:4.8.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
WARN [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:33,398 Worker.java:99 - Interrupt/timeout detected.
java.util.concurrent.BrokenBarrierException: null
at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:200) ~[na:1.7.0_79]
at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:355) ~[na:1.7.0_79]
at com.datastax.bdp.concurrent.FlushTask.bulkSync(FlushTask.java:76) ~[dse-core-4.8.3.jar:4.8.3]
at com.datastax.bdp.concurrent.Worker.run(Worker.java:94) ~[dse-core-4.8.3.jar:4.8.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
INFO [keyspace.core Index WorkPool work thread-4] 2016-09-14 14:05:33,720 AbstractMetrics.java:114 - Cannot record QUEUE latency of 13 minutes because higher than 10 minutes.
INFO [keyspace.core Index WorkPool work thread-4] 2016-09-14 14:05:33,721 AbstractMetrics.java:114 - Cannot record QUEUE latency of 13 minutes because higher than 10 minutes.
Each node is configured with 8 CPUs, 32 GB RAM, and 500 GB of disk space. What could be the reasons for only one particular node going down?
Recommended answer
I'm going to answer with some general information here, since your case might be more complex. 32 GB of RAM might not be large enough for a Solr node; using the G1 collector on Java 1.8 has proved better for Solr with heap sizes above 26 GB.
I'm also not sure what heap size and JVM settings you are using, or how many Solr cores you have. However, I've seen similar errors when a node is busy indexing and is trying to keep up. One of the most common problems I've seen on Solr nodes is max_solr_concurrency_per_core being left at its default (commented out) in dse.yaml. This will typically set the number of indexing threads to the number of CPU cores. To compound the problem, you might see 8 cores, but if you have hyper-threading (HT) enabled then it's likely only 4 physical cores.
Check your dse.yaml and make sure max_solr_concurrency_per_core is set to (number of physical CPU cores) / (number of Solr cores), with 2 as a minimum. Indexing might be slower, but this should take the pressure off your node.
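As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function names, the assumption of 2 hardware threads per physical core when HT is on, and rounding the division down are my own choices for illustration; they are not part of DSE.

```python
def physical_cores(logical_cpus: int, threads_per_core: int = 2) -> int:
    """Estimate physical cores from the logical CPU count.

    Assumes hyper-threading exposes `threads_per_core` logical CPUs
    per physical core (2 is an assumption; pass 1 if HT is off).
    """
    if logical_cpus < 1 or threads_per_core < 1:
        raise ValueError("counts must be positive")
    return max(1, logical_cpus // threads_per_core)


def recommended_concurrency(physical: int, solr_cores: int, minimum: int = 2) -> int:
    """Rule of thumb from the answer above:
    physical CPU cores / number of Solr cores, never below `minimum`.
    Integer (floor) division is an assumption made for this sketch.
    """
    if physical < 1 or solr_cores < 1:
        raise ValueError("counts must be positive")
    return max(minimum, physical // solr_cores)


if __name__ == "__main__":
    # 8 logical CPUs with HT enabled -> about 4 physical cores.
    cores = physical_cores(8, threads_per_core=2)
    for n_solr_cores in (1, 2, 3):
        value = recommended_concurrency(cores, n_solr_cores)
        print(f"{n_solr_cores} Solr core(s): max_solr_concurrency_per_core = {value}")
```

The resulting number is what you would put in dse.yaml for max_solr_concurrency_per_core; treat it as a starting point and adjust based on how the node behaves under indexing load.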
I'd recommend this blog post as a good starting point for tuning DSE Solr:
http://www.datastax.com/dev/blog/tuning-dse-search
There is also documentation on the topic:
https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchTune.html