Solr (JVM) peak every hour


Question

Solved

In our case the problem was with the SuggestRequestHandler (requestHandler name="/suggest"): a facet limit has now been set (facet.limit=10). There had also been several internal requests for each single suggest request made by the application. Why this led to a peak only at the full hour is still not quite clear...
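For reference, a minimal sketch of what such a capped /suggest handler could look like in solrconfig.xml - the field name suggest_field and the surrounding defaults are hypothetical illustrations, not taken from the original setup:

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet">true</str>
    <!-- suggest_field is a hypothetical field name -->
    <str name="facet.field">suggest_field</str>
    <!-- cap the number of facet terms returned per request -->
    <int name="facet.limit">10</int>
  </lst>
</requestHandler>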

Thank you all for the tips and help - I appreciate it!

Every full hour (12:00, 13:00, 14:00, ..., 20:00, 21:00, 22:00, 23:00) our Solr/Java process has a peak: the Java process running Solr jumps to about 3x its normal CPU usage, and response times - normally milliseconds - go up to 9 seconds. It always lasts 2-3 minutes and only happens when there is traffic on our site (a PHP application calls Java). Crond was completely disabled, but the problem still occurs at every full hour. And basically I think we have tried almost every GC and memory combination (or maybe not?)

Does anyone have any idea why this happens? Here are some details:

  • System: 32 GB RAM, 24 cores (mostly shared with php-fpm, but the same problem was also reproduced with Solr isolated, as a test)
  • Solr version 3.6 (on Jetty - temporarily also Glassfish)
  • OS: RHEL 5.7
  • Multi-core setup (4 indexes, 2 cores each)

Used Handlers (solrconfig.xml):

<requestHandler name="standard" class="solr.SearchHandler" default="true">
<requestHandler name="dismax" class="solr.SearchHandler" >
<requestHandler name="/suggest" class="solr.SearchHandler">
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/analysis/document" class="solr.DocumentAnalysisRequestHandler" />
<requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler" />
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
<requestHandler name="/admin/ping" class="PingRequestHandler">
<requestHandler name="/debug/dump" class="solr.DumpRequestHandler" >
<requestHandler name="/replication" class="solr.ReplicationHandler" >

(also tested without replication and ping)

Used Filters:

<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PortugueseMinimalStemFilterFactory"/>
<filter class="solr.ISOLatin1AccentFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
<filter class="solr.PortugueseMinimalStemFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="30" minGramSize="1"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory" />

Index size: ~100 MB (actually even a bit less)

Current Java Options:

JAVA_OPTS="-Xmx4096m -Xms4096m -XX:+UseGCOverheadLimit -XX:+UseConcMarkSweepGC -XX:+UseTLAB -XX:MaxPermSize=128m -XX:+DisableExplicitGC -Dsun.rmi.dgc.server.gcInterval=300000 -Dsun.rmi.dgc.client.gcInterval=300000 -XX:NewRatio=1 -Xloggc:/shop/logs/live/solr/gc.log -verbose:gc -XX:+PrintGCDateStamps"

The same options but with 1024 MB, 2048 MB, 8192 MB and 12 GB heaps didn't help at all.

Another attempt:

JAVA_OPTS="-server -Xmx2048m -XX:MaxPermSize=128m -XX:+UseParNewGC     -XX:+UseConcMarkSweepGC -XX:+UseTLAB -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 -XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=256 -XX:CMSInitiatingOccupancyFraction=60 -XX:+DisableExplicitGC"

Another attempt:

JAVA_OPTS="-Xmx2048m -Xms2048m -XX:+UseGCOverheadLimit -XX:+UseConcMarkSweepGC -XX:+UseTLAB -XX:MaxPermSize=128m -XX:+DisableExplicitGC -Djava.util.logging.config.file=/opt/solr-jetty/etc/jetty-logging.properties"

Here is an excerpt of the gc.log (around such a full-hour problem):

2013-03-03T19:59:04.157-0300: 8087.754: [GC 3433559K->1788819K(3914560K), 0.0358190 secs]
2013-03-03T19:59:12.031-0300: 8095.628: [GC 3437075K->1792088K(3914560K), 0.0365830 secs]
2013-03-03T19:59:22.419-0300: 8106.016: [GC 3440344K->1803266K(3914560K), 0.0422040 secs]
2013-03-03T19:59:29.044-0300: 8112.641: [GC 3451522K->1815743K(3914560K), 0.0439870 secs]
2013-03-03T19:59:37.002-0300: 8120.599: [GC 3463999K->1821601K(3914560K), 0.0378990 secs]
2013-03-03T19:59:45.468-0300: 8129.065: [GC 3469857K->1822911K(3914560K), 0.0386720 secs]
2013-03-03T19:59:53.750-0300: 8137.347: [GC 3471167K->1829299K(3914560K), 0.0405040 secs]
2013-03-03T20:00:01.829-0300: 8145.426: [GC 3477555K->1832046K(3914560K), 0.0383070 secs]
2013-03-03T20:00:06.327-0300: 8149.924: [GC 3480302K->1831567K(3914560K), 0.0450550 secs]
2013-03-03T20:00:11.123-0300: 8154.719: [GC 3479823K->1843283K(3914560K), 0.0401710 secs]
2013-03-03T20:00:14.360-0300: 8157.957: [GC 3491539K->1854079K(3914560K), 0.0368560 secs]
2013-03-03T20:00:17.419-0300: 8161.015: [GC 3502335K->1855130K(3914560K), 0.0375530 secs]
2013-03-03T20:00:20.006-0300: 8163.603: [GC 3503386K->1861867K(3914560K), 0.0413470 secs]
2013-03-03T20:00:22.726-0300: 8166.323: [GC 3510123K->1870292K(3914560K), 0.0360600 secs]
2013-03-03T20:00:25.420-0300: 8169.017: [GC 3518548K->1872701K(3914560K), 0.0326970 secs]
2013-03-03T20:00:27.138-0300: 8170.735: [GC 3520957K->1873446K(3914560K), 0.0381430 secs]
2013-03-03T20:00:28.748-0300: 8172.345: [GC 3521702K->1889189K(3914560K), 0.0379160 secs]
2013-03-03T20:00:30.404-0300: 8174.001: [GC 3537445K->1887193K(3914560K), 0.0407670 secs]
2013-03-03T20:00:32.713-0300: 8176.309: [GC 3535449K->1892863K(3914560K), 0.0366880 secs]
2013-03-03T20:00:34.791-0300: 8178.388: [GC 3541119K->1899095K(3914560K), 0.0398270 secs]
2013-03-03T20:00:36.533-0300: 8180.129: [GC 3547351K->1910071K(3914560K), 0.0373960 secs]
2013-03-03T20:00:39.037-0300: 8182.634: [GC 3558327K->1904198K(3914560K), 0.0393020 secs]
2013-03-03T20:00:41.548-0300: 8185.144: [GC 3552454K->1912352K(3914560K), 0.0444060 secs]
2013-03-03T20:00:43.771-0300: 8187.368: [GC 3560608K->1919304K(3914560K), 0.0427220 secs]
2013-03-03T20:00:47.411-0300: 8191.008: [GC 3566354K->1918102K(3914560K), 0.0418150 secs]
2013-03-03T20:00:50.925-0300: 8194.522: [GC 3564290K->1930888K(3914560K), 0.0414700 secs]
2013-03-03T20:00:52.991-0300: 8196.588: [GC 3579144K->1933251K(3914560K), 0.0349600 secs]
2013-03-03T20:00:53.027-0300: 8196.624: [GC 1939697K(3914560K), 0.0256300 secs]
2013-03-03T20:00:54.208-0300: 8197.804: [GC 2780505K(3914560K), 0.1424860 secs]
2013-03-03T20:00:55.684-0300: 8199.281: [GC 3029503K->1389766K(3914560K), 0.0370380 secs]
2013-03-03T20:00:58.289-0300: 8201.886: [GC 2213458K->570843K(3914560K), 0.0413220 secs]
2013-03-03T20:01:00.672-0300: 8204.268: [GC 1962741K->319619K(3914560K), 0.0410840 secs]
2013-03-03T20:01:02.906-0300: 8206.503: [GC 1966833K->319605K(3914560K), 0.0453730 secs]
2013-03-03T20:01:06.861-0300: 8210.458: [GC 1967861K->330864K(3914560K), 0.0425570 secs]
2013-03-03T20:01:10.067-0300: 8213.664: [GC 1979120K->336541K(3914560K), 0.0479380 secs]
2013-03-03T20:01:12.587-0300: 8216.184: [GC 1984797K->343203K(3914560K), 0.0376810 secs]

Also, over about 1 day of logs there are only 2 entries at all greater than 1 second:

grep -oP ", [1-9]..*?secs]$" /shop/logs/live/solr/gc.log
, 1.1727270 secs]
, 1.0390840 secs]

Does anyone have any idea, or has anyone seen this phenomenon with Solr/the JVM before?

Answer

Don't believe your GC logs unless you include -XX:+PrintGCApplicationStoppedTime in the options. Suspect them even then. There are pauses, and parts of pauses, that can be very long and go unreported unless you include this flag. E.g. I've seen pauses caused by the occasional long-running counted loop taking 15 seconds to reach a safepoint, where the GC log reported only the 0.08-second part of the pause in which it actually did some work. There are also plenty of pauses whose causes are not considered part of "GC" and can thereby go unreported by GC logging flags.
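For example, the stop-time reporting could be appended to the existing options like this (a sketch; both flags exist on HotSpot JVMs of that generation, but verify them against your exact JVM version):

# report the total time threads were stopped at each safepoint, not just the GC work portion
JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime"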

You can try adding jHiccup as an agent to report on observed pauses/glitches/stalls/hiccups rather than relying on the honesty of the JVM logs. If it shows multi-second glitches, then you'll know your JVM is pausing. If it shows smooth JVM operation, then you know to look at your other configuration parts.
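jHiccup can be attached as a Java agent; a minimal sketch, assuming the jar lives at /opt/jHiccup/jHiccup.jar (a hypothetical path - see the jHiccup README for agent options and log processing):

# record observed JVM pauses independently of what the GC log claims
JAVA_OPTS="$JAVA_OPTS -javaagent:/opt/jHiccup/jHiccup.jar"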

