Solr/Lucene fieldCache OutOfMemory error sorting on dynamic field
Question
We have a Solr core that has about 250 TrieIntFields (declared as dynamicField). There are about 14M docs in our Solr index and many documents have a value in many of these fields. We need to sort on all of these 250 fields over a period of time.
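For context, the dynamic fields are declared along these lines; the field name pattern and type attributes below are illustrative guesses based on the relevance_11 field used later, not the exact schema:

```xml
<!-- schema.xml: illustrative sketch, actual names/attributes may differ -->
<dynamicField name="relevance_*" type="tint" indexed="true" stored="false"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
           positionIncrementGap="0"/>
```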
The issue we are facing is that the underlying Lucene fieldCache gets filled up very quickly. We have a 4 GB box and the index size is 18 GB. After sorting on 40 or 45 of these dynamic fields, memory consumption is about 90% and we start getting OutOfMemory errors.
For now, we have a cron job running every minute that restarts Tomcat if the total memory consumed is more than 80%.
From what I have read, I understand that restricting the number of distinct values in sortable Solr fields will bring down the fieldCache space. The values in these sortable fields can be any integer from 0 to 33000 and are quite widely distributed. We have a few scaling solutions in mind, but what is the best way to handle this whole issue?
UPDATE: We thought that if we used boosting instead of sorting, it wouldn't go to the fieldCache. So instead of issuing a query like
select?q=name:alba&sort=relevance_11 desc
we tried
select?q={!boost relevance_11}name:alba
but unfortunately boosting also populates the field cache :(
Answer
I think you have two options:
1) Add more memory.
2) Force Solr not to use the field cache by specifying facet.method=enum, as per the documentation.
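As a sketch, the enum method is selected per request (core name and field below are placeholders):

```text
select?q=name:alba&facet=true&facet.field=relevance_11&facet.method=enum
```

With facet.method=enum, Solr walks the field's terms and intersects each with the result set (using the filterCache) instead of uninverting the whole field into the FieldCache. Note this parameter affects faceting requests, not sort= itself.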
There's also a solr-user mailing list thread discussing the same problem.
Unless your index is huge, I'd go with option 1). RAM is cheap these days.