Solr/Lucene fieldCache OutOfMemory error sorting on dynamic field


Problem description

We have a Solr core that has about 250 TrieIntFields (declared as dynamicField). There are about 14M docs in our Solr index and many documents have some value in many of these fields. We have a need to sort on all of these 250 fields over a period of time.
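A setup like this is typically declared with a single wildcard dynamicField rule in schema.xml. A minimal sketch (the `relevance_*` name is inferred from the queries below; attribute choices are assumptions):

```xml
<!-- schema.xml: one wildcard rule covers all ~250 integer fields.
     Field names like relevance_11 match this pattern. -->
<dynamicField name="relevance_*" type="tint" indexed="true" stored="true"/>

<!-- TrieIntField type; precisionStep="0" indexes a single term per value,
     which is enough when the field is only sorted on, not range-queried. -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="0"
           omitNorms="true" positionIncrementGap="0"/>
```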

The issue we are facing is that the underlying lucene fieldCache gets filled up very quickly. We have a 4 GB box and the index size is 18 GB. After a sort on 40 or 45 of these dynamic fields, the memory consumption is about 90% and we start getting OutOfMemory errors.

For now, we have a cron job running every minute restarting tomcat if the total memory consumed is more than 80%.

From what I have read, I understand that restricting the number of distinct values on sortable Solr fields will bring down the fieldCache space. The values in these sortable fields can be any integer from 0 to 33000 and quite widely distributed. We have a few scaling solutions in mind, but what is the best way to handle this whole issue?
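A rough back-of-the-envelope calculation shows why the cache exhausts the box so quickly: the classic Lucene fieldCache un-inverts a sorted int field into one 4-byte entry per document, for every document, whether or not it has a value in that field. (A sketch; exact per-field overhead varies by Lucene version.)

```python
# Rough fieldCache sizing sketch. Assumes the classic Lucene fieldCache
# layout for an int field: one 4-byte int per document, per sorted field.
NUM_DOCS = 14_000_000   # ~14M docs in the index (from the question)
BYTES_PER_INT = 4       # TrieIntField sort values are un-inverted to int[]
FIELDS_SORTED = 45      # number of fields sorted on before OOM appeared

per_field_mb = NUM_DOCS * BYTES_PER_INT / 1024**2
total_gb = per_field_mb * FIELDS_SORTED / 1024

print(f"per field: ~{per_field_mb:.0f} MB")                 # ~53 MB
print(f"after {FIELDS_SORTED} fields: ~{total_gb:.1f} GB")  # ~2.3 GB
```

Sorting on all 250 fields would need roughly 250 × 53 MB ≈ 13 GB just for the fieldCache, so on this index size a modest RAM bump only delays the problem.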

UPDATE: We thought that if we boosted on the field instead of sorting on it, the values wouldn't go into the fieldCache. So instead of issuing a query like

select?q=name:alba&sort=relevance_11 desc

we tried

select?q={!boost relevance_11}name:alba

but unfortunately boosting also populates the field cache :(

Answer

I think you have two options:

1) Add more memory.
2) Force Solr not to use the field cache by specifying facet.method=enum, as per documentation.
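For option 2), `facet.method=enum` is a per-request parameter (it can also be set as a default in solrconfig.xml). With `enum`, Solr walks the field's terms and uses the filterCache rather than un-inverting the field into the fieldCache. Note that it applies to faceting requests, not to `sort`. A sketch in the same style as the queries above, reusing the field name from the question:

```
select?q=name:alba&facet=true&facet.field=relevance_11&facet.method=enum
```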

There's also a solr-user mailing list thread discussing the same problem.

Unless your index is huge, I'd go with option 1). RAM is cheap these days.
