Solr/Lucene fieldCache OutOfMemory error sorting on dynamic field

Problem Description

We have a Solr core with about 250 TrieIntFields (declared as dynamicField). There are about 14M docs in our Solr index, and many documents have a value in many of these fields. Over a period of time we need to sort on all 250 of these fields.
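
For reference, a minimal sketch of how such fields would be declared in schema.xml (the fieldType name and exact attributes are assumptions; the dynamicField patterns match the field names used in the queries below):

<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<dynamicField name="relevance_*" type="tint" indexed="true" stored="false"/>
<dynamicField name="correlation_*" type="tint" indexed="true" stored="false"/>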

The issue we are facing is that the underlying Lucene fieldCache fills up very quickly. We have a 4 GB box and the index size is 18 GB. After sorting on 40 or 45 of these dynamic fields, memory consumption is about 90% and we start getting OutOfMemory errors.
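
Those numbers are consistent with how Lucene's FieldCache behaves for numeric sorts: the first sort on an int field builds an int[maxDoc] array, one 32-bit entry per document, whether or not that document has a value in the field. A rough back-of-the-envelope estimate (assuming 4 bytes per entry and ignoring JVM object overhead):

14,000,000 docs * 4 bytes = ~56 MB per sorted field
45 fields * ~56 MB = ~2.5 GB
250 fields * ~56 MB = ~14 GB

So sorting on a few dozen fields already pushes a 4 GB box to its limit, and covering all 250 fields would need on the order of 14 GB for the cache alone.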

For now, we have a cron job running every minute that restarts Tomcat if total memory consumption exceeds 80%.
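
A minimal sketch of what such a watchdog can look like (the free/awk parsing, the script path, and the Tomcat service name are assumptions, not the actual script):

#!/bin/sh
# Hypothetical watchdog: restart Tomcat when used system memory exceeds 80%.
USED_PCT=$(free | awk '/^Mem:/ {printf "%d", $3 / $2 * 100}')
if [ "$USED_PCT" -gt 80 ]; then
    service tomcat restart   # service name is an assumption
fi

run from cron every minute:

* * * * * /usr/local/bin/restart-tomcat-if-needed.sh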

From what I have read, I understand that restricting the number of distinct values in sortable Solr fields will bring down the fieldCache space. The values in these sortable fields can be any integer from 0 to 33000 and are quite widely distributed. We have a few scaling solutions in mind, but what is the best way to handle this whole issue?

UPDATE: We thought that if we used boosting instead of sorting, the query wouldn't go through the fieldCache. So instead of issuing a query like

select?q=name:alba&sort=relevance_11 desc

we tried

select?q={!boost b=correlation_11}name:alba

but unfortunately boosting also populates the field cache :( In hindsight this makes sense: function queries like {!boost} read per-document field values through the same FieldCache that sorting uses.

Recommended Answer

I think you have two options:

1) Add more memory.
2) Force Solr not to use the field cache by specifying facet.method=enum, as per the documentation (see the example below).
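
For option 2), the parameter is passed per request (it can also be set as a default on the request handler in solrconfig.xml); a sketch reusing a field name from the question:

select?q=name:alba&facet=true&facet.field=relevance_11&facet.method=enum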

There's also a solr-user mailing list thread discussing the same problem.

Unless your index is huge, I'd go with option 1). RAM is cheap these days.
