Sort by date in Solr/Lucene performance problems

Question

We have set up a Solr index containing 36 million documents (~1K-2K each), and we query for a maximum of 100 documents matching a single simple keyword. This runs quickly, as we had hoped. However, if we add "&sort=createDate+desc" to the query (asking for the 100 newest documents matching the query), it runs for a very long time and finally fails with an OutOfMemoryException. From what I understand from the manual, this happens because Lucene needs to load all the distinct values for this field (createDate) into memory (the FieldCache, AFAIK) before it can execute the query. Because the createDate field contains both date and time, the number of distinct values is very large. It is also important to mention that we update the index frequently.

Perhaps someone can provide some insights and directions on how we can tune Lucene / Solr or change our approach in such a way that query times become acceptable? Your input will be much appreciated! Thanks.

Recommended answer

The problem is that Lucene stores numbers as strings. There are utilities that split the date into YYYY, MM, and DD and put them in separate fields, which gives much better results.
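As a rough illustration of the splitting idea, here is a minimal Python sketch that derives separate year, month, and day values from a Solr-style UTC timestamp before the document is sent for indexing. The field names createYear, createMonth, and createDay and the helper split_date_fields are made up for this example and are not part of Solr or Lucene; the schema would need matching fields defined.

```python
from datetime import datetime, timezone


def split_date_fields(iso_timestamp: str) -> dict:
    """Split a timestamp like 2009-06-15T13:45:30Z into coarse integer parts.

    The returned field names are hypothetical; use whatever your schema defines.
    """
    dt = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    dt = dt.replace(tzinfo=timezone.utc)
    return {
        "createYear": dt.year,    # one distinct value per year of data
        "createMonth": dt.month,  # at most 12 distinct values
        "createDay": dt.day,      # at most 31 distinct values
    }


# Enrich a document (a plain dict here) before indexing it.
doc = {"id": "doc-1", "createDate": "2009-06-15T13:45:30Z"}
doc.update(split_date_fields(doc["createDate"]))
```

With fields like these, a query could sort on several fields in turn, for example sort=createYear desc,createMonth desc,createDay desc. Each field then holds only a handful of distinct values, at the cost of day-level rather than second-level ordering.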

Newer versions of Lucene (2.9 onwards) support numeric fields, and the performance improvements are significant (a couple of orders of magnitude, IIRC). Check this article about numeric queries.
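To make the numeric-field idea concrete, the sketch below converts a timestamp into a single integer (milliseconds since the Unix epoch), which is the kind of value a numeric field can index and sort on instead of a string. This only shows the conversion on the client side; the helper name to_epoch_millis is an assumption for the example, and the actual field type and configuration depend on your Lucene/Solr version.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso_timestamp: str) -> int:
    """Convert a timestamp like 2009-06-15T13:45:30Z to epoch milliseconds."""
    dt = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    dt = dt.replace(tzinfo=timezone.utc)
    # Integer division by a timedelta avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


older = to_epoch_millis("2009-06-15T13:45:30Z")
newer = to_epoch_millis("2009-06-16T08:00:00Z")
```

Numeric ordering of these values matches chronological ordering, so sorting descending on such a field returns the newest documents first.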
