Lucene.Net fuzzy search speed
Problem description
Sorry to bother you, but I hope to get some help from people experienced with Lucene.
We currently use Lucene.Net 3.0.3 in our application to index and search about 2,500,000 items. Each entity contains 27 searchable fields, which are added to the index like this: new Field(key, value, Field.Store.YES, Field.Index.ANALYZED)
We have two search options:
- Search in only 4 fields using fuzzy search
- Search in 4-27 fields using exact search
We have a search service that automatically searches for about 53,000 people every week, such as "Bob Huston", "Sara Conor", "Sujan Hong Uin Ho", etc.
So we experience slow search speed in option 1: searcher.Search takes 4-8 seconds on average, and this is our major problem.
Search sample code
var index = FSDirectory.Open(indexPath);
var searcher = new IndexSearcher(index, true); // true = open the index read-only
this.analyzer = new StandardAnalyzer(Version.LUCENE_30, new HashSet<string>()); // empty stop-word set
var queryParser = new MultiFieldQueryParser(Version.LUCENE_30, queryFields, this.analyzer);
queryParser.AllowLeadingWildcard = false;
Query query = queryParser.Parse(token);
var results = searcher.Search(query, NumberOfResults); // NumberOfResults == 500
Our fuzzy search query to find "bob cong hong" in 4 fields:
(((PersonFirstName:bob~0.6) OR (PersonLastName:bob~0.6) OR (PersonAliases:bob~0.6) OR (PersonAlternativeSpellings:bob~0.6)) AND ((PersonFirstName:cong~0.6) OR (PersonLastName:cong~0.6) OR (PersonAliases:cong~0.6) OR (PersonAlternativeSpellings:cong~0.6)) AND ((PersonFirstName:hong~0.6) OR (PersonLastName:hong~0.6) OR (PersonAliases:hong~0.6) OR (PersonAlternativeSpellings:hong~0.6)))
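The shape of the query above (one OR group over the fields per name token, with the groups joined by AND) can be sketched as a small string builder. This is only an illustration of the query structure, not the code used in the question: the class name FuzzyQueryText and the method Build are hypothetical, the similarity is passed as text so that culture-specific number formatting cannot turn 0.6 into 0,6, and tokens are assumed to contain no query-syntax characters that would need escaping.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Lucene.Net.
public static class FuzzyQueryText
{
    // Builds "(((f1:tok~sim) OR (f2:tok~sim)) AND (...))" for every token in the name.
    // Assumes tokens contain no special query-syntax characters.
    public static string Build(string name, IEnumerable<string> fields, string minSimilarity)
    {
        var fieldList = fields.ToList();
        var tokens = name.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // One OR group per token: the token may fuzzily match any of the fields.
        var groups = tokens.Select(token =>
            "(" + string.Join(" OR ",
                fieldList.Select(f =>
                    "(" + f + ":" + token.ToLowerInvariant() + "~" + minSimilarity + ")")) + ")");

        // Every token must match somewhere, so the groups are joined with AND.
        return "(" + string.Join(" AND ", groups) + ")";
    }
}
```

Calling FuzzyQueryText.Build("bob cong hong", fields, "0.6") with the four Person* field names produces text with the same structure as the query above. Note that the number of fuzzy clauses is tokens × fields, which is why combining the fields into one reduces the work per search.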
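Current improvements:
- We combined these 4 fields into 1 search field
- We decided to use a single IndexSearcher instance in the service instead of opening one on every search request
- MergeFactor = 2
Together, these improvements produced a speed increase of about 30%-40%.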
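Sharing one IndexSearcher across requests might look roughly like the sketch below. The SearcherHolder class is hypothetical and not taken from the question; it assumes indexPath is a string, that the index is not modified while the service runs, and that the searcher is opened read-only as in the sample code.

```csharp
using System;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical wrapper: opens the index once and reuses the searcher for every request.
public sealed class SearcherHolder : IDisposable
{
    private readonly Lazy<IndexSearcher> searcher;

    public SearcherHolder(string indexPath)
    {
        // Lazy<T> uses thread-safe initialization by default, so the index is opened only once.
        searcher = new Lazy<IndexSearcher>(
            () => new IndexSearcher(FSDirectory.Open(indexPath), true)); // true = read-only
    }

    // IndexSearcher can be used from several threads concurrently.
    public TopDocs Search(Query query, int maxResults)
    {
        return searcher.Value.Search(query, maxResults);
    }

    public void Dispose()
    {
        // Dispose the searcher only if a search actually created it.
        if (searcher.IsValueCreated)
        {
            searcher.Value.Dispose();
        }
    }
}
```

A service would create one SearcherHolder at startup and dispose it at shutdown. If the index is rebuilt while the service is running, the holder would have to be replaced so that searches see the new data.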
Following this article, we've made most of the possible optimizations:
- The index is placed on a SAS drive, which is quite fast: http://accessories.euro.dell.com/sna/productdetail.aspx?c=ie&l=en&s=dhs&cs=iedhs1&sku=400-AHWT#Overview
- We have enough RAM
- MergeFactor is 2
- We tried moving the index to a RAMDirectory, but the test results aren't stable; sometimes the speed is the same
Do you have any other suggestions for improving search speed in our situation?
Thank you.
Recommended answer
You can improve the speed of fuzzy queries by setting their prefix length to a non-zero value. This allows Lucene to narrow the set of candidate terms efficiently. Like this:
queryParser.FuzzyPrefixLength = 2;
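The same idea can be shown on a single clause built without the parser. The sketch below is an illustration only: the PrefixedFuzzy class and its Create method are hypothetical names, while FuzzyQuery's (Term, minimumSimilarity, prefixLength) constructor is the Lucene.Net 3.0.3 one.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, not part of Lucene.Net.
public static class PrefixedFuzzy
{
    // Creates one fuzzy clause whose first prefixLength characters must match exactly,
    // so only index terms sharing that prefix are compared by edit distance.
    public static FuzzyQuery Create(string field, string text, float minSimilarity, int prefixLength)
    {
        return new FuzzyQuery(new Term(field, text), minSimilarity, prefixLength);
    }
}
```

For example, PrefixedFuzzy.Create("PersonFirstName", "bob", 0.6f, 2) only considers terms that start with "bo". The trade-off is that spelling variants differing within the prefix, such as "rob" for "bob", are no longer found, so the prefix length should be chosen with the expected kinds of misspelling in mind.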
Also, it doesn't affect the query you've provided as an example, but if you care at all about performance, you should keep leading wildcards disabled, as the line queryParser.AllowLeadingWildcard = false; does (false is also the parser's default, so the line is redundant but harmless). Leading wildcards will absolutely kill performance.