Lucene.Net fuzzy search speed


Question

Sorry to bother you, but I'm hoping for help from people experienced with Lucene.

Our application uses Lucene.Net 3.0.3 to index and search about 2,500,000 items. Each entity contains 27 searchable fields, each added to the index like this: new Field(key, value, Field.Store.YES, Field.Index.ANALYZED)

We have two search options:


  1. Search only 4 fields, using fuzzy search

  2. Search 4-27 fields, using exact search
We have a search service that automatically searches for about 53,000 people every week, with names such as "Bob Huston", "Sara Conor", "Sujan Hong Uin Ho", etc.

Option 1 is slow: searcher.Search takes 4-8 seconds on average, and this is our major problem.

Search sample code:

                var index = FSDirectory.Open(indexPath);
                // Read-only searcher over the on-disk index.
                var searcher = new IndexSearcher(index, true);
                // An empty stop-word set, so no terms are dropped during analysis.
                this.analyzer = new StandardAnalyzer(Version.LUCENE_30, new HashSet<string>());
                var queryParser = new MultiFieldQueryParser(Version.LUCENE_30, queryFields, this.analyzer);
                queryParser.AllowLeadingWildcard = false;
                Query query = queryParser.Parse(token);
                var results = searcher.Search(query, NumberOfResults); // NumberOfResults == 500

Our fuzzy search query to find "bob cong hong" in 4 fields:

(((PersonFirstName:bob~0.6) OR (PersonLastName:bob~0.6) OR (PersonAliases:bob~0.6) OR (PersonAlternativeSpellings:bob~0.6)) AND ((PersonFirstName:cong~0.6) OR (PersonLastName:cong~0.6) OR (PersonAliases:cong~0.6) OR (PersonAlternativeSpellings:cong~0.6)) AND ((PersonFirstName:hong~0.6) OR (PersonLastName:hong~0.6) OR (PersonAliases:hong~0.6) OR (PersonAlternativeSpellings:hong~0.6)))
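For readers who want to reproduce a query of this shape, here is a minimal sketch of how such a string could be assembled from name tokens. The class and method names (FuzzyQueryText, Build) are hypothetical and not part of the question or of Lucene.Net. It assumes the tokens are plain words with no query-syntax characters; anything else would need escaping before being handed to the query parser.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper: builds the text of a fuzzy query in the same shape
// as the one shown in the question.
public static class FuzzyQueryText
{
    // For each token, ORs the token across all fields; the per-token groups
    // are then ANDed together, so every token must match in some field.
    public static string Build(IEnumerable<string> tokens, IEnumerable<string> fields, double similarity)
    {
        // Invariant culture keeps the decimal separator a dot (0.6, never 0,6).
        string sim = similarity.ToString(CultureInfo.InvariantCulture);
        List<string> fieldList = fields.ToList();

        IEnumerable<string> groups = tokens.Select(token =>
            "(" + string.Join(" OR ",
                fieldList.Select(field => "(" + field + ":" + token + "~" + sim + ")")) + ")");

        return "(" + string.Join(" AND ", groups) + ")";
    }
}
```

Calling FuzzyQueryText.Build with the tokens "bob", "cong", "hong", the four Person* field names, and 0.6 yields a string with the same structure as the query above.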

Current improvements:


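  1. We combined these 4 fields into 1 search field

  2. We decided to use a single IndexSearcher in the service instead of opening one on every search request

  3. MergeFactor = 2

Combined, these improvements give about a 30%-40% speed increase.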
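The single-IndexSearcher improvement could look roughly like the sketch below. SharedSearcher is a hypothetical class name, not code from the question. It assumes Lucene.Net 3.0.3 and an index that does not change while the service runs, since a read-only searcher only sees the index as it was when opened and would have to be reopened after updates.

```csharp
using System;
using System.IO;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical holder: one read-only IndexSearcher shared by all requests.
public sealed class SharedSearcher : IDisposable
{
    private readonly Lazy<IndexSearcher> searcher;

    public SharedSearcher(string indexPath)
    {
        // The searcher is opened on first use and then reused.
        // Lazy<T> is thread-safe by default, so concurrent first calls open it once.
        searcher = new Lazy<IndexSearcher>(
            () => new IndexSearcher(FSDirectory.Open(new DirectoryInfo(indexPath)), true));
    }

    // IndexSearcher is safe to call from multiple threads.
    public TopDocs Search(Query query, int maxResults)
    {
        return searcher.Value.Search(query, maxResults);
    }

    public void Dispose()
    {
        if (searcher.IsValueCreated)
        {
            searcher.Value.Dispose();
        }
    }
}
```

The service would create one SharedSearcher at startup and dispose it at shutdown, so each search request skips the cost of opening the index.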

Following this article, we've made most of the possible optimizations:

  • The index is placed on a SAS drive, which is quite fast: http://accessories.euro.dell.com/sna/productdetail.aspx?c=ie&l=en&s=dhs&cs=iedhs1&sku=400-AHWT#Overview
  • We have enough RAM
  • MergeFactor is set to 2
  • We tried moving the index to a RAMDirectory, but the test results aren't stable; sometimes the speed is the same

Do you have any other suggestions for improving search speed in our situation?

Thank you.

Recommended answer

You can improve the speed of fuzzy queries by setting their prefix length to a non-zero value. This allows Lucene to narrow the set of possible results efficiently. Like this:

queryParser.FuzzyPrefixLength = 2;
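To see why a non-zero prefix length helps, consider the simplified illustration below. It is not Lucene's actual implementation, and PrefixFilterDemo is a hypothetical name. The idea it sketches is that a fuzzy match has to compare the query term against candidate terms by edit distance, and requiring a shared prefix shrinks the candidate list before any of that work is done.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified illustration only; not how Lucene is implemented internally.
public static class PrefixFilterDemo
{
    // Classic dynamic-programming Levenshtein edit distance.
    public static int EditDistance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Swap the two rows so prev always holds the last completed row.
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // Returns the terms that still need an edit-distance comparison:
    // only those sharing the first prefixLength characters of the query term.
    public static List<string> Candidates(IEnumerable<string> terms, string queryTerm, int prefixLength)
    {
        string prefix = queryTerm.Substring(0, Math.Min(prefixLength, queryTerm.Length));
        return terms.Where(t => t.StartsWith(prefix, StringComparison.Ordinal)).ToList();
    }
}
```

With a prefix length of 0, every term in the field is a candidate; with 2, only terms starting with "bo" are compared against "bob". The trade-off is recall: a term such as "rob" is one edit away from "bob" but no longer matches once the first two characters must agree.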



Also, it doesn't affect the query you've provided as an example, but if you care at all about performance, keep leading wildcards disabled, as the line queryParser.AllowLeadingWildcard = false; does (false is also the parser's default). Leading wildcards will absolutely kill performance.
