Lucene and Special Characters

Question

I am using Lucene.Net 2.0 to index some fields from a database table. One of the fields is a 'Name' field that allows special characters. When I perform a search, it does not find the document containing a term with special characters.

I index the field like this:

Directory DALDirectory = FSDirectory.GetDirectory(@"C:\Indexes\Name", false);
Analyzer analyzer = new StandardAnalyzer();
IndexWriter indexWriter = new IndexWriter(DALDirectory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

Document doc = new Document();
doc.Add(new Field("Name", "Test (Test)", Field.Store.YES, Field.Index.TOKENIZED));
indexWriter.AddDocument(doc);

indexWriter.Optimize();
indexWriter.Close();
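To see what actually ends up in the index, here is a rough C# sketch of what StandardAnalyzer does to the 'Name' value at indexing time. `AnalyzerSketch.Tokenize` is a hypothetical, simplified stand-in, not part of the Lucene.Net API: it only lowercases the text and keeps runs of letters and digits, ignoring the real tokenizer's extra rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnalyzerSketch
{
    // Hypothetical, simplified stand-in for StandardAnalyzer: lowercase the
    // text and keep only runs of letters and digits. The real tokenizer has
    // extra rules (e-mail addresses, acronyms, hyphenated tokens containing
    // digits) and a stop-word filter, all ignored here.
    public static List<string> Tokenize(string text)
    {
        return Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{Nd}]+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }

    public static void Main()
    {
        // The field value indexed in the question.
        List<string> tokens = Tokenize("Test (Test)");

        // Prints "test | test": the parentheses and the space are gone, so no
        // single indexed term equals "test (test)".
        Console.WriteLine(string.Join(" | ", tokens));
    }
}
```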

And I search like this:

value = value.Trim().ToLower();
value = QueryParser.Escape(value);

Query searchQuery = new TermQuery(new Term(field, value));
Searcher searcher = new IndexSearcher(DALDirectory);

TopDocCollector collector = new TopDocCollector(searcher.MaxDoc());
searcher.Search(searchQuery, collector);
ScoreDoc[] hits = collector.TopDocs().scoreDocs;
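A TermQuery is not analyzed: it looks up exactly the string it is given. The sketch below, which assumes a simplified in-memory term set in place of a real index, shows what the search code above ends up looking up. `Escape` is an illustrative re-implementation of the backslash escaping that QueryParser.Escape performs (the character list is an approximation of the Lucene 2.x set), and `TermQueryMatches` is a hypothetical helper; neither is the library method itself.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class TermQuerySketch
{
    // Simplified stand-in for StandardAnalyzer: lowercase, keep letter/digit runs.
    static IEnumerable<string> Tokenize(string text)
    {
        foreach (Match m in Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{Nd}]+"))
            yield return m.Value;
    }

    // Illustrative version of query-syntax escaping: a backslash is
    // placed before each character the query parser treats as syntax.
    public static string Escape(string s)
    {
        const string special = "\\+-!():^[]\"{}~*?|&";
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (special.IndexOf(c) >= 0) sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Hypothetical helper: does a TermQuery for rawTerm hit a document
    // whose field value is fieldValue?
    public static bool TermQueryMatches(string fieldValue, string rawTerm)
    {
        // The terms the index would hold for this field value.
        var indexedTerms = new HashSet<string>(Tokenize(fieldValue));
        // No analysis of the query side: an exact string lookup.
        return indexedTerms.Contains(rawTerm);
    }

    public static void Main()
    {
        // Same preprocessing as the question's search code.
        string value = Escape("Test (Test)".Trim().ToLower());
        Console.WriteLine(value);                                   // test \(test\)
        Console.WriteLine(TermQueryMatches("Test (Test)", value));  // False
        Console.WriteLine(TermQueryMatches("Test (Test)", "test")); // True
    }
}
```

This may also account for the GUID behaviour: as far as I know, StandardTokenizer in Lucene 2.x keeps a hyphenated token whole when it contains digits, so an unescaped, lowercased GUID can equal a single indexed term, whereas escaping inserts backslashes before the hyphens and breaks the exact match.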

If I search with field 'Name' and value 'Test', it finds the document. If I perform the same search with field 'Name' and value 'Test (Test)', it does not find the document.

Stranger still, if I remove the QueryParser.Escape line and search for a GUID (which, of course, contains hyphens), it finds the documents whose GUID value matches, but the same search with the value 'Test (Test)' still returns no results.

I am unsure what I am doing wrong. I am using the QueryParser.Escape method to escape the special characters, and I am storing the field and searching as shown in the Lucene.Net examples.

Any ideas?

Recommended Answer

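StandardAnalyzer strips out special characters such as parentheses during indexing, so 'Test (Test)' is stored as the tokens 'test' and 'test' rather than as a single term, and a TermQuery for 'test (test)' has nothing to match. You can pass in an explicit list of stop words (excluding the ones you want to keep).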
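Because the analyzer discards the parentheses, one way to make the lookup line up with the index is to run the query text through the same analyzer and match the resulting tokens; Lucene's QueryParser likewise analyzes query text with the analyzer it is given instead of looking up raw strings. The sketch below reuses a hypothetical, simplified tokenizer (not StandardAnalyzer itself) and checks that the query's tokens occur consecutively and in order among the field's tokens, roughly the check a PhraseQuery makes using term positions. `PhraseMatches` is an illustrative name, not a Lucene.Net method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnalyzedQuerySketch
{
    // Simplified stand-in for StandardAnalyzer, applied at BOTH index and query time.
    static List<string> Tokenize(string text) =>
        Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{Nd}]+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToList();

    // Phrase-style check: the query's tokens must appear consecutively,
    // in order, somewhere in the field's tokens.
    public static bool PhraseMatches(string fieldValue, string queryText)
    {
        List<string> field = Tokenize(fieldValue);
        List<string> query = Tokenize(queryText);
        if (query.Count == 0) return false;

        for (int start = 0; start + query.Count <= field.Count; start++)
        {
            if (field.Skip(start).Take(query.Count).SequenceEqual(query))
                return true;
        }
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(PhraseMatches("Test (Test)", "Test (Test)")); // True
        Console.WriteLine(PhraseMatches("Test (Test)", "Test"));        // True
        Console.WriteLine(PhraseMatches("Test (Test)", "Other"));       // False
    }
}
```

The trade-off is that punctuation no longer distinguishes values: 'Test Test' would match 'Test (Test)' as well. If exact matching on the whole value is what you need, the usual alternative is to index the field with Field.Index.UN_TOKENIZED, so the value is stored as one term, and query it with a TermQuery; note that an un-tokenized value is not lowercased, so the query's ToLower() call would then have to go.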
