Querying email addresses indexed by Lucene


Question

If I'm trying to retrieve dave@gmail.com, searching for "dave" works, as does searching for "dave@gmail.com".

But searching for "dave@gmail" doesn't work. The query runs inside a Java servlet. I believe the problem may lie in the address being split at the full stop.

How can I fix this so that "dave@gmail" returns "dave@gmail.com"? Email addresses may also use other domains (such as .co.uk).

Thanks

Recommended answer

Lucene uses analysers to tokenise and index your documents. Likewise, an analyser is used to tokenise the user's search query.

A common mistake is to use a different analyser for indexing than for searching; the two must match for you to get the results you expect (search this doc for "common mistake").

The standard Lucene tokeniser recognises email strings and indexes each one as a single token.

It will index dave@gmail.com as [token:dave@gmail.com]. However, the analyser you are using to tokenise your query (or your own code, if you are constructing the query manually) may be breaking it up into 3 tokens by splitting at the non-alphanumeric characters. In that case you are searching for 3 adjacent tokens, [tok1:dave] [tok2:gmail] [tok3:com], which don't exist in the index.
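The mismatch described above can be illustrated without Lucene at all. The following is only a plain-Java sketch: the class TokenMismatchDemo and its methods emailAwareTokens, splitOnNonAlphanumeric and matches are hypothetical stand-ins invented for this illustration, not Lucene APIs, and real analysers apply more rules than these two simple splits.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from Lucene.
class TokenMismatchDemo {

    // Stand-in for an email-aware tokeniser: splits on whitespace only,
    // so an address such as dave@gmail.com stays one token.
    static List<String> emailAwareTokens(String text) {
        return Arrays.stream(text.trim().toLowerCase().split("\\s+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Stand-in for an analyser that splits at every non-alphanumeric character.
    static List<String> splitOnNonAlphanumeric(String text) {
        return Arrays.stream(text.toLowerCase().split("[^a-z0-9]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Simplified matching: every query token must exist in the index as an exact token.
    static boolean matches(List<String> indexedTokens, List<String> queryTokens) {
        return !queryTokens.isEmpty() && indexedTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        // Index time: the address is stored as a single token.
        List<String> indexed = emailAwareTokens("dave@gmail.com");

        // Same tokenisation at query time: the single token is found.
        System.out.println(matches(indexed, emailAwareTokens("dave@gmail.com")));

        // Different tokenisation at query time: dave, gmail and com are not in the index.
        System.out.println(matches(indexed, splitOnNonAlphanumeric("dave@gmail.com")));
    }
}
```

Running main prints true for the query tokenised the same way as the index and false for the query split into dave, gmail and com, which is the situation the answer describes.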

Query.toString will probably "pretty print" the query you are submitting to Lucene, which may help you debug.
