Lucene's best way to do "starts-with" queries

Views: 93 · Published: 2020/5/4 7:40:55 · Tags: lucene, startswith

Problem description

I want to be able to do the following types of queries:

The data to index consists of (let's say) music videos where only the title is interesting. I simply want to index these and then create queries for them such that, whatever word or words the user used in the query, the documents containing those words, in that order, at the beginning of the title are returned first, followed (in no particular order) by documents containing at least one of the searched words anywhere in the title. All of this should also be case-insensitive.

Example:

For the documents:

Video1Title = Sea is blue
Video2Title = Wild sea
Video3Title = Wild sea
Video4Title = Seaside area

If I search "sea" I want to get

"Video1Title = Sea is blue"

first, followed by all the other documents that contain "sea" in the title but not at the beginning.

If I search "Wild sea" I want to get

Video2Title = Wild sea
Video3Title = Wild sea

first, followed by all the other documents that have "Wild" or "Sea" in their title but don't have "Wild Sea" as a title prefix.

If I search "Seasi" I don't want to get anything (I don't care for keyword tokenization and prefix queries).

Now, AFAICS, there's no actual way to tell Lucene "find me documents where word1 and word2 and so on are in positions 1 and 2 and 3 and so on."

There are "workarounds" to simulate that behaviour:

Index the field twice. In field1 you have the words tokenized (perhaps using StandardAnalyzer), and in field2 you have them all clumped together into one element (using KeywordAnalyzer). Then you search something like:

+(field1:word1 word2 word3) (field2:"word1 word2 word3*")

This effectively tells Lucene: "Documents must contain word1 or word2 or word3 in the title, and furthermore those that match 'title starts with >word1 word2 word3<' are better (get a higher score)."
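The two-field workaround can be simulated in plain Java to see how the required and optional clauses combine. This is a sketch with no Lucene dependency: the class name `TwoFieldSketch`, the regex tokenizer standing in for StandardAnalyzer, the lowercased keyword field (KeywordAnalyzer by itself does not lowercase) and the `prefixBoost` parameter are all assumptions. Two caveats about the query string itself, worth verifying against your Lucene version: in the classic query syntax `field1:word1 word2 word3` applies `field1` only to `word1` (the other words go to the default field), whereas `field1:(word1 word2 word3)` applies it to all three; and as far as I know a `*` inside a quoted phrase is not expanded, so in code the field2 clause would more naturally be a `PrefixQuery` added to a `BooleanQuery` as a `SHOULD` clause next to the `MUST` word clause.

```java
import java.util.*;

/** Plain-Java simulation of the "index the field twice" workaround (no Lucene involved). */
final class TwoFieldSketch {
    // field1: lowercased word tokens, a rough stand-in for StandardAnalyzer.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // field2: the whole title as ONE lowercased term. KeywordAnalyzer alone does not
    // lowercase, so this assumes a keyword tokenizer followed by a lowercase filter.
    static String keyword(String title) {
        return title.toLowerCase(Locale.ROOT).trim();
    }

    /** Emulates +(field1:w1 w2 ...) (field2 starts with "w1 w2 ..."), best score first. */
    static List<String> search(List<String> titles, String query, double prefixBoost) {
        List<String> q = tokens(query);
        String prefix = String.join(" ", q);
        double[] score = new double[titles.size()];
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < titles.size(); i++) {
            Set<String> words = new HashSet<>(tokens(titles.get(i)));
            int matched = 0;
            for (String w : new LinkedHashSet<>(q)) {
                if (words.contains(w)) matched++;
            }
            if (matched == 0) continue;          // required clause: no query word, no hit
            score[i] = matched;                  // crude stand-in for Lucene's relevance score
            String kw = keyword(titles.get(i));
            // Optional clause. Requiring a space after the prefix keeps "sea" from
            // matching a title that starts with "seashore".
            if (kw.equals(prefix) || kw.startsWith(prefix + " ")) score[i] += prefixBoost;
            hits.add(i);
        }
        hits.sort((a, b) -> Double.compare(score[b], score[a])); // stable: ties keep index order
        List<String> out = new ArrayList<>();
        for (int i : hits) out.add(titles.get(i));
        return out;
    }
}
```

For example, `TwoFieldSketch.search(List.of("Sea is blue", "Wild sea", "Seaside area"), "sea", 10)` ranks "Sea is blue" ahead of "Wild sea" and drops "Seaside area", because no whole word of that title equals "sea". The trailing-space check is a design choice of this sketch; a plain character prefix would also boost titles whose first word merely begins with the query.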

Add a "lucene_start_token" to the beginning of the field when indexing, so that Video2Title = Wild sea is indexed as "title:lucene_start_token Wild sea", and so on for the rest.

Then run the following query:

+(title:sea) (title:"lucene_start_token sea")

This has Lucene return all documents that contain my search word(s) in the title, while giving a higher score to those that match "lucene_start_token + search words".
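The start-token workaround can be simulated the same way. In this plain-Java sketch (hypothetical class `StartTokenSketch`, the same regex tokenizer as a rough analyzer stand-in, no Lucene dependency) the marker is prepended at index time and the quoted clause is treated as a phrase match: the phrase terms must occupy consecutive positions, and because the marker only ever sits at position 0, `"lucene_start_token sea"` can match only when `sea` is the first real word of the title.

```java
import java.util.*;

/** Plain-Java simulation of the start-token workaround (no Lucene involved). */
final class StartTokenSketch {
    // Marker from the question; any token that never occurs in real titles would do.
    static final String START = "lucene_start_token";

    // Lowercased word tokens, a rough stand-in for the analyzer.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Index time: "Wild sea" is stored as [lucene_start_token, wild, sea].
    static List<String> indexTitle(String title) {
        List<String> field = new ArrayList<>();
        field.add(START);
        field.addAll(tokens(title));
        return field;
    }

    /** Emulates +(title:w1 w2 ...) (title:"lucene_start_token w1 w2 ..."): phrase matches first. */
    static List<String> search(List<String> titles, String query) {
        List<String> q = tokens(query);
        List<String> phrase = new ArrayList<>();
        phrase.add(START);
        phrase.addAll(q);
        List<String> first = new ArrayList<>();
        List<String> rest = new ArrayList<>();
        for (String title : titles) {
            List<String> field = indexTitle(title);
            boolean anyWord = false;
            for (String w : q) {
                if (field.contains(w)) { anyWord = true; break; }
            }
            if (!anyWord) continue;                       // required clause
            // Phrase clause: the terms must sit at consecutive positions. Since the
            // marker is only ever at position 0, this can only match at the start.
            if (Collections.indexOfSubList(field, phrase) >= 0) first.add(title);
            else rest.add(title);
        }
        first.addAll(rest);
        return first;
    }
}
```

With the titles "Sea is blue", "Wild sea" and "Seaside area", `search(titles, "Wild sea")` returns "Wild sea" first and "Sea is blue" second, while `search(titles, "Seasi")` returns nothing.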

My question, then, is: are there better ways to do this (maybe using PhraseQuery and term positions)? If not, which of the above approaches performs better?
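On the question of term positions: the test "word i is at position i" can be answered directly from a positional index, with no marker token and no second field. The sketch below is plain Java and only illustrates the idea; the class `PositionalIndexSketch` and its methods are hypothetical. It keeps a term → document → positions map, collects every document containing at least one query word, and ranks those whose leading positions hold the query words first. In Lucene, position-restricted matching of this kind is what span queries such as `SpanFirstQuery` are for (it matches spans that end within a given number of positions from the start of the field); the span classes have moved packages between major versions, so check the documentation for the version in use.

```java
import java.util.*;

/** Plain-Java illustration of ranking by term positions (no Lucene involved). */
final class PositionalIndexSketch {
    // term -> (docId -> positions of that term in the title)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();
    private final List<String> titles = new ArrayList<>();

    // Lowercased word tokens, a rough stand-in for an analyzer.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Indexes a title and returns its document id. */
    int add(String title) {
        int docId = titles.size();
        titles.add(title);
        List<String> t = tokens(title);
        for (int pos = 0; pos < t.size(); pos++) {
            postings.computeIfAbsent(t.get(pos), k -> new HashMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos);
        }
        return docId;
    }

    // True if query word i occupies position i for every i, i.e. the title starts with the query.
    private boolean startsWith(int docId, List<String> q) {
        for (int i = 0; i < q.size(); i++) {
            List<Integer> positions = postings.getOrDefault(q.get(i), Map.of()).get(docId);
            if (positions == null || !positions.contains(i)) return false;
        }
        return true;
    }

    /** Documents with at least one query word; title-prefix matches first, then the rest in index order. */
    List<String> search(String query) {
        List<String> q = tokens(query);
        SortedSet<Integer> candidates = new TreeSet<>();
        for (String w : q) {
            candidates.addAll(postings.getOrDefault(w, Map.of()).keySet());
        }
        List<String> first = new ArrayList<>();
        List<String> rest = new ArrayList<>();
        for (int docId : candidates) {
            if (startsWith(docId, q)) first.add(titles.get(docId));
            else rest.add(titles.get(docId));
        }
        first.addAll(rest);
        return first;
    }
}
```

For example, after adding "Sea is blue", "Wild sea" and "Seaside area", `search("Wild sea")` returns "Wild sea" followed by "Sea is blue", and `search("Seasi")` returns an empty list because "Seasi" is not a whole word in any title.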
