Solr Search Field Best Practices


Question

I'm using Solr for an enterprise application. So far it works well: I search against an ngram field, and partial queries (matched against the indexed ngrams) work correctly. My problem is how to enforce exact query matches. For example, the query "Test 1" should match exactly that text when the user enters it in double quotation marks. Currently, because of the tokenizers and filters I use, the double quotation marks are filtered out and there is no difference between the queries "test 1", "tEst 1" and "TEST 1". That is a consequence of my analyzer chain, which I need for ngrams and partial search.

Currently I'm searching against an ngram query field. What should I do to enforce an exact query match, and what is the best practice? My current idea is to detect the double quotation marks on the client side and switch the query field to the original field (without ngrams). But I feel there should be a better way of doing this, since the problem is generic and Solr is a complete enterprise-level search engine.

Recommended answer

You can add another field for the same content, give it the fieldType string, and index the same value into it.

When you want to perform an exact match, query that field.

When you want to perform a partial search, query the earlier field, which is indexed with ngrams.
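A minimal schema sketch of this two-field setup might look like the following. The field names title and title_exact and the type name text_ngram are hypothetical, and using copyField to populate the second field is an assumption; the answer only says to index the same value into it.

```xml
<!-- Hypothetical names: "title", "title_exact", "text_ngram". -->
<!-- Existing ngram-analyzed field, used for partial search. -->
<field name="title" type="text_ngram" indexed="true" stored="true"/>

<!-- Unanalyzed copy of the same value, used for exact matches. -->
<field name="title_exact" type="string" indexed="true" stored="false"/>

<!-- Assumption: copy the raw input at index time so the client sends it only once. -->
<copyField source="title" dest="title_exact"/>
```

With this in place, a query such as title_exact:"Test 1" would go to the string field and title:test to the ngram field. A string field is not analyzed, so the match is case-sensitive and applies to the whole field value.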

Alternatively, here is another approach you can try.

You have defined the current field type using ngrams. In that field type, you can use the ngram tokenizer in the index-time analyzer and, in the query-time analyzer, only the KeywordTokenizer and the lowercase filter factory.

At index time the text will be tokenized into ngrams; at query time it will not be tokenized.
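As a rough sketch, a field type with separate index-time and query-time analyzers could be declared as below. The type name text_ngram and the gram sizes are illustrative assumptions, as is the index-time lowercase filter, which is added here so that the lowercased query can match the indexed grams.

```xml
<!-- Hypothetical type name; minGramSize and maxGramSize are illustrative values. -->
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: split the text into ngrams, then lowercase them. -->
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="15"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query time: keep the whole query string as a single token, lowercased. -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Under this configuration a query matches only when the whole lowercased query string equals one of the indexed grams, so its length has to fall between minGramSize and maxGramSize. The match remains case-insensitive because of the lowercase filter.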
