Solr wildcards and escaped characters together

Problem description

I am trying to search in Solr but have a problem. For example, I have this phrase stored in Solr: [Karina K[arina ? ! & ?!a& m.malina m:malina 0sal0 0 AND. Now I want to search it with the wildcard *. For example, I write *[* or *?* and expect Solr to return this phrase, but it doesn't work. What I tried:

  1. I can use escaped characters, like K\[arina, but in this case I need to enter the whole phrase.
  2. But if I write K\[arin*, I get no results.
  3. OK, I tried K\[arin\*, and it worked.
  4. Then I put * at the start, \*\[arina, and that is OK too.
  5. And finally, \*\[arin\* doesn't work. Why? Where is the logic?
  6. Somewhere I read that I can use quotes, for example "*\[arin*" or even *[arin*, but that does not work.
  7. Interestingly, I can search for K\[arina as a whole word, or for \?\!a\&, but not for \?.

Recommended answer

When you search with wildcards, the regular analysis chain is not invoked for the wildcard term; only filters that are MultiTermAware are applied. That means you're switching the search behavior around without knowing what's happening behind the scenes.

Lucene and Solr operate on tokens. Tokens are usually single words (after some processing) from the input character stream, split ("tokenized") on different characters depending on which tokenizer the field uses. A tokenizer will usually split on most non-alphanumeric characters, and defining the analysis chain explicitly will allow you to get the behavior you're looking for.

I'm guessing your tokenizer splits on most of the special characters you have in your string, so that K[arina effectively ends up being indexed as K and arina.

K\[arina => K, arina (split on \ and [)
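
To make the split concrete, here is a minimal Python sketch. It is only a toy model, not Solr's tokenizer: `toy_tokenize` and `whitespace_tokenize` are hypothetical helpers, and the assumption is that the field's tokenizer splits on every non-alphanumeric character (a real tokenizer such as Lucene's StandardTokenizer follows more detailed word-break rules and may keep some of these strings together).

```python
import re

PHRASE = "[Karina K[arina ? ! & ?!a& m.malina m:malina 0sal0 0 AND"


def toy_tokenize(text: str) -> list[str]:
    """Toy stand-in for a tokenizer that splits on any non-alphanumeric run.

    Assumption for illustration only; real Solr tokenizers differ in detail.
    """
    return [tok for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def whitespace_tokenize(text: str) -> list[str]:
    """Toy stand-in for a tokenizer that splits on whitespace only."""
    return text.split()


if __name__ == "__main__":
    # Special characters disappear: "K[arina" becomes the two tokens K and arina.
    print(toy_tokenize(PHRASE))
    # Special characters survive inside tokens: "K[arina" stays one token.
    print(whitespace_tokenize(PHRASE))
```

Under the first model there is no indexed token that contains `[` or `?` at all, which is why queries built around those characters find nothing. Under the second model the token `K[arina` is kept intact, which is the kind of behavior an explicitly defined analysis chain could give you.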

No tokens match:

K\[arin* => nothing happens, since there is no token starting with K[arin

Escaping the wildcard means that the whole string gets sent to the tokenizer, effectively not making it a wildcard search, but a search with a string containing * instead:

K\[arin\* => K, arin -> K matches (and arin if an ngram filter is attached)
(one of your later examples shows that there is no ngram filter)

Same behavior here, escaping the asterisk means that the whole string gets sent to the tokenizer instead of a wildcard search happening:

\*\[arina => arina -> arina matches

While here no token matches:

\*\[arin\* => arin -> there is no token matching arin, only arina.

Case 6 is meant for phrases, that is, tokens with whitespace between them that are searched as a single match. I'll skip that for now.

The last case effectively ends up as an empty search, since the tokenizer splits on ? and leaves no usable tokens. The first example on that line leaves the expected tokens, K and arina:

K\[arina => K, arina
\?\!a\& => a
\? => <nothing>
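
The cases above can be replayed with a small Python simulation. This is a sketch under stated assumptions, not Solr's query parser: `split_query`, `toy_search`, and `toy_tokenize` are hypothetical helpers, the tokenizer is assumed to split on every non-alphanumeric character, tokens from an analyzed query are combined with OR, no lowercasing is applied, and the unescaped `?` wildcard is not modeled.

```python
import re

PHRASE = "[Karina K[arina ? ! & ?!a& m.malina m:malina 0sal0 0 AND"


def toy_tokenize(text: str) -> list[str]:
    """Assumed tokenizer: split on any run of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def split_query(query: str) -> tuple[list[str], bool]:
    """Split a query at unescaped '*'; a backslash makes the next char literal."""
    chunks, current, is_wildcard = [], [], False
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            current.append(query[i + 1])  # escaped char is kept as a literal
            i += 2
        elif ch == "*":
            chunks.append("".join(current))  # unescaped '*' ends a literal chunk
            current = []
            is_wildcard = True
            i += 1
        else:
            current.append(ch)
            i += 1
    chunks.append("".join(current))
    return chunks, is_wildcard


def toy_search(query: str, indexed_tokens: list[str]) -> list[str]:
    """Return the indexed tokens that the query would match in this toy model."""
    chunks, is_wildcard = split_query(query)
    if is_wildcard:
        # Wildcard term: NOT tokenized, compared against whole indexed tokens.
        pattern = re.compile(".*".join(re.escape(chunk) for chunk in chunks))
        return [tok for tok in indexed_tokens if pattern.fullmatch(tok)]
    # No live wildcard: the text goes through the tokenizer, tokens are OR-ed.
    query_tokens = toy_tokenize(chunks[0])
    return [tok for tok in indexed_tokens if tok in query_tokens]


INDEX = toy_tokenize(PHRASE)

if __name__ == "__main__":
    queries = [r"K\[arina", r"K\[arin*", r"K\[arin\*", r"\*\[arina",
               r"\*\[arin\*", r"\?\!a\&", r"\?"]
    for query in queries:
        print(f"{query!r:16} -> {toy_search(query, INDEX)}")
```

In this model the queries with a live wildcard find nothing because no indexed token contains `[`, while the fully escaped queries match only when one of their own tokens (K, arina, a) is in the index. Calling `toy_search(r"K\[arin*", PHRASE.split())` against whitespace-split tokens does return `K[arina`, which illustrates why the tokenizer choice matters for this kind of search.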
