SOLR - Regex in Filter query
Problem description
I want to use a regex in an fq (filter query) but have never done so before.
I have the value below in a property whose field type is "lowercase": Prop=company1@city1@state1@country1@senior analytical chemist, chicago
I want to filter the results with a regex. It should match the value above when the value starts with "company1@city1@state1@country1@" and both chicago and analytical appear anywhere after the last @ symbol.
My requirement is to match the exact values before the last @ and then use a regex on the rest of the string, because I only want free-text search on the last part. I can't split the data into multiple columns because it's a multi-valued field.
I tried the regex below in code to match the string after the last @. It works fine there, but I'm not sure how to do the same in SOLR:
/([^@]+(?=.*IL)(?=.*chicago)(?=.*analytical))/ig
Can someone please let me know how to use the above regex with SOLR?
Recommended answer
Regular expressions in Solr are available by searching with q=field:/regex/ (the same syntax works in an fq parameter). This assumes that the field in question is a string field (or at least a field with a KeywordTokenizer), because matching happens at the token level: if you have an analyzed field, the value might be split into separate tokens and the regex won't match.
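To illustrate why token-level matching matters, here is a rough Python sketch. It is not Solr code: `re.fullmatch` stands in for the way a regex has to match a whole indexed token, and a plain whitespace split is only an assumed stand-in for an analyzed (tokenized) field.

```python
import re

# Sample value from the question, lowercased as a lowercase filter would do.
value = "company1@city1@state1@country1@senior analytical chemist, chicago".lower()

# The regex must match an entire token, so fullmatch is used as a stand-in.
pattern = re.compile(r".*analytical.*chicago.*")

def any_token_matches(tokens):
    """Return True if at least one indexed token matches the whole pattern."""
    return any(pattern.fullmatch(token) for token in tokens)

# KeywordTokenizer-like behaviour: the whole value is a single token.
keyword_tokens = [value]

# Analyzed-field-like behaviour (assumption: simple whitespace splitting).
split_tokens = value.split()

print(any_token_matches(keyword_tokens))  # True
print(any_token_matches(split_tokens))    # False
```

With the single keyword token both words are visible to the regex; once the value is split, no individual token contains both, so nothing matches.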
Something like q=field:/([^@]+(?=.*IL)(?=.*chicago)(?=.*analytical))/ could work, but the /i modifier indicates that you don't want to care about casing. I'd use a field with a KeywordTokenizer and a LowerCaseFilter, and then search with a lowercase regex:
<analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
And query with:
q=field:/([^@]+(?=.*il)(?=.*chicago)(?=.*analytical))/
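One caveat, as far as I know: Solr hands these patterns to Lucene's regexp engine, which does not support lookaheads such as (?=...) and, with its optional syntax enabled, treats @ as an "any string" operator. If the query above does not behave as expected, a lookahead-free pattern may be worth trying. The sketch below is only one possible approach, not the method from the answer: it escapes the fixed prefix and spells out every ordering of the required words. The field name `field` and both helper functions are hypothetical.

```python
import re
from itertools import permutations
from urllib.parse import urlencode

# Characters assumed to be operators in Lucene's regexp syntax (including the
# optional ones such as @ & ~ # < >), plus / which delimits the regex in a query.
LUCENE_REGEX_SPECIALS = set('.?+*|{}[]()"\\#@&<>~/')

def escape_literal(text):
    """Backslash-escape every character that could act as a regex operator."""
    return "".join("\\" + ch if ch in LUCENE_REGEX_SPECIALS else ch for ch in text)

def build_regex(prefix, words):
    """Exact prefix, then all words in any order with no further @ in between."""
    gap = r"[^\@]*"  # any run of characters that stays after the last @
    orderings = [
        gap + gap.join(escape_literal(w) for w in perm) + gap
        for perm in permutations(words)
    ]
    return escape_literal(prefix) + "(" + "|".join(orderings) + ")"

def build_fq(field, prefix, words):
    """Wrap the regex in Solr's field:/regex/ syntax."""
    return f"{field}:/{build_regex(prefix, words)}/"

prefix = "company1@city1@state1@country1@"
words = ["chicago", "analytical"]
fq = build_fq("field", prefix, words)
print(fq)
print(urlencode({"q": "*:*", "fq": fq}))  # URL-encoded request parameters

# Rough local check with Python's re (only an approximation of Lucene).
sample = "company1@city1@state1@country1@senior analytical chemist, chicago"
print(bool(re.fullmatch(build_regex(prefix, words), sample)))  # True
```

For the two words from the question this produces field:/company1\@city1\@state1\@country1\@([^\@]*chicago[^\@]*analytical[^\@]*|[^\@]*analytical[^\@]*chicago[^\@]*)/. The number of alternatives grows factorially with the number of words, so this is only practical for a handful of terms.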