What regular expression features are supported by Solr edismax?

Question

Regular expressions allow the pattern matching syntax shown below. I'm trying to implement a powerful search tool that supports as many of these features as possible. I'm told that edismax is the most flexible tool for the job. Which of the pattern matching expressions below can be accomplished with edismax? Can I do better than edismax? Can you suggest which filters and parser patches I might use to work towards this functionality? Am I dreaming if I think Solr can achieve acceptable performance (i.e. server-side processing time) for these kinds of searches?

Regex syntax and examples from MySQL

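  1. ^ Matches the beginning of a string. 'fofo' REGEXP '^fo' => true
  2. $ Matches the end of a string. 'fo\no' REGEXP '^fo\no$' => true
  3. * Matches zero or more repetitions. 'Baaaan' REGEXP 'Ba*n' => true
  4. ? Matches zero or one occurrence. 'Baan' REGEXP '^Ba?n' => false
  5. + Matches one or more repetitions. 'Bn' REGEXP 'Ba+n' => false
  6. | Alternation (or). 'pi' REGEXP 'pi|apa' => true
  7. ()* Matches zero or more repetitions of a sequence. 'pipi' REGEXP '^(pi)*$' => true
  8. [a-dX], [^a-dX] Character range/set. 'aXbc' REGEXP '[a-dXYZ]' => true
  9. {n} or {m,n} Repetition count notation. 'abcde' REGEXP 'a[bcd]{3}e' => true
  10. [:character_class:] Character class. 'justalnums' REGEXP '[[:alnum:]]+' => true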
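For reference, the ten examples above can be replayed outside MySQL. The following is only a sketch using Python's `re` module, not MySQL or Solr. Python's `re` does not support POSIX classes such as `[[:alnum:]]`, so the last case substitutes `[A-Za-z0-9]+`; that substitution is an assumption made for this illustration.

```python
import re

# (subject, pattern, expected result) taken from the list above.
# The last pattern replaces [[:alnum:]] with an ASCII equivalent,
# because Python's re module has no POSIX character classes.
CASES = [
    ("fofo", r"^fo", True),
    ("fo\no", "^fo\no$", True),      # subject and pattern contain a real newline
    ("Baaaan", r"Ba*n", True),
    ("Baan", r"^Ba?n", False),
    ("Bn", r"Ba+n", False),
    ("pi", r"pi|apa", True),
    ("pipi", r"^(pi)*$", True),
    ("aXbc", r"[a-dXYZ]", True),
    ("abcde", r"a[bcd]{3}e", True),
    ("justalnums", r"[A-Za-z0-9]+", True),
]


def check(cases):
    """Return one boolean per case: does re.search agree with the expected result?"""
    return [bool(re.search(pattern, subject)) == expected
            for subject, pattern, expected in cases]


if __name__ == "__main__":
    for (subject, pattern, expected), ok in zip(CASES, check(CASES)):
        print(f"{subject!r} REGEXP {pattern!r} => {expected} ({'ok' if ok else 'MISMATCH'})")
```

Like MySQL's `REGEXP`, `re.search` looks for a match anywhere in the subject, which is why the unanchored patterns behave the same way here.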

Recommended answer

Version 4.0 of Lucene will support regex queries directly in the standard query parser using special syntax. I verified that it works on an instance of Solr I am running, built from the subversion trunk in February.

Jira ticket 2604 describes an extension of the standard query parser that adds a special regex syntax, using forward slashes to delimit the regex, similar to regex literals in JavaScript. It seems to use RegexpQuery underneath.

As a simple example:

body:/[0-9]{5}/

will match on a five-digit zip code in the textual corpus I have indexed. But, oddly, body:/\d{5}/ did not work for me, and ^ failed as well.
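When such a query is sent over HTTP, the slashes, brackets, and braces need URL encoding. Below is a minimal sketch of building the request URL in Python. The host, port, core name (`collection1`), and `/select` path are assumptions for illustration and are not taken from the setup described above; only the `body:/[0-9]{5}/` query string comes from the answer.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical Solr endpoint; adjust host, port, and core name to your install.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_regex_query_url(field, regex, base_url=SOLR_SELECT_URL):
    """Build a select URL whose q parameter is field:/regex/, URL-encoded."""
    query = f"{field}:/{regex}/"
    return f"{base_url}?{urlencode({'q': query})}"


if __name__ == "__main__":
    url = build_regex_query_url("body", "[0-9]{5}")
    print(url)
    # Decoding the query string gives back the original q parameter.
    print(parse_qs(urlsplit(url).query)["q"][0])
```

Encoding through `urlencode` avoids hand-escaping the pattern, which is easy to get wrong once the regex contains `+`, `&`, or `#`.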

The regex dialect would have to be Java's, but I'm not sure if everything in it works, since I have only done a cursory examination. One would probably have to look carefully at the RegexpQuery code to understand what works and what doesn't.
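One possible reading of the odd results, stated here as an assumption and not verified against the RegexpQuery source: the pattern may be applied to each indexed term as a whole, not to the full field text. Under that assumption, anchors such as `^` would add nothing, and `[0-9]{5}` would only match terms that consist of exactly five digits. The sketch below emulates that behavior in plain Python with a naive whitespace tokenizer; it does not call Lucene or Solr, and `matching_terms` is a hypothetical helper.

```python
import re


def matching_terms(pattern, text):
    """Return the whitespace-separated terms that the pattern matches in full.

    This emulates whole-term matching (an assumption about RegexpQuery),
    using Python's regex dialect and a naive whitespace tokenizer.
    """
    compiled = re.compile(pattern)
    return [term for term in text.split() if compiled.fullmatch(term)]


if __name__ == "__main__":
    doc = "Ship to Springfield IL 62704 ref 1234567"
    # Only the five-digit term qualifies; the seven-digit term does not,
    # because the whole term has to match the pattern.
    print(matching_terms(r"[0-9]{5}", doc))
```

If the real behavior matches this model, checking a few patterns against known terms in the index is a quick way to map out which constructs the parser accepts.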
