In the Saxon 9 HE Java XML parser, word boundaries (\b) in regular expressions are not recognized

Question

I have the following simple regular expression:

\b\w+\b

Saxon reports the following error:

syntax error at char 2 in regular expression: Escape character 'b' not allowed

Does this mean I can't use word boundaries with the Java Saxon parser? Is there an alternative free Java XML parser that has this functionality?

Recommended answer

The regular expression dialect used in XSD and XPath does not recognize \b (either as a word boundary or as a backspace). I think the reason for excluding it was probably a misplaced anxiety that word boundaries are language/culture dependent, though that's illogical since the dialect does support \w (a word character), and a word boundary can be simply defined as a boundary between a character that matches \w and a character that doesn't. Alternatively the XSD team may have been worried about the ambiguities that arise with zero-length matches, which are a notorious source of bugs and make it very hard to specify rigorously exactly what regular expressions do.
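
To make that definition of a word boundary concrete, here is a minimal sketch in plain Java (java.util.regex, not Saxon and not the XPath dialect). The class name WordBoundaryDemo, the helper findAll, and the sample text are made up for illustration. It spells out "a boundary between a character that matches \w and one that doesn't" with lookarounds and compares the result with \b on ASCII input. It also shows that, because \w+ is greedy, a plain \w+ finds the same tokens as \b\w+\b, which suggests that dropping \b may be enough for this particular pattern. Note that lookarounds are not part of the XPath dialect either, and that \w in XPath covers a wider set of characters than Java's default \w, so this illustrates the idea rather than Saxon's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: uses java.util.regex, not Saxon's XPath regex engine.
class WordBoundaryDemo {

    // A word boundary written out by hand: either a \w character is followed
    // by a non-\w position, or a non-\w position is followed by a \w character.
    static final String BOUNDARY = "(?:(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w))";

    // Collect every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Saxon 9 HE, word-boundary test"; // ASCII-only sample

        List<String> withB = findAll("\\b\\w+\\b", text);
        List<String> explicit = findAll(BOUNDARY + "\\w+" + BOUNDARY, text);
        List<String> plain = findAll("\\w+", text);

        System.out.println(withB);                  // [Saxon, 9, HE, word, boundary, test]
        System.out.println(withB.equals(explicit)); // true
        System.out.println(withB.equals(plain));    // true
    }
}
```

The equivalence with plain \w+ holds because each greedy run of word characters already starts and ends at a word boundary. It does not carry over to patterns where \b guards a literal, such as \bcat\b.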

So it's not a Saxon limitation, it's a limitation written into the XPath specification.

If you're not too concerned about standards conformance, Saxon allows you to put "!" at the end of the "flags" argument to indicate that your regular expression is a Java regular expression rather than an XPath regular expression.
