Can XQuery regex match a null character?


Question

I'd like to remove all NULL characters from a string. I know that the right regex match should be \x00, and I've tried the following XQuery:

replace($message, '\x00', '')

It results in an error:

exerr:ERROR Conversion from XPath2 to Java regular expression syntax failed: Error at character 1 in regular expression \x00: invalid escape sequence

Is there a quick solution or workaround for this issue? I use eXist-db 2.2.

Recommended answer

The short version: you can't, at least not within the boundaries of the XQuery and XML specifications. There may be an eXist-db-specific method I am not aware of (something like natively calling the Java regular expression functions from XQuery, which seems to be possible in eXist-db), but I would not consider that a "quick solution or workaround".

Looking through the XPath and XQuery Functions and Operators 3.0 specification, which also defines the regular expression syntax for XQuery 3.0, there is no specified way of escaping characters by their Unicode code point. The \x00 syntax is specific to some regular expression implementations. regular-expressions.info confirms this:

XML regular expressions don't have any tokens like \xFF or \uFFFF to match particular (non-printable) characters. You have to add them as literal characters to your regex. If you are entering the regex into an XML file using a plain text editor, then you can use the &#xFFFF; XML syntax. Otherwise, you'll need to paste in the characters from a character map.
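
To illustrate the point about literal characters, here is a minimal sketch. It assumes a processor that follows the XQuery grammar, where string literals may contain character references. The sample string and the choice of U+00A0 are only for illustration.

```xquery
(: U+00A0 (no-break space) is a legal XML character, so a character
   reference to it is accepted inside a string literal and reaches the
   regex engine as a literal character. :)
replace('a&#xA0;b', '&#xA0;', ' ')
(: returns "a b" :)

(: By contrast, writing '&#x0;' should be rejected with static error
   XQST0090, because U+0000 does not match the XML Char production. :)
```

The reference is resolved when the query is parsed, so the regular expression itself needs no escape syntax. This is also why the approach cannot reach the null character.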

Considering this, two options come to mind, but neither works:

  • Using an XML character reference to denote the null byte. This is not possible either: in Extensible Markup Language (XML) 1.0 (Fifth Edition), a character reference must resolve to a character matching the Char production, which excludes most control characters, including #x0:

CharRef    ::=      '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

In addition, the same specification restricts the allowed characters:

Char       ::=      #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

XML 1.1 extends this definition to cover control characters, allowing all of them except the null byte:

Char       ::=      [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

Finally, regarding allowed characters, XQuery relies on the same specification:

Char       ::=      [http://www.w3.org/TR/REC-xml#NT-Char]

  • Including the null byte directly in the XQuery document. Apart from practical problems (null bytes in files often cause unexpected issues of various kinds), the same character restrictions apply, since a well-formed XML document may only consist of the characters defined above:

    document       ::=      ( prolog element Misc* ) - ( Char* RestrictedChar Char* ) 
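
The XML 1.0 Char production quoted above can be written as a code point check. This is only a sketch: local:is-xml10-char is a hypothetical helper name, and the sample code points are made up. It does not remove a null character from an existing xs:string, which a conforming processor should never hold in the first place.

```xquery
(: Hypothetical helper: true if $cp matches the XML 1.0 Char production. :)
declare function local:is-xml10-char($cp as xs:integer) as xs:boolean {
  $cp = (9, 10, 13)                        (: #x9 | #xA | #xD :)
  or ($cp ge 32 and $cp le 55295)          (: #x20-#xD7FF :)
  or ($cp ge 57344 and $cp le 65533)       (: #xE000-#xFFFD :)
  or ($cp ge 65536 and $cp le 1114111)     (: #x10000-#x10FFFF :)
};

(: Assumed input: code points obtained from outside the XML data model,
   here "H", a null, and "i". :)
let $codepoints := (72, 0, 105)
return codepoints-to-string($codepoints[local:is-xml10-char(.)])
(: returns "Hi" :)
```

Filtering before calling codepoints-to-string matters because, per the Functions and Operators specification, that function raises FOCH0001 for any code point that is not a legal XML character.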
    

See also: Why are "control" characters illegal in XML 1.0?
