How to escape Unicode escapes in Groovy's /pattern/ syntax


Question

The following Groovy commands illustrate my problem.

First of all, this works (as seen on lotrepls.appspot.com) as expected (note that \u0061 is 'a').

>>> print "a".matches(/\u0061/)

true

Now let's say that we want to match \n, using the Unicode escape \u000A. The following, using the pattern as a string literal, fails, as expected:

>>> print "\n".matches("\u000A");

Interpreter exception: com.google.lotrepls.shared.InterpreterException:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed,
Script1.groovy: 1: expecting anything but ''\n''; got it anyway
@ line 1, column 21. 1 error

This is expected because in Java at least, Unicode escapes are processed early (JLS 3.3), so:

print "\n".matches("\u000A")

really is the same as:

print "\n".matches("
")

The fix is to escape the Unicode escape, and let the regex engine process it, as follows:

>>> print "\n".matches("\\u000A")

true
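
The same fix can be sketched in plain Java, on the assumption that Groovy's `String.matches` hands the pattern to `java.util.regex` just as Java's does. The class and method names here are hypothetical, chosen only for illustration. Only the doubled-backslash form appears in the source, because a single-backslash Unicode escape for a line feed would be translated before the compiler tokenizes the string.

```java
import java.util.regex.Pattern;

public class EscapedUnicodeDemo {
    // Hypothetical helper: true when the given regex matches a lone line feed.
    static boolean newlineMatches(String regex) {
        return Pattern.matches(regex, "\n");
    }

    public static void main(String[] args) {
        // The literal "\\u000A" holds six characters: a backslash, 'u', and four hex digits.
        // The compiler leaves them alone; java.util.regex decodes them as U+000A.
        System.out.println("\\u000A".length());         // 6
        System.out.println(newlineMatches("\\u000A"));  // true
        // The regex-level escapes \x0A and \n name the same character.
        System.out.println(newlineMatches("\\x0A"));    // true
        System.out.println(newlineMatches("\\n"));      // true
    }
}
```

The point of the doubling is that the escape survives compilation as ordinary text, so it is the regex engine rather than the compiler that turns it into a line feed.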

Now here's the question part: how can we get this to work with the Groovy /pattern/ syntax instead of using a string literal?

Here are some failed attempts:

>>> print "\n".matches(/\u000A/)

Interpreter exception: com.google.lotrepls.shared.InterpreterException:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed,
Script1.groovy: 1: expecting EOF, found '(' @ line 1, column 19.
1 error

>>> print "\n".matches(/\\u000A/)

false

>>> print "\\u000A".matches(/\\u000A/);

true

Solution

~"[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F]"

Appears to be working as it should. According to the docs I've seen, the double backslashes shouldn't be required with a slashy string, so I don't know why the compiler's not happy with them.
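
For comparison, here is a hedged Java sketch of the same character class. It is not the answer's method: the answer uses a double-quoted Groovy string, whereas this version doubles each backslash so that `java.util.regex` decodes the escapes itself. The names `ControlCharClassDemo` and `containsControl` are hypothetical.

```java
import java.util.regex.Pattern;

public class ControlCharClassDemo {
    // Same ranges as the class above, with every escape doubled so that
    // the regex engine, not the compiler, interprets it.
    static final Pattern CONTROL = Pattern.compile(
        "[\\u0000-\\u0008\\u000B\\u000C\\u000E-\\u001F\\u007F-\\u009F]");

    // Hypothetical helper: true if the string contains any character in the class.
    static boolean containsControl(String s) {
        return CONTROL.matcher(s).find();
    }

    public static void main(String[] args) {
        // U+0007 (bell) lies inside the first range.
        System.out.println(containsControl("bell" + (char) 7));  // true
        // Tab (U+0009), line feed (U+000A) and carriage return (U+000D)
        // fall in the gaps between the ranges, so they are not matched.
        System.out.println(containsControl("a\tb\nc\r"));        // false
    }
}
```

Note that the class deliberately skips U+0009, U+000A, and U+000D, so it never needs an escape for the line feed that caused the compilation errors in the question.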
