How do I translate this Perl regular expression into Java?

Question

How would you translate this Perl regex into Java?

/pattern/i

While it compiles, it does not match "PattErn" for me; it fails:

Pattern p = Pattern.compile("/pattern/i");
Matcher m = p.matcher("PattErn");

System.out.println(m.matches()); // prints "false"


Recommended answer


How would you translate this Perl regex into Java?

/pattern/i


You can’t.

There are a lot of reasons for this. Here are a few:


  • Java doesn't support as expressive a regex language as Perl does. It lacks grapheme support (like \X) and full property support (like \p{Sentence_Break=SContinue}), is missing Unicode named characters, doesn't have a (?|...|...|) branch reset operator, didn’t have named capture groups or a logical \x{...} escape before Java 7, has no recursive regexes, and so on. I could write a book on what Java is missing here: get used to going back to a very primitive and awkward-to-use regex engine compared with what you’re used to.
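A small sketch of the Java 7 additions mentioned above, plus a check that one of the missing Perl features really is rejected. It assumes Java 7 or later; the class and method names (Java7Features, year, isSmallSigma, compiles) are hypothetical. The branch-reset result reflects java.util.regex as I understand it, where "(?|" is reported as an unknown inline modifier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class Java7Features {
    // Named capture groups, (?<name>...), arrived in Java 7.
    static String year(String date) {
        Matcher m = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})").matcher(date);
        return m.find() ? m.group("year") : null;
    }

    // So did the \x{h...h} escape for an arbitrary code point
    // (here U+03C3, GREEK SMALL LETTER SIGMA).
    static boolean isSmallSigma(String s) {
        return Pattern.compile("\\x{3C3}").matcher(s).matches();
    }

    // True if java.util.regex accepts the given syntax at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(year("released 2011-07"));  // 2011
        System.out.println(isSmallSigma("\u03c3"));     // true
        System.out.println(compiles("(?|(a)|(b))"));    // false: no branch reset
    }
}
```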

Another, even worse problem is that you have lookalike faux amis like \w, \b, and \s, and even \p{alpha} and \p{lower}, which behave differently in Java than in Perl; in some cases the Java versions are completely unusable and buggy. That’s because Perl follows UTS#18, but before Java 7, Java did not. You must add the UNICODE_CHARACTER_CLASS flag from Java 7 to get these to stop being broken. If you can’t use Java 7, give up now, because Java had many, many other Unicode bugs before Java 7 and it just isn’t worth the pain of dealing with them.
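To make the faux amis concrete, here is a minimal sketch (Java 7+, hypothetical class name WordChars) comparing \w and \p{Lower} with and without UNICODE_CHARACTER_CLASS on an accented letter. Non-ASCII text is written with \u escapes so the file compiles regardless of source encoding.

```java
import java.util.regex.Pattern;

public class WordChars {
    // True if the whole input matches regex under the given flags.
    static boolean fullMatch(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";  // "cafe" with an acute accent on the e
        int uc = Pattern.UNICODE_CHARACTER_CLASS;

        // Legacy behaviour: \w is just [a-zA-Z_0-9], so the accented letter fails.
        System.out.println(fullMatch("\\w+", cafe, 0));            // false
        // With UNICODE_CHARACTER_CLASS, \w follows the Unicode definition.
        System.out.println(fullMatch("\\w+", cafe, uc));           // true

        // Same story for POSIX-style classes: \p{Lower} is [a-z] by default.
        System.out.println(fullMatch("\\p{Lower}", "\u00e9", 0));  // false
        System.out.println(fullMatch("\\p{Lower}", "\u00e9", uc)); // true
    }
}
```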

Java handles line breaks through ^, $, and ., whereas Perl expects you to match a Unicode line break with \R. Look at the UNIX_LINES flag to understand what is going on there.
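A minimal sketch of what UNIX_LINES changes for ".", and of \R on the Java side. It assumes Java 8 or later, since that is where java.util.regex gained \R; the class name LineBreaks is hypothetical.

```java
import java.util.regex.Pattern;

public class LineBreaks {
    static boolean dotMatches(String s, int flags) {
        return Pattern.compile(".", flags).matcher(s).matches();
    }

    // \R (Java 8 and later) matches any Unicode line-break sequence,
    // treating a CR LF pair as a single unit.
    static boolean isLineBreak(String s) {
        return Pattern.compile("\\R").matcher(s).matches();
    }

    public static void main(String[] args) {
        // By default '.' refuses every line terminator Java knows about, CR included.
        System.out.println(dotMatches("\r", 0));                   // false
        // Under UNIX_LINES only '\n' counts as a line terminator.
        System.out.println(dotMatches("\r", Pattern.UNIX_LINES));  // true
        System.out.println(dotMatches("\n", Pattern.UNIX_LINES));  // false
        System.out.println(isLineBreak("\r\n"));                   // true
        System.out.println(isLineBreak("\u2028"));                 // true (LINE SEPARATOR)
    }
}
```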

Java does not by default apply any Unicode casefolding whatsoever. Make sure to add the UNICODE_CASE flag to your compilation. Otherwise you won’t get things like the various Greek sigmas all matching one another.
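A short sketch of the Greek sigma case (hypothetical class name GreekSigma). With CASE_INSENSITIVE alone, Java only folds US-ASCII letters; adding UNICODE_CASE lets the capital, small, and final sigma all match one another.

```java
import java.util.regex.Pattern;

public class GreekSigma {
    static final String SMALL   = "\u03c3"; // GREEK SMALL LETTER SIGMA
    static final String FINAL   = "\u03c2"; // GREEK SMALL LETTER FINAL SIGMA
    static final String CAPITAL = "\u03a3"; // GREEK CAPITAL LETTER SIGMA

    static boolean ciMatch(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        int ascii   = Pattern.CASE_INSENSITIVE;                        // ASCII-only folding
        int unicode = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE; // Unicode-aware folding
        System.out.println(ciMatch(SMALL, CAPITAL, ascii));   // false
        System.out.println(ciMatch(SMALL, CAPITAL, unicode)); // true
        System.out.println(ciMatch(FINAL, CAPITAL, unicode)); // true
        System.out.println(ciMatch(SMALL, FINAL, unicode));   // true
    }
}
```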

Finally, it is different because at best Java only does simple casefolding, while Perl always does full casefolding. That means that you won’t get \xDF to match "SS" case-insensitively in Java, along with similar related issues.
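The \xDF example can be checked directly. In this sketch (hypothetical class name SharpS), String.toUpperCase applies the full one-to-many case mapping, while the regex engine compares one character at a time, so even with UNICODE_CASE the single ß cannot match the two-character "SS".

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class SharpS {
    static final String SHARP_S = "\u00df"; // LATIN SMALL LETTER SHARP S

    static boolean regexFolds(String input) {
        return Pattern.compile(SHARP_S, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                      .matcher(input).matches();
    }

    public static void main(String[] args) {
        // String.toUpperCase applies the full case mapping: one char becomes two.
        System.out.println(SHARP_S.toUpperCase(Locale.ROOT)); // SS
        // The regex engine folds one char at a time, so it cannot equate them.
        System.out.println(regexFolds("SS"));                  // false
        System.out.println(regexFolds("ss"));                  // false
    }
}
```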

In summary, the closest you can get is to compile with the flags

 CASE_INSENSITIVE | UNICODE_CASE | UNICODE_CHARACTER_CLASS

which is equivalent to the embedded flags (?iuU).
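A minimal sketch checking that the compile-time flags and the embedded (?iuU) prefix behave the same on a few inputs. The class name EmbeddedFlags and the test strings are hypothetical; the mapping i = CASE_INSENSITIVE, u = UNICODE_CASE, U = UNICODE_CHARACTER_CLASS is the standard one for java.util.regex.

```java
import java.util.regex.Pattern;

public class EmbeddedFlags {
    static final int FLAGS = Pattern.CASE_INSENSITIVE
                           | Pattern.UNICODE_CASE
                           | Pattern.UNICODE_CHARACTER_CLASS;

    static boolean viaFlags(String regex, String input) {
        return Pattern.compile(regex, FLAGS).matcher(input).find();
    }

    // i = CASE_INSENSITIVE, u = UNICODE_CASE, U = UNICODE_CHARACTER_CLASS
    static boolean viaEmbedded(String regex, String input) {
        return Pattern.compile("(?iuU)" + regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[][] cases = {
            { "pattern", "I have your PaTTerN right here" },
            // Greek word in lower case vs. upper case
            { "\u03c3\u03bf\u03c6\u03b9\u03b1", "\u03a3\u039f\u03a6\u0399\u0391" },
            // \w needs UNICODE_CHARACTER_CLASS to accept the accented capital E
            { "caf\\w", "CAF\u00c9" },
        };
        for (String[] c : cases) {
            System.out.println(viaFlags(c[0], c[1]) + " " + viaEmbedded(c[0], c[1])); // true true
        }
    }
}
```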

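And remember that “match” in Java doesn’t mean match, perversely enough: Matcher.matches() succeeds only if the pattern matches the entire input, as if it were anchored with \A and \z, whereas Perl’s =~ finds the pattern anywhere in the string. Use find() to get Perl’s behavior.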
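A minimal sketch (hypothetical class name MatchVsFind) contrasting Java’s three matching methods on the question’s input. The Perl equivalents in the comments are approximate analogies, not exact translations.

```java
import java.util.regex.Pattern;

public class MatchVsFind {
    static final Pattern P = Pattern.compile("pattern", Pattern.CASE_INSENSITIVE);

    static boolean whole(String s)    { return P.matcher(s).matches(); }   // like /\Apattern\z/i
    static boolean prefix(String s)   { return P.matcher(s).lookingAt(); } // like /\Apattern/i
    static boolean anywhere(String s) { return P.matcher(s).find(); }      // like /pattern/i

    public static void main(String[] args) {
        String line = "I have your PaTTerN right here";
        System.out.println(whole(line));      // false
        System.out.println(prefix(line));     // false
        System.out.println(anywhere(line));   // true
        System.out.println(whole("PattErn")); // true
    }
}
```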

And here’s the rest of the story...


While it compiles, it does not match "PattErn" for me; it fails:

   Pattern p = Pattern.compile("/pattern/i");
   Matcher m = p.matcher("PattErn");
   System.out.println(m.matches()); // prints "false"


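You shouldn’t have the slashes in the pattern. In Perl, the slashes are the delimiters of the match operator and i is a modifier; Java sees "/pattern/i" as a pattern that matches only that literal text, slashes and all.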
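If you have many Perl-style literals to port, a small helper can split off the delimiters and turn the trailing modifiers into flags. This is a hypothetical sketch, not a standard API: perlToJava handles only i, m, s, and x, maps them to CASE_INSENSITIVE | UNICODE_CASE, MULTILINE, DOTALL, and COMMENTS respectively, and rejects any other modifier. It does nothing about the deeper semantic differences described above.

```java
import java.util.regex.Pattern;

public class PerlLiteral {
    // Hypothetical helper: converts a Perl-style "/body/flags" literal into a
    // java.util.regex.Pattern. Only the i, m, s and x modifiers are handled;
    // anything else is rejected rather than silently ignored.
    static Pattern perlToJava(String literal) {
        int last = literal.lastIndexOf('/');
        if (!literal.startsWith("/") || last == 0) {
            throw new IllegalArgumentException("not a /.../ literal: " + literal);
        }
        String body = literal.substring(1, last);
        int flags = 0;
        for (char c : literal.substring(last + 1).toCharArray()) {
            switch (c) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE; break;
                case 'm': flags |= Pattern.MULTILINE; break;
                case 's': flags |= Pattern.DOTALL; break;
                case 'x': flags |= Pattern.COMMENTS; break;
                default: throw new IllegalArgumentException("unsupported modifier: " + c);
            }
        }
        return Pattern.compile(body, flags);
    }

    public static void main(String[] args) {
        // Compiled verbatim, the Perl literal only matches its own text.
        System.out.println(Pattern.compile("/pattern/i").matcher("PattErn").matches()); // false
        // Split into body and flags, it behaves like the Perl match.
        Pattern p = perlToJava("/pattern/i");
        System.out.println(p.matcher("PattErn").matches());                        // true
        System.out.println(p.matcher("I have your PaTTerN right here").find());    // true
    }
}
```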

The best you can do is to translate

$line = "I have your PaTTerN right here";
if ($line =~ /pattern/i) {
    print "matched.\n";
}

like this:

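import java.util.regex.*;

public class PerlToJava {
    public static void main(String[] args) {
        String line     = "I have your PaTTerN right here";
        String pattern  = "pattern";
        Pattern regcomp = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE
                                                 | Pattern.UNICODE_CASE
                // comment the next line out on pre-Java 7, which lacks this flag (and has the \b\w\s breakage)
                                                 | Pattern.UNICODE_CHARACTER_CLASS
                                         );
        Matcher regexec = regcomp.matcher(line);
        // find(), not matches(): search anywhere in the string, like Perl's =~
        if (regexec.find()) {
            System.out.println("matched");
        }
    }
}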

There, see how much easier that isn’t? :)

Another thing you lose with Java, because Java doesn’t know a regex from a doubly linked list from a hole in its head, is compile-time compilation of patterns: Pattern.compile runs only when your program reaches it, so a malformed pattern surfaces as a PatternSyntaxException at run time instead of as a compile error. Me, I’ve always found compile time the best time for compilation, but try telling Java that. Java makes it really tough to realize that very simple program-sanity measure, something you really need to do in every program all the time. This design flaw is a royal pain in the butt, because halfway through your program you take an exception for something that should have been caught at compile time, when the rest of your program was being compiled. It’s just about as exasperating as coitus interruptus: you were well on your way to getting your business done and BANG, everything is ruined.
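To see the late failure for yourself, note that javac happily accepts this source file even though one of the strings is not a valid regex; the problem only appears when Pattern.compile runs. A minimal sketch with a hypothetical class name (LateFailure); the exact wording of the error description varies between JDK versions, so the comment only gives an example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LateFailure {
    // Returns null if the regex compiles, otherwise the engine's description of the error.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(compileError("pattern"));  // null
        System.out.println(compileError("(pattern")); // a description such as "Unclosed group"
    }
}
```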

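I didn’t implement the solution to that vexing annoyance in my code above, but you can fake it with some static initialization: compile your patterns into static final fields, so that a bad pattern blows up as soon as the class is initialized rather than halfway through a run.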
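One way to fake it, sketched under the assumption that a class-per-group-of-patterns layout suits your program (the class name Patterns and method hasPattern are hypothetical): keep every regex in a static final field. A malformed pattern then makes class initialization fail with an ExceptionInInitializerError wrapping the PatternSyntaxException the first time the class is used, for example as soon as main runs or when a unit test loads it.

```java
import java.util.regex.Pattern;

public class Patterns {
    // Compiled exactly once, when the class is initialized. A syntax error here
    // fails initialization instead of surfacing deep inside some later call.
    static final Pattern PATTERN = Pattern.compile("pattern",
            Pattern.CASE_INSENSITIVE
          | Pattern.UNICODE_CASE
          | Pattern.UNICODE_CHARACTER_CLASS);

    static boolean hasPattern(String line) {
        return PATTERN.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(hasPattern("I have your PaTTerN right here")); // true
    }
}
```

A cheap companion habit is a unit test that merely touches each such class, so a broken pattern fails the build rather than production.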
