When does '.' not match in a Regex?
Question
I encountered the following problem (simplified). I wrote the following:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("Fig.*");
String s = readMyString();
Matcher matcher = pattern.matcher(s);
When reading one string, the matcher failed to match even though the string started with "Fig". I tracked the problem down to a rogue character in the next part of the string. It had code point value 1633, obtained from
(int) s.charAt(i)
but the character did not match the regex. I think it is due to a non-UTF-8 encoding somewhere in the input process.
The Javadocs say:
Predefined character classes
. Any character (may or may not match line terminators)
Presumably this is not a character in the strict sense of the word, but it is still part of the String. How do I detect this problem?
UPDATE: The failure was caused by a (char)10 (a line feed), which was not easy to spot. My diagnosis above was wrong; all the answers below are relevant to the question as asked and are useful.
Recommended answer
The . character in a Java regex matches any character except line terminators, unless you use the flag Pattern.DOTALL when compiling your pattern.
To make . match line terminators as well, you would use a Pattern like this:
Pattern p = Pattern.compile("somepattern", Pattern.DOTALL);
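To illustrate the difference, here is a minimal sketch. It assumes the original code called matcher.matches(), which the question does not show. The class name DotMatchDemo, the helper describeLineTerminators, and the sample strings are hypothetical and used only for illustration. The helper lists the line terminators that . skips by default, which is one possible way to spot a hidden (char)10.

```java
import java.util.regex.Pattern;

class DotMatchDemo {
    // Default mode: '.' does not match line terminators such as '\n' (char 10).
    static final Pattern DEFAULT = Pattern.compile("Fig.*");
    // DOTALL mode: '.' matches every character, line terminators included.
    static final Pattern DOTALL = Pattern.compile("Fig.*", Pattern.DOTALL);

    // Assumption: the whole string must match, as with Matcher.matches().
    static boolean matchesDefault(String s) {
        return DEFAULT.matcher(s).matches();
    }

    static boolean matchesDotAll(String s) {
        return DOTALL.matcher(s).matches();
    }

    // Hypothetical helper: reports the index and numeric value of each
    // line terminator that '.' refuses to match in default mode.
    static String describeLineTerminators(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\n' || c == '\r' || c == '\u0085'
                    || c == '\u2028' || c == '\u2029') {
                sb.append("index ").append(i)
                  .append(": code ").append((int) c).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample input with a hidden line feed and the character with code 1633.
        String s = "Fig 1\nCaption \u0661";

        System.out.println(matchesDefault(s));             // false
        System.out.println(matchesDotAll(s));              // true
        // The character with code 1633 alone is matched by '.' in default mode.
        System.out.println(matchesDefault("Fig \u0661"));  // true
        System.out.print(describeLineTerminators(s));      // index 5: code 10
    }
}
```

If the match only needs to cover the first line, calling find() or lookingAt() instead of matches() may be enough without DOTALL, since neither requires the pattern to consume the entire input.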