When does '.' not match in a Regex?


Question

I encountered the following problem (simplified). I wrote the following:

Pattern pattern = Pattern.compile("Fig.*");
String s = readMyString();
Matcher matcher = pattern.matcher(s);

For one input string the matcher failed to match, even though the string started with "Fig". I tracked the problem down to a rogue character in the next part of the string. It had code point value 1633, obtained from

(int) s.charAt(i)

but did not match the regex. I think it is due to a non-UTF-8 encoding somewhere in the input process.

The Javadocs say:

Predefined character classes
.    Any character (may or may not match line terminators)

Presumably this is not a character in the strict sense of the word, but it is still part of the String. How do I detect this problem?

UPDATE: It was due to a (char)10, which was not easy to spot. My diagnosis above was wrong. All of the answers below are relevant to the question as asked and are useful.
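A (char)10 like the one mentioned in the update is invisible when the string is printed. One way to spot it is to scan the string and report the index and code point of every character that java.util.regex treats as a line terminator by default. The following is only a minimal sketch: the class name LineTerminatorFinder, its method names, and the sample string are hypothetical and not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;

public class LineTerminatorFinder {
    // Characters that '.' does not match by default in java.util.regex:
    // \n, \r, next-line (U+0085), line separator (U+2028), paragraph separator (U+2029).
    static boolean isLineTerminator(char c) {
        return c == '\n' || c == '\r' || c == '\u0085' || c == '\u2028' || c == '\u2029';
    }

    // Returns the indexes of all line terminators found in s.
    static List<Integer> findLineTerminators(String s) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (isLineTerminator(s.charAt(i))) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical input: code point 1633 (U+0661) followed by (char) 10.
        String s = "Fig \u0661\ncaption";
        for (int i : findLineTerminators(s)) {
            System.out.printf("index %d: code point %d%n", i, (int) s.charAt(i));
        }
        // prints: index 5: code point 10
    }
}
```

In this sample the character with code point 1633 is not reported, because it is an ordinary character that '.' matches. Only the (char)10 is flagged.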

Recommended answer

The . character in a Java regex matches any character except line terminators, unless you use the flag Pattern.DOTALL when compiling your pattern.

To do so, you would use a Pattern like this:

Pattern p = Pattern.compile("somepattern", Pattern.DOTALL);
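To make the difference concrete, here is a small sketch that applies the question's pattern "Fig.*" to a string containing a (char)10, first with the default flags and then with Pattern.DOTALL. The class name DotAllDemo, the helper method names, and the sample string are assumptions for illustration. The sketch also assumes the whole string is tested with matches(), which the question does not state.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Whole-string match with default flags: '.' stops at line terminators.
    static boolean matchesDefault(String s) {
        return Pattern.compile("Fig.*").matcher(s).matches();
    }

    // Same pattern compiled with DOTALL: '.' also matches line terminators.
    static boolean matchesDotAll(String s) {
        return Pattern.compile("Fig.*", Pattern.DOTALL).matcher(s).matches();
    }

    public static void main(String[] args) {
        String s = "Fig 1\ncaption"; // hypothetical input containing (char) 10

        System.out.println(matchesDefault(s)); // false
        System.out.println(matchesDotAll(s));  // true

        // The embedded flag (?s) is equivalent to passing Pattern.DOTALL.
        System.out.println(Pattern.compile("(?s)Fig.*").matcher(s).matches()); // true
    }
}
```

Note that a character such as code point 1633 (U+0661) is matched by '.' even without DOTALL, which is consistent with the update in the question: the newline, not the unusual character, caused the failure.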
