\s doesn't actually capture all whitespace characters
Question
In my Java 8 app, I am scanning for whitespace in text that is passed in, but \s in my regular expression doesn't capture all whitespace characters. The one I've found so far in my testing that it misses is the non-breaking space (U+00A0). This was the regular expression that ran into the issue:
Pattern p = Pattern.compile("\\s");
To solve this, I added \h to my regular expression:
Pattern p = Pattern.compile("[\\s\\h]");
Now, are there any other whitespace characters I need to be aware of that won't be captured by \s and \h?
Recommended answer
According to the Pattern class documentation, the characters that match \s are [ \t\n\x0B\f\r]: space, tab, line feed, vertical tab, form feed, and carriage return.
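As a quick check of that default set, here is a minimal sketch that tests each documented character, plus the non-breaking space from the question, against a plain \s. The class and method names (DefaultWhitespaceCheck, matchesDefault) are made up for illustration and are not part of any library.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class DefaultWhitespaceCheck {
    // Plain \s with no flags: the ASCII-only set listed in the Pattern documentation.
    static final Pattern DEFAULT_WS = Pattern.compile("\\s");

    // Returns true if the single character c is matched by the default \s.
    static boolean matchesDefault(char c) {
        return DEFAULT_WS.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // Six documented characters, followed by the non-breaking space (U+00A0).
        char[] samples = {' ', '\t', '\n', '\u000B', '\f', '\r', '\u00A0'};
        for (char c : samples) {
            System.out.printf("U+%04X -> %b%n", (int) c, matchesDefault(c));
        }
    }
}
```

The first six samples should report true and U+00A0 should report false, which is the behaviour described in the question.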
However, Unicode defines many more space characters. Examples include:
- \u2002: En space
- \u2003: Em space
- \u2009: Thin space
- \u202F: Narrow no-break space
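To see how the patterns from the question treat these characters, the following sketch compares three options: the plain \s, the [\s\h] workaround, and \s compiled with Pattern.UNICODE_CHARACTER_CLASS. The flag is a standard java.util.regex option that was not mentioned above, so treat it as one possible alternative rather than the recommended fix. The class name, the helper method, and the choice of sample code points (including U+2028, the line separator) are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical comparison class, for illustration only.
class UnicodeWhitespaceCheck {
    // Plain \s: ASCII whitespace only.
    static final Pattern DEFAULT_WS = Pattern.compile("\\s");
    // The workaround from the question: \s plus horizontal whitespace (\h, Java 8+).
    static final Pattern WITH_H = Pattern.compile("[\\s\\h]");
    // \s with the Unicode flag: follows the Unicode White_Space property.
    static final Pattern UNICODE_WS =
            Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS);

    // Returns true if the given code point, taken alone, is matched by p.
    static boolean matches(Pattern p, int codePoint) {
        return p.matcher(new String(Character.toChars(codePoint))).matches();
    }

    public static void main(String[] args) {
        // NBSP, en space, em space, thin space, narrow no-break space, line separator.
        int[] samples = {0x00A0, 0x2002, 0x2003, 0x2009, 0x202F, 0x2028};
        for (int cp : samples) {
            System.out.printf("U+%04X  default=%b  withH=%b  unicode=%b%n",
                    cp,
                    matches(DEFAULT_WS, cp),
                    matches(WITH_H, cp),
                    matches(UNICODE_WS, cp));
        }
    }
}
```

On a Java 8 or later runtime, [\s\h] should match the first five samples but not U+2028, because \h covers horizontal whitespace only. The flagged \s should match all six. If vertical Unicode separators matter for your input, either the flag or adding \v to the character class may be worth considering.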