Java regular expression to match _all_ whitespace characters
Question
I'm looking for a regular expression in Java which matches all whitespace characters in a String. "\s" matches only some of them: it does not match "&nbsp;" and similar non-ASCII whitespace. I'm looking for a regular expression which matches all (common) whitespace characters that can occur in a Java String.
To clarify: I do not mean the string sequence "&nbsp;". I mean the single Unicode character U+00A0 that is often represented by "&nbsp;", e.g. in HTML, and all other Unicode characters with a similar whitespace meaning, e.g. "NARROW NO-BREAK SPACE" (U+202F), the Word Joiner encoded in Unicode 3.2 and above as U+2060, "ZERO WIDTH NO-BREAK SPACE" (U+FEFF), and any other character that can be regarded as whitespace.
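A minimal sketch to confirm the behaviour described in the question, assuming a standard JDK: by default Java's \s is limited to the ASCII set [ \t\n\x0B\f\r], so none of the four code points listed above match it. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class DefaultWhitespaceCheck {
    // Java's default \s is ASCII-only: [ \t\n\x0B\f\r]
    private static final Pattern DEFAULT_S = Pattern.compile("\\s");

    // True if the single character c is matched by the default \s.
    static boolean matchesDefaultS(char c) {
        return DEFAULT_S.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // Space, tab, NO-BREAK SPACE, NARROW NO-BREAK SPACE, WORD JOINER, ZERO WIDTH NO-BREAK SPACE.
        char[] samples = {' ', '\t', '\u00A0', '\u202F', '\u2060', '\uFEFF'};
        for (char c : samples) {
            System.out.printf("U+%04X matches \\s: %b%n", (int) c, matchesDefaultS(c));
        }
    }
}
```

Only U+0020 and U+0009 report true; the four non-ASCII characters from the question all report false.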
[Answer]
For my purpose, i.e. catching all whitespace characters, Unicode + traditional, the following expression does the job:
[\p{Z}\s]
The answer is in the comments below, but since it is a bit hidden I repeat it here.
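A sketch of the expression [\p{Z}\s] in use, assuming the goal is to replace each whitespace character with a plain ASCII space. In a Java string literal the backslashes must be doubled: "[\\p{Z}\\s]". Note that \p{Z} covers only the separator categories (Zs, Zl, Zp), so U+00A0 and U+202F match, but the Word Joiner (U+2060) and ZERO WIDTH NO-BREAK SPACE (U+FEFF) are format characters (category Cf) and do not. The EXTENDED pattern that adds them explicitly is a hypothetical extension, not part of the answer; the names ANSWER, EXTENDED and normalize are illustrative.

```java
import java.util.regex.Pattern;

public class UnicodeWhitespace {
    // The answer's class: Unicode separators (\p{Z} = Zs, Zl, Zp) plus Java's ASCII-only \s.
    static final Pattern ANSWER = Pattern.compile("[\\p{Z}\\s]");

    // Hypothetical extension: also match U+2060 WORD JOINER and U+FEFF ZERO WIDTH NO-BREAK SPACE,
    // which are format characters (category Cf) rather than separators.
    static final Pattern EXTENDED = Pattern.compile("[\\p{Z}\\s\\u2060\\uFEFF]");

    // Replace each matched character with a single ASCII space.
    static String normalize(String s, Pattern ws) {
        return ws.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        // NBSP, NARROW NBSP, WORD JOINER, ZWNBSP and a tab between letters.
        String text = "a\u00A0b\u202Fc\u2060d\uFEFFe\tf";
        System.out.println(normalize(text, ANSWER));
        System.out.println(normalize(text, EXTENDED)); // prints "a b c d e f"
    }
}
```

With this sample input, ANSWER replaces the NBSP, the narrow NBSP and the tab but leaves the invisible U+2060 and U+FEFF in place; whether those two should count as whitespace depends on the use case.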
Recommended answer
The &nbsp; is only whitespace in HTML. Use an HTML parser to extract the plain text, and \s should work just fine.
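One hedged caveat on this recommendation: HTML parsers commonly decode &nbsp; into the character U+00A0, which Java's default \s does not match. Assuming Java 7 or later, the Pattern.UNICODE_CHARACTER_CLASS flag (equivalently the embedded flag (?U)) makes \s follow the Unicode White_Space property. The input string below is a stand-in for parser output; no specific HTML parser API is assumed, and the class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class UnicodeAwareS {
    // Default \s: ASCII whitespace only.
    static final Pattern ASCII_S = Pattern.compile("\\s+");

    // With UNICODE_CHARACTER_CLASS (Java 7+), \s follows the Unicode White_Space property,
    // which includes U+00A0 and U+202F (but not U+2060 or U+FEFF).
    static final Pattern UNICODE_S = Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);

    // Collapse each run of matched whitespace into a single ASCII space.
    static String collapse(String s, Pattern ws) {
        return ws.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Stand-in for plain text from an HTML parser: "&nbsp;" already decoded to U+00A0.
        String extracted = "price:\u00A0\u00A042\tEUR";
        System.out.println(collapse(extracted, ASCII_S));   // the two U+00A0 characters survive
        System.out.println(collapse(extracted, UNICODE_S)); // prints "price: 42 EUR"
    }
}
```

The same effect is available inline by prefixing the pattern with (?U), e.g. Pattern.compile("(?U)\\s+").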