Java regular expression to match _all_ whitespace characters

Question

I'm looking for a regular expression in Java that matches all whitespace characters in a String. "\s" matches only some of them; it does not match "&nbsp;" and similar non-ASCII whitespace. I'm looking for a regular expression that matches all (common) white-space characters that can occur in a Java String.

To clarify: I do not mean the string sequence "&nbsp;". I mean the single Unicode character U+00A0 that is often represented by "&nbsp;", e.g. in HTML, and all other Unicode characters with a similar white-space meaning, e.g. "NARROW NO-BREAK SPACE" (U+202F), the word joiner encoded in Unicode 3.2 and above as U+2060, "ZERO WIDTH NO-BREAK SPACE" (U+FEFF), and any other character that can be regarded as white-space.

[Answer]

For my purpose, i.e. catching all whitespace characters, both Unicode and traditional, the following expression does the job:

[\p{Z}\s]

The answer is in the comments below, but since it is a bit hidden I repeat it here.
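
To see what the expression actually covers, here is a minimal sketch that tests it against the code points named in the question. The class and method names are made up for illustration. Note that in a Java string literal each backslash must be doubled. One thing the check makes visible: U+2060 and U+FEFF belong to the Unicode "format" category (Cf), not to a separator category, so \p{Z} does not match them. If those characters need to be caught too, they would have to be added to the character class explicitly.

```java
import java.util.regex.Pattern;

public class WhitespaceDemo {
    // Expression from the answer: Unicode separators (\p{Z}) plus the traditional \s set.
    static final Pattern ALL_WS = Pattern.compile("[\\p{Z}\\s]");
    // Plain \s with no flags: only space, tab, newline, vertical tab, form feed, carriage return.
    static final Pattern ASCII_WS = Pattern.compile("\\s");

    // True if the single code point, taken as a one-character string, matches the pattern.
    static boolean matches(Pattern p, int codePoint) {
        return p.matcher(new String(Character.toChars(codePoint))).matches();
    }

    // Removes every character matched by the answer's expression.
    static String stripAll(String input) {
        return ALL_WS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // SPACE, TAB, NO-BREAK SPACE, NARROW NO-BREAK SPACE, WORD JOINER, ZERO WIDTH NO-BREAK SPACE
        int[] samples = {0x0020, 0x0009, 0x00A0, 0x202F, 0x2060, 0xFEFF};
        for (int cp : samples) {
            System.out.printf("U+%04X  \\s=%b  [\\p{Z}\\s]=%b%n",
                    cp, matches(ASCII_WS, cp), matches(ALL_WS, cp));
        }
        System.out.println(stripAll("a\u00A0b\u202Fc d")); // prints "abcd"
    }
}
```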

Recommended answer

The "&nbsp;" is only whitespace in HTML. Use an HTML parser to extract the plain text, and \s should work just fine.
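
One caveat worth checking before relying on this: once the entity has been decoded, the text contains the character U+00A0, and Java's default \s does not cover it. The sketch below assumes Java 7 or later, where the Pattern.UNICODE_CHARACTER_CLASS flag makes \s follow the Unicode White_Space property. The class name and the sample string are hypothetical, and the decoding step itself is not shown.

```java
import java.util.regex.Pattern;

public class UnicodeWhitespaceFlag {
    // Default \s: ASCII whitespace only.
    static final Pattern DEFAULT_S = Pattern.compile("\\s");
    // With the flag, \s follows the Unicode White_Space property, which includes U+00A0.
    static final Pattern UNICODE_S = Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS);

    // True if the pattern finds at least one match anywhere in the text.
    static boolean hasWhitespace(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample: what "price:&nbsp;42" looks like after entity decoding.
        String decoded = "price:\u00A042";
        System.out.println(hasWhitespace(DEFAULT_S, decoded)); // false
        System.out.println(hasWhitespace(UNICODE_S, decoded)); // true
    }
}
```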
