Regex to Match Horizontal White Spaces

Question

I need a regex in Python 2 that matches only horizontal whitespace, not newlines.

\s matches all whitespace characters, including newlines:

>>> re.sub(r"\s", "", "line 1.\nline 2\n")
'line1.line2'

\h does not work at all (Python's re module has no \h class):

>>> re.sub(r"\h", "", "line 1.\nline 2\n")
'line 1.\nline 2\n'
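Why \h appears to do nothing: Python 2's re silently treats the unknown escape \h as a literal "h", so the pattern only matches the letter h, and the input above contains none. Newer Python versions reject the escape outright. A small sketch, assuming Python 3.6 or later (the helper name supports_h_escape is made up for this illustration):

```python
import re

# Python's re has no \h (horizontal whitespace) class. Python 2 treated the
# unknown escape as a literal "h"; Python 3.6+ raises re.error at compile time.
def supports_h_escape():
    try:
        re.compile(r"\h")
    except re.error as exc:
        # Report why the pattern was rejected.
        print("re.error:", exc)
        return False
    return True

print(supports_h_escape())
```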

[\t ] works, but I am not sure whether I am missing other whitespace characters, especially in Unicode, such as \u00A0 (no-break space) or \u200A (hair space). Many more whitespace characters are listed at the following link: https://www.cs.tut.fi/~jkorpela/chars/spaces.html

>>> re.sub(r"[\t ]", "", u"line 1.\nline 2\n\u00A0\u200A\n", flags=re.UNICODE)
u'line1.\nline2\n\xa0\u200a\n'

Do you have any suggestions?
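One way to check what [\t ] misses is to ask the regex engine itself which characters \s matches. This is a sketch assuming Python 3, where str patterns are Unicode-aware by default (the Python 2 sessions above need re.UNICODE for the same behavior); the variable names are illustrative only:

```python
import re
import sys
import unicodedata

# Build one string containing every code point, then let the regex engine
# pick out the characters it classifies as whitespace (\s).
all_chars = "".join(map(chr, range(sys.maxunicode + 1)))
whitespace = re.findall(r"\s", all_chars)

# Characters that \s matches but the hand-written class [\t ] does not.
missed = [c for c in whitespace if not re.match(r"[\t ]", c)]

for c in missed:
    # Control characters such as "\n" have no Unicode name, so supply a fallback.
    print("U+%04X %s" % (ord(c), unicodedata.name(c, "<control>")))
```

The list includes both vertical characters (newline, carriage return, line and paragraph separators) and horizontal ones such as the no-break space and the U+2000 to U+200A spaces, which is why a hand-maintained class is easy to get wrong.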

Solution

I ended up using [^\S\n] (any whitespace character except a newline) instead of listing every Unicode whitespace character.

>>> re.sub(r"[^\S\n]", "", u"line 1.\nline 2\n\u00A0\u200A\n", flags=re.UNICODE)
u'line1.\nline2\n\n'

>>> re.sub(r"[\t ]", "", u"line 1.\nline 2\n\u00A0\u200A\n", flags=re.UNICODE)
u'line1.\nline2\n\xa0\u200a\n'

It works as expected: the no-break space and hair space are removed, while the newlines are kept.
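[^\S\n] is a negated class: it matches any character that is neither a non-whitespace character (\S) nor a newline (\n), i.e. whitespace minus "\n". Note that it still matches other line-breaking characters such as "\r", "\v", "\f" and the Unicode line separators. The sketch below, assuming Python 3, compares it with a stricter variant; the STRICT_HORIZONTAL_WS pattern is an illustrative extension, not part of the original answer:

```python
import re

# Whitespace except "\n" (the answer's pattern). Python 3 str patterns
# are Unicode-aware by default, so re.UNICODE is not needed.
HORIZONTAL_WS = re.compile(r"[^\S\n]")

# Hypothetical stricter variant: also exclude every character that
# str.splitlines() treats as a line boundary (\r, \v, \f, \x1c-\x1e,
# NEL \x85, LINE SEPARATOR \u2028, PARAGRAPH SEPARATOR \u2029).
STRICT_HORIZONTAL_WS = re.compile(r"[^\S\n\r\v\f\x1c-\x1e\x85\u2028\u2029]")

# Windows-style line ending to show the difference.
text = "line 1.\r\nline 2\n\u00A0\u200A\n"
print(repr(HORIZONTAL_WS.sub("", text)))         # prints 'line1.\nline2\n\n'
print(repr(STRICT_HORIZONTAL_WS.sub("", text)))  # prints 'line1.\r\nline2\n\n'
```

Which one to use depends on the input: for text that only ever contains "\n" line endings, [^\S\n] is enough; if "\r\n" endings must survive, the carriage return has to be excluded as well.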
