Can the [a-zA-Z] Python regex pattern be made to match and replace non-ASCII Unicode characters?

Question

In the following regular expression, I would like each character in the string replaced with an 'X', but it isn't working.

In Python 2.7:

>>> import re
>>> re.sub(u"[a-zA-Z]","X","dfäg")
'XX\xc3\xa4X'

or

>>> re.sub("[a-zA-Z]","X","dfäg",re.UNICODE)
u'XX\xe4X'
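The \xc3\xa4 in the first Python 2.7 result is the UTF-8 encoding of ä. In a UTF-8 source file, a plain Python 2 "dfäg" literal is a byte string, so the regex sees two bytes rather than one character, and neither byte lies in a-z or A-Z. The following is a minimal Python 3 sketch that reproduces this by working on bytes explicitly:

```python
import re

# Mimic a Python 2 byte-string literal: ä is stored as the two bytes 0xC3 0xA4
data = "dfäg".encode("utf-8")

# A bytes pattern compares single bytes; 0xC3 and 0xA4 are outside a-z/A-Z
result = re.sub(rb"[a-zA-Z]", b"X", data)
print(result)  # b'XX\xc3\xa4X'
```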

In Python 3.4:

>>> re.sub("[a-zA-Z]","X","dfäg")
'XXäX'

Is it possible to somehow 'configure' the [a-zA-Z] pattern to match 'ä', 'ü', etc.? If this can't be done, how can I create a similar character range pattern between square brackets that would include Unicode characters in the usual 'full alphabet' range? I mean, in a language like German, for instance, 'ä' would be placed somewhere close to 'a' in the alphabet, so one would expect it to be included in the 'a-z' range.
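Character-class ranges are compared by code point, not by the alphabetical order of any particular language. The range a-z covers U+0061 to U+007A, and ä is U+00E4, which falls outside that span. The small check below makes this visible; the helper name in_ascii_range is hypothetical and used only for illustration:

```python
import re

def in_ascii_range(ch: str) -> bool:
    """Hypothetical helper: True if ch falls inside the [a-zA-Z] ranges."""
    return re.fullmatch(r"[a-zA-Z]", ch) is not None

for ch in "azAZä":
    # The range test compares these code points and nothing else
    print(ch, f"U+{ord(ch):04X}", in_ascii_range(ch))
```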

Solution

You may use

(?![\d_])\w

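with the Unicode modifier (re.UNICODE in Python 2; in Python 3, str patterns are Unicode-aware by default). In Unicode mode, \w matches letters from any script as well as digits and the underscore. The (?![\d_]) lookahead restricts \w so that it cannot match a digit (\d) or an underscore, which leaves it matching letters such as ä, ü and ß.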
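To see what the lookahead lets through, the sketch below compares it with two alternatives. The first is the negated class [^\W\d_], a commonly used equivalent that needs no lookahead. The second is str.isalpha(). One caveat shows up in the output: in Unicode mode, \d covers only decimal digits (category Nd), while \w follows str.isalnum(). As a result, a character such as ² (SUPERSCRIPT TWO) still matches both patterns even though it is not a letter. If that matters, str.isalpha() is the stricter test. The helper classify is hypothetical.

```python
import re
from typing import Tuple

# Two spellings of "a word character that is not a digit or underscore"
LOOKAHEAD = re.compile(r"(?![\d_])\w")
NEGATED_CLASS = re.compile(r"[^\W\d_]")

def classify(ch: str) -> Tuple[bool, bool, bool]:
    """Hypothetical helper: (lookahead match, negated-class match, str.isalpha())."""
    return (
        LOOKAHEAD.fullmatch(ch) is not None,
        NEGATED_CLASS.fullmatch(ch) is not None,
        ch.isalpha(),
    )

# ä, ß, é are letters; ٣ (ARABIC-INDIC DIGIT THREE) is a decimal digit, so \d matches it;
# ² (SUPERSCRIPT TWO) is alphanumeric for \w but not a decimal digit for \d.
for ch in "aäßé_7٣²-":
    print(repr(ch), classify(ch))
```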

A Python 3 demo:

import re
# No flag needed: str patterns are Unicode-aware in Python 3
print(re.sub(r"(?![\d_])\w", "X", "dfäg"))
# => XXXX
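In Python 3 the flag can usually be left out, because str patterns use Unicode semantics by default. Conversely, re.ASCII switches \w, \d and \s back to their ASCII-only meanings. The following short comparison runs both modes on the question's string:

```python
import re

PATTERN = r"(?![\d_])\w"
text = "dfäg"

# Default for str patterns: Unicode-aware \w and \d
unicode_result = re.sub(PATTERN, "X", text)

# re.ASCII limits \w to [a-zA-Z0-9_], so 'ä' is left untouched
ascii_result = re.sub(PATTERN, "X", text, flags=re.ASCII)

print(unicode_result)  # XXXX
print(ascii_result)    # XXäX
```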

As for Python 2:

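# -*- coding: utf-8 -*-
import re
s = "dfäg"  # a byte string holding UTF-8 data in Python 2
# Decode to unicode, replace every match (count=0), then encode back to UTF-8
w = re.sub(ur'(?![\d_])\w', u'X', s.decode('utf8'), 0, re.UNICODE).encode("utf8")
print(w)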
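In the Python 2 call, the 0 is the count argument (0 means "replace every match"), and re.UNICODE follows it as flags. The question's second attempt instead passes re.UNICODE as the fourth positional argument, where it is read as count. Even as a real flag, it would only change shorthand classes such as \w and \d, not an explicit a-z range. The Python 3 sketch below shows what the count/flags mix-up does, using keyword arguments so each value lands where intended. Python 3.13 also deprecates passing count and flags positionally.

```python
import re

text = "a" * 40  # 40 letters to replace

# Effect of re.sub(pattern, repl, text, re.UNICODE): the flag fills the
# count slot, and int(re.UNICODE) == 32, so at most 32 matches are replaced.
capped = re.sub(r"[a-z]", "X", text, count=int(re.UNICODE))

# Passing flags by keyword keeps count at its default of 0 (replace all).
uncapped = re.sub(r"[a-z]", "X", text, flags=re.UNICODE)

print(capped.count("X"), uncapped.count("X"))  # 32 40
```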
