Python regex, remove all punctuation except hyphen for unicode string
Question
I have this code for removing all punctuation from a Unicode string with the regex module:
import regex as re
re.sub(ur"\p{P}+", "", txt)
How would I change it to keep hyphens? If you could explain how you did it, that would be great. My understanding, correct me if I'm wrong, is that P with anything after it is punctuation.
Recommended answer
[^\P{P}-]+
\P is the complement of \p: not punctuation. So this matches anything that is not (not punctuation or a dash), which leaves all punctuation except dashes.
Example: http://www.rubular.com/r/JsdNM3nFJ3
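To see the double negative at work in Python, here is a minimal sketch. It assumes the third-party regex module is installed (the standard library re does not support \p{...}), and the helper name strip_punctuation_keep_hyphen and the sample string are made up for illustration. Note that the ur"..." prefix in the question is Python 2 syntax; on Python 3 a plain r"..." string does the same job.

```python
import unicodedata

import regex  # third-party module (pip install regex); stdlib re has no \p{...}

# [^\P{P}-] reads as: NOT (non-punctuation OR hyphen), i.e. punctuation minus "-".
PUNCT_EXCEPT_HYPHEN = regex.compile(r"[^\P{P}-]+")


def strip_punctuation_keep_hyphen(text):
    """Remove every Unicode punctuation character except the ASCII hyphen.

    Hypothetical helper name, used only for this illustration.
    """
    return PUNCT_EXCEPT_HYPHEN.sub("", text)


# Sample with ASCII and non-ASCII punctuation: guillemets, comma, colon, ellipsis.
sample = "\u00abHello\u00bb, well-known: 3-4 items\u2026"
cleaned = strip_punctuation_keep_hyphen(sample)
print(cleaned)

# Cross-check the double negative against the Unicode general category:
# a character should match exactly when it is punctuation and not "-".
for ch in sample:
    is_punct = unicodedata.category(ch).startswith("P")
    assert bool(PUNCT_EXCEPT_HYPHEN.fullmatch(ch)) == (is_punct and ch != "-")
```

Only the ASCII hyphen is excluded here. Other Unicode dashes, such as the en dash U+2013, are still punctuation and get removed; add them to the character class if they should survive too.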
If you want a less convoluted way, an alternative is \p{P}(?<!-): match any punctuation character, and then check that it wasn't a dash (using a negative lookbehind).
Working example: http://www.rubular.com/r/5G62iSYTdk
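The lookbehind variant can be tried the same way. This is a sketch under the same assumption (the third-party regex module is installed); the variable names and the sample text are illustrative only. It also compares the result with the negated character class, which should agree on this input.

```python
import regex  # third-party module, assumed installed (pip install regex)

# Consume one punctuation character, then look back at it and
# reject the match if that character was a hyphen.
LOOKBEHIND = regex.compile(r"\p{P}(?<!-)")

# The negated-class form from the first part of the answer, for comparison.
NEGATED_CLASS = regex.compile(r"[^\P{P}-]")

text = "Wait... a so-called 'state-of-the-art' fix?!"
via_lookbehind = LOOKBEHIND.sub("", text)
via_negated_class = NEGATED_CLASS.sub("", text)

print(via_lookbehind)
print(via_lookbehind == via_negated_class)
```

Both patterns remove one character per match here. The lookbehind form states the intent more directly, while the character class can take a + quantifier without extra grouping.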