Python Regex for End of Line
Question
I am trying to write a regex that adds a space before and after a dot. However, I only want this when the dot is followed by a space or by the end of the line.
However, I cannot get it to work for the end-of-line case.
For example:
I want a hotel. >> I want a hotel .
my email is zob@gmail.com >> my email is zob@gmail.com
I have to play. bye! >> I have to play . bye!
Here is my code:
# If "Dot and space" after word or number put space before and after
utterance = re.sub(r'(?<=[a-z0-9])[.][ $]',' . ',utterance)
How do I correct my regex so that the first example above also works? I tried putting a $ sign inside the square brackets, but that doesn't work.
Recommended answer
The main issue is that $ inside a character class denotes a literal $ symbol, not the end-of-line anchor. What you need here is a grouping construct with alternation instead.
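To make the difference concrete, here is a minimal sketch comparing a character class with a non-capturing group. The variable names and the "price.$5" sample string are illustrative assumptions, not part of the original question.

```python
import re

# Inside [...], "$" is an ordinary character: "[ $]" means "a space or a dollar sign".
char_class = re.compile(r"\.[ $]")
# Inside a group, "$" keeps its anchor meaning: "a space or the end of the string".
grouped = re.compile(r"\.(?: |$)")

at_end = "I want a hotel."   # dot is the last character
with_dollar = "price.$5"     # hypothetical sample: dot followed by a literal "$"

class_matches_end = char_class.search(at_end) is not None
class_matches_dollar = char_class.search(with_dollar) is not None
group_matches_end = grouped.search(at_end) is not None
group_matches_dollar = grouped.search(with_dollar) is not None

print(class_matches_end, class_matches_dollar)   # False True
print(group_matches_end, group_matches_dollar)   # True False
```

The character class only succeeds when a real "$" character follows the dot, which is why the original pattern never fires at the end of a line.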
I suggest using the following code:
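import re

# A letter or digit, a literal dot, then whitespace or the end of the string
regex = r"([^\W_])\.(?:\s+|$)"
ss = ["I want a hotel.", "my email is zob@gmail.com", "I have to play. bye!"]
for s in ss:
    # Put spaces around the dot, then strip the trailing space left at the end
    result = re.sub(regex, r"\1 . ", s).rstrip()
    print(result)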
See the Python demo.
Details:
([^\W_]) - Group 1, matching any letter or digit
\. - a literal dot
(?:\s+|$) - a non-capturing group matching either one or more whitespace characters or the end-of-string anchor (here, $ matches at the end of the string, or just before a final newline)
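The same breakdown can be written directly into the pattern with re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. This is only a sketch: the names PATTERN and add_space_around_dot are hypothetical helpers, not something from the original answer.

```python
import re

# Same pattern as above, annotated piece by piece via re.VERBOSE.
PATTERN = re.compile(
    r"""
    ([^\W_])     # group 1: one letter or digit (a word character other than "_")
    \.           # a literal dot
    (?:\s+|$)    # one or more whitespace characters, or the end of the string
    """,
    re.VERBOSE,
)


def add_space_around_dot(text):
    """Hypothetical helper: surround qualifying dots with spaces."""
    # rstrip() drops the trailing space that the replacement leaves at the end.
    return PATTERN.sub(r"\1 . ", text).rstrip()


for sample in ["I want a hotel.", "my email is zob@gmail.com", "I have to play. bye!"]:
    print(add_space_around_dot(sample))
```

Compiling the pattern once is a small design choice that helps when the substitution runs over many utterances.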
The rstrip() call removes the trailing space added during replacement.
If you are using Python 3, [^\W_] matches all Unicode letters and digits by default. In Python 2, the re.U flag enables this behavior.
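A quick way to see the Unicode behavior in Python 3 is to compare the default matching with re.ASCII, which restricts \w and \W to ASCII. The sample string below is an assumption chosen for illustration.

```python
import re

sample = "caf\u00e9_42"  # "café_42": includes a non-ASCII letter and an underscore

# Default in Python 3 (str patterns): \W is Unicode-aware, so "é" counts as a letter.
unicode_matches = re.findall(r"[^\W_]", sample)
# With re.ASCII, "é" is treated as a non-word character and is excluded.
ascii_matches = re.findall(r"[^\W_]", sample, flags=re.ASCII)

print(unicode_matches)  # ['c', 'a', 'f', 'é', '4', '2']
print(ascii_matches)    # ['c', 'a', 'f', '4', '2']
```

In both cases the underscore is excluded, which is the purpose of the _ inside the negated class.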
Note that \s+ in the final (?:\s+|$) will "shrink" a run of multiple whitespace characters into a single space.
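If collapsing the whitespace is not wanted, one possible alternative (not part of the original answer) is to use lookarounds, which check the surrounding context without consuming it. The sketch below contrasts the two on a sample string with several spaces after the dot.

```python
import re

text = "I have to play.    bye!"  # four spaces after the dot

# The answer's pattern consumes the whitespace run and replaces it with one space.
consuming = re.sub(r"([^\W_])\.(?:\s+|$)", r"\1 . ", text).rstrip()

# Alternative (an assumption, not the answer's method): the lookbehind and lookahead
# only assert the context, so just the dot is replaced and the spacing is untouched.
preserving = re.sub(r"(?<=[^\W_])\.(?=\s|$)", " .", text)

print(repr(consuming))   # 'I have to play . bye!'
print(repr(preserving))  # 'I have to play .    bye!'
```

With the lookaround version no trailing space is added at the end of the string, so rstrip() is not needed.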