Python remove whitespace from regex substring match in file
Problem description
I am reading in a file and trying to replace every occurrence of a regex match with the same match, but with the whitespace stripped. For example, the regex that correctly matches what I want in my document is '([0-9]+\s(st|nd|rd|th))', so that anything in the document of the form...
1 st, 2 nd, 33 rd, 134 th, etc. will be matched.
What I want is simply to write a new file in which each of those occurrences from the original file has its whitespace removed.
I have played with a few things like re.findall and re.sub, but I can't quite figure out how to write out the full document with only the substring matches changed to drop the whitespace.
Thanks for the help.
If I understand correctly, you could use re.sub to achieve this.
Instead of placing a capturing group around your entire pattern, place one around the numbers and another around the selected text, omitting whitespace.
>>> import re
>>> text = 'foo bar 1 st, 2 nd, 33 rd, 134 th baz quz'
>>> re.sub(r'([0-9]+)\s+(st|nd|rd|th)\b', r'\1\2', text)
Another way would be to use lookarounds.
>>> re.sub(r'(?<=[0-9])\s+(?=(?:st|nd|rd|th)\b)', '', text)
Output
foo bar 1st, 2nd, 33rd, 134th baz quz
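Since the question asks about writing a new file rather than transforming a single string, here is a minimal sketch of how the first re.sub call above might be applied to a whole document. The helper names (strip_ordinal_spaces, rewrite_file), the file names, and the UTF-8 encoding are assumptions for illustration and are not part of the original answer.

```python
import re
from pathlib import Path

# Same pattern as in the answer: digits, whitespace, then an ordinal suffix.
ORDINAL = re.compile(r'([0-9]+)\s+(st|nd|rd|th)\b')


def strip_ordinal_spaces(text):
    """Return text with the whitespace inside matches like '33 rd' removed."""
    return ORDINAL.sub(r'\1\2', text)


def rewrite_file(src, dst, encoding='utf-8'):
    """Read src, fix every ordinal, and write the full result to dst.

    The encoding default is an assumption; adjust it to match the input file.
    """
    text = Path(src).read_text(encoding=encoding)
    Path(dst).write_text(strip_ordinal_spaces(text), encoding=encoding)
```

A call such as rewrite_file('input.txt', 'output.txt') (hypothetical file names) would leave the original untouched and write the cleaned copy next to it. Reading the whole file at once keeps the sketch short; because \s+ also matches newlines, this version would join an ordinal split across a line break too, which a line-by-line loop would not.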