Removing Version Numbers with Regular Expression
Problem description
I want to remove the trailing version numbers from strings such as these:
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161
Microsoft_VC80_DebugCRT_x86_x64 1.0.0
Microsoft_VC80_DebugCRT_x86 1.0.0
Windows UPnP Browser 0.1.01
CamStudio
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161
Microsoft_VC80_DebugCRT_x86_x64 1.0.0
Microsoft_VC80_DebugCRT_x86 1.0.0
I want the result to be:
Microsoft Visual C++ 2008 Redistributable - x86
Microsoft Visual C++ 2008 Redistributable - x86
Microsoft_VC80_DebugCRT_x86_x64
Microsoft_VC80_DebugCRT_x86
Windows UPnP Browser
CamStudio
Microsoft Visual C++ 2008 Redistributable - x86
Microsoft Visual C++ 2008 Redistributable - x86
Microsoft_VC80_DebugCRT_x86_x64
Microsoft_VC80_DebugCRT_x86
Here is my code:
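import re

# the sample lines from above, joined with '\n'
s = ("Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148\n"
     "Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161\n"
     "Microsoft_VC80_DebugCRT_x86_x64 1.0.0\n"
     "Microsoft_VC80_DebugCRT_x86 1.0.0\n"
     "Windows UPnP Browser 0.1.01\n"
     "CamStudio\n"
     "Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148\n"
     "Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161\n"
     "Microsoft_VC80_DebugCRT_x86_x64 1.0.0\n"
     "Microsoft_VC80_DebugCRT_x86 1.0.0")
s1 = s.replace('\r', '').split('\n')
s2 = []
for line in s1:
    # raises re.error here: the look-behind contains + and *, so its width is not fixed
    m = re.search(r'(?<=([ ]+[\.\d]*)*$)', line)
    s2.append(m.group(0))
print(s2)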
I get:
error: look-behind requires fixed-width pattern
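For context: Python's re module only accepts look-behind assertions that match a fixed number of characters, and it checks this when the pattern is compiled. The sketch below (the helper name compiles is made up for illustration) shows that a fixed-width look-behind compiles, while one containing + or *, like the pattern in the question, is rejected with re.error.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-width look-behind: always exactly 3 characters ("x86").
print(compiles(r'(?<=x86) '))               # True
# Variable-width look-behind: ' +' can match any number of spaces.
print(compiles(r'(?<= +)\d'))               # False
# The look-behind from the question is variable-width as well.
print(compiles(r'(?<=([ ]+[\.\d]*)*$)'))    # False
```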
Is there a better way to achieve this task?
Recommended answer
The trick is that a pattern can contain parts that must match but are not captured: they become part of group(0), but not of any numbered capturing group. Here is what I worked out:
import re

# put the lines to clean in a string
s='''Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161
Microsoft_VC80_DebugCRT_x86_x64 1.0.0
Microsoft_VC80_DebugCRT_x86 1.0.0
Windows UPnP Browser 0.1.01
CamStudio
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161
Microsoft_VC80_DebugCRT_x86_x64 1.0.0
Microsoft_VC80_DebugCRT_x86 1.0.0'''
# use findall to return the parts we want
print(re.findall(r'(.+?)(?: [\d\.]+)*(?:\n|\Z)', s))
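A different technique, not the one this answer uses: delete the trailing version tokens with re.sub instead of capturing the name. This is a sketch assuming every version token is a space followed only by digits and dots; like the findall pattern, it would also strip a name's own trailing number (for example, a product name ending in a year).

```python
import re

s = '''Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148
Microsoft_VC80_DebugCRT_x86_x64 1.0.0
Windows UPnP Browser 0.1.01
CamStudio'''

# With re.MULTILINE, '$' matches just before each '\n' as well as at the end
# of the string, so this deletes the trailing run of " <digits/dots>" tokens
# on every line.
cleaned = re.sub(r'(?: [\d.]+)+$', '', s, flags=re.MULTILINE)
print(cleaned.split('\n'))
# ['Microsoft Visual C++ 2008 Redistributable - x86', 'Microsoft_VC80_DebugCRT_x86_x64', 'Windows UPnP Browser', 'CamStudio']
```

Without re.MULTILINE, $ would only match at the very end of the string, so only the last line would be cleaned.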
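Explanation of the regex:
(.+?) is a non-greedy capture of one or more characters. Being non-greedy, it stops at the first point where the rest of the pattern can match, which leaves the trailing version numbers outside the capture; because . does not match a newline, each match stays within a single line.
(?: [\d\.]+)* is a non-capturing group, repeated zero or more times, that starts with a space followed only by digits or '.' (in each repeat).
(?:\n|\Z) matches a newline or the end of the string. If your string might contain carriage returns, you could use \r?(?:\n|\Z) instead.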
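A quick check of the points above on two of the sample lines: the non-greedy (.+?) leaves the version numbers outside the capture, a greedy (.+) swallows them, and with Windows line endings the plain pattern keeps a stray '\r' in the capture unless \r? is added. The variable names are only for illustration.

```python
import re

text = 'Windows UPnP Browser 0.1.01\nCamStudio'
crlf_text = 'Windows UPnP Browser 0.1.01\r\nCamStudio'

lazy = r'(.+?)(?: [\d.]+)*(?:\n|\Z)'
greedy = r'(.+)(?: [\d.]+)*(?:\n|\Z)'         # same pattern, but (.+) is greedy
lazy_crlf = r'(.+?)(?: [\d.]+)*\r?(?:\n|\Z)'  # tolerates Windows line endings

print(re.findall(lazy, text))            # ['Windows UPnP Browser', 'CamStudio']
print(re.findall(greedy, text))          # ['Windows UPnP Browser 0.1.01', 'CamStudio']
print(re.findall(lazy, crlf_text))       # ['Windows UPnP Browser 0.1.01\r', 'CamStudio']
print(re.findall(lazy_crlf, crlf_text))  # ['Windows UPnP Browser', 'CamStudio']
```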
For a regex that has exactly one capturing group, re.findall returns a list of group(1) from each match in the string, which is exactly what you want. The other parts of the regex must still match, but since they are not captured, they are not returned.
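To illustrate the findall rule above: with no capturing groups it returns whole matches, with one group it returns that group's text, and with several groups it returns one tuple per match. A small sketch on one of the sample lines:

```python
import re

line = 'Windows UPnP Browser 0.1.01'

# No capturing groups: findall returns the whole match (group(0)).
print(re.findall(r'(?: [\d.]+)+$', line))         # [' 0.1.01']
# One capturing group: findall returns group(1) of each match.
print(re.findall(r'(.+?)(?: [\d.]+)*$', line))    # ['Windows UPnP Browser']
# Two capturing groups: findall returns a tuple of all groups per match.
print(re.findall(r'(.+?)((?: [\d.]+)*)$', line))  # [('Windows UPnP Browser', ' 0.1.01')]
```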