Removing Version Numbers with Regular Expression


Question

I want to remove the version numbers from the end of each line in a string, e.g.,

Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161
Microsoft_VC80_DebugCRT_x86_x64 1.0.0
Microsoft_VC80_DebugCRT_x86 1.0.0
Windows UPnP Browser 0.1.01
CamStudio
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161
Microsoft_VC80_DebugCRT_x86_x64 1.0.0
Microsoft_VC80_DebugCRT_x86 1.0.0

I want it to become:

Microsoft Visual C++ 2008 Redistributable - x86 
Microsoft Visual C++ 2008 Redistributable - x86 
Microsoft_VC80_DebugCRT_x86_x64 
Microsoft_VC80_DebugCRT_x86 
Windows UPnP Browser
CamStudio
Microsoft Visual C++ 2008 Redistributable - x86 
Microsoft Visual C++ 2008 Redistributable - x86 
Microsoft_VC80_DebugCRT_x86_x64 
Microsoft_VC80_DebugCRT_x86 

Here is my code:

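import re

# the sample lines shown above, joined with \n
s = ("Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148\n"
     "Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161\n"
     "Microsoft_VC80_DebugCRT_x86_x64 1.0.0\n"
     "Microsoft_VC80_DebugCRT_x86 1.0.0\n"
     "Windows UPnP Browser 0.1.01\n"
     "CamStudio\n"
     "Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148\n"
     "Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161\n"
     "Microsoft_VC80_DebugCRT_x86_x64 1.0.0\n"
     "Microsoft_VC80_DebugCRT_x86 1.0.0")
s1 = s.replace('\r', '').split('\n')
s2 = []
for line in s1:
    # the variable-width look-behind is what triggers the error below
    m = re.search(r'(?<=([ ]+[\.\d]*)*$)', line)
    s2.append(m.group(0))
print(s2)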

I get:

error: look-behind requires fixed-width pattern

Is there a better way to achieve this task?

Recommended Answer

The trick is that you can have things in the pattern that aren't returned in a match group (i.e., they will be part of group(0), but not any other group). Here is what I worked out:

import re

# put the lines to clean in a string
s='''Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161
Microsoft_VC80_DebugCRT_x86_x64 1.0.0
Microsoft_VC80_DebugCRT_x86 1.0.0
Windows UPnP Browser 0.1.01
CamStudio
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161
Microsoft_VC80_DebugCRT_x86_x64 1.0.0
Microsoft_VC80_DebugCRT_x86 1.0.0'''

# use findall to return the parts we want
print(re.findall(r'(.+?)(?: (?:[\d\.]+))*(?:\n|\Z)', s))

Explanation of the regex: (.+?) is a non-greedy capture of a bunch of characters.
(?: [\d\.]+)* (written in the code as (?: (?:[\d\.]+))*, which is equivalent) is a non-capturing group, repeated zero or more times, that starts with a space and is followed only by digits or '.' (in each repeat).
(?:\n|\Z) matches a newline or the end of the string. If your string might have carriage returns, you could use \r?(?:\n|\Z) instead.
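
As a minimal sketch of the carriage-return variant mentioned above, the snippet below runs the same pattern with \r? added on a short sample that uses \r\n line endings. The names text, pattern, plain and names and the sample string are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical sample with Windows-style (\r\n) line endings.
text = "Windows UPnP Browser 0.1.01\r\nCamStudio\r\nMicrosoft_VC80_DebugCRT_x86 1.0.0"

# Same idea as the pattern above, with an optional \r before the line terminator.
pattern = re.compile(r'(.+?)(?: [\d.]+)*\r?(?:\n|\Z)')
names = pattern.findall(text)
print(names)

# For comparison: without \r?, the lazy group has to swallow the version
# and the \r in order to reach the \n.
plain = re.findall(r'(.+?)(?: [\d.]+)*(?:\n|\Z)', text)
print(plain[0] == "Windows UPnP Browser 0.1.01\r")
```

Inside a character class the dot needs no escaping, so [\d.] and [\d\.] behave the same.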

For a regex that has only one capturing group, re.findall returns group(1) of each match in the string, which is exactly what you want. The other parts of the regex must be matched, but since they are not captured, they will not be returned.
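
To make the difference between group(0) and group(1) concrete, here is a small sketch on a single line. The variable names and the two-group pattern are illustrative assumptions, added only to contrast with the one-group case.

```python
import re

line = "Windows UPnP Browser 0.1.01"

# One capturing group: findall returns a list of strings (group 1 of each match).
one_group = re.findall(r'(.+?)(?: [\d.]+)*\Z', line)
print(one_group)

# Two capturing groups: findall returns a list of tuples, one item per group.
two_groups = re.findall(r'(.+?)((?: [\d.]+)*)\Z', line)
print(two_groups)

# With search, group(0) is the whole match and group(1) is only the captured name.
m = re.search(r'(.+?)(?: [\d.]+)*\Z', line)
print(repr(m.group(0)), repr(m.group(1)))
```

The version text is still consumed by the match in every case; it is simply left out of what findall returns when it sits in a non-capturing group.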
