regex: getting backreference to number, adding to it
Problem description
Simple regex question:
I want to replace page numbers in a string with the page number plus some number (say, 10). I figured I could capture a matched page number with a backreference, do an operation on it, and use the result as the replacement argument in re.sub.
This works (just passing the value):
import re

def add_pages(x):
    return x

re.sub(r"(?<=Page )(\d{2})", add_pages(r"\1"), 'here is Page 11 and here is Page 78\nthen there is Page 65', flags=re.MULTILINE)
yielding, of course, 'here is Page 11 and here is Page 78\nthen there is Page 65'
Now, if I change the add_pages function to modify the passed backreference, I get an error.
def add_pages(x):
    return int(x) + 10

re.sub(r"(?<=Page )(\d{2})", add_pages(r"\1"), 'here is Page 11 and here is Page 78\nthen there is Page 65', flags=re.MULTILINE)
ValueError: invalid literal for int() with base 10: '\\1'
This fails because what is passed to the add_pages function seems to be the literal backreference, not the text it references.
Absent extracting all matched numbers to a list and then processing and adding back, how would I do this?
Recommended answer
The actual problem is that you are supposed to pass a function as the second argument of re.sub; instead, you are calling the function and passing its return value.
Whenever a match is found, re.sub looks at the second argument. If it is a string, it is used as the replacement; if it is a function, the function is called with the match object and its return value is used as the replacement. In your case, add_pages(r"\1") simply returns r"\1" itself, so the re.sub call is equivalent to this:
print(re.sub(r"(?<=Page )(\d{2})", r"\1", ...))
So it replaces each matched string with the text captured by group 1, which is the same string. That is why it appears to work.
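A minimal sketch of that equivalence, reusing the sample string from the question. The variable names `repl`, `via_call`, and `via_literal` are introduced here for illustration only:

```python
import re

def add_pages(x):
    # Identity function from the question: returns its argument unchanged.
    return x

text = "here is Page 11 and here is Page 78\nthen there is Page 65"
pattern = r"(?<=Page )(\d{2})"

# add_pages(r"\1") is evaluated first, before re.sub is even called,
# so re.sub only ever receives the plain string r"\1".
repl = add_pages(r"\1")

via_call = re.sub(pattern, repl, text)
via_literal = re.sub(pattern, r"\1", text)

print(repl == r"\1")                    # True
print(via_call == via_literal == text)  # True: every match is replaced by itself
```

Because the replacement is the backreference to the only capture group, the output is identical to the input.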
But in the second case, when you call
add_pages(r"\1")
you are trying to convert the string r"\1" (a backslash followed by the digit 1) to an integer, which is not possible. That is why it fails.
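The failure can be reproduced without re.sub at all, which shows that the error comes from the function call itself and not from the regex engine. A small sketch, where `error_message` is a name introduced only for this example:

```python
def add_pages(x):
    # Second version from the question: expects something int() can parse.
    return int(x) + 10

try:
    add_pages(r"\1")  # the argument is the two-character string: backslash, 1
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(len(r"\1"))     # 2
print(error_message)  # the "invalid literal for int()" message from the question
```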
The correct way to write it is:
def add_pages(matchObject):
    # re.sub calls this once per match and expects a string back
    return str(int(matchObject.group()) + 10)

print(re.sub(r"(?<=Page )(\d{2})", add_pages, ...))
Read more about the match object's group() method in the documentation of Python's re module.
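For completeness, here is a self-contained sketch of the same idea run against the full string from the question. The helper `make_page_shifter` is hypothetical, not part of the re module; it only wraps the callback so that the offset is not hard-coded:

```python
import re

def make_page_shifter(offset):
    """Return a replacement callback that adds `offset` to the matched number."""
    def shift(match):
        # match.group(1) is the text captured by the parenthesised group;
        # re.sub expects the callback to return a str.
        return str(int(match.group(1)) + offset)
    return shift

text = "here is Page 11 and here is Page 78\nthen there is Page 65"

# Pass the function object itself; re.sub calls it once per match.
result = re.sub(r"(?<=Page )(\d{2})", make_page_shifter(10), text)

print(repr(result))
# 'here is Page 21 and here is Page 88\nthen there is Page 75'
```

Note that the pattern from the question matches exactly two digits, so a page number with three or more digits would only be partially matched; using `\d+` instead would be one way to handle numbers of any width.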