regex: getting backreference to number, adding to it

Question

Simple regex question:

I want to replace each page number in a string with that page number plus some constant (say, 10). I figured I could capture a matched page number with a backreference, perform an operation on it, and use the result as the replacement argument in re.sub.

This works (just passing the value through):

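import re

def add_pages(x):
    return x

# Raw-string pattern; flags passed by keyword (the 4th positional argument of re.sub is count, not flags)
re.sub(r"(?<=Page )(\d{2})", add_pages(r"\1"), 'here is Page 11 and here is Page 78\nthen there is Page 65', flags=re.MULTILINE)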

yielding, of course, 'here is Page 11 and here is Page 78\nthen there is Page 65'.

Now, if I change the add_pages function to modify the passed backreference, I get an error.

def add_pages(x):
    return int(x) + 10

re.sub(r"(?<=Page )(\d{2})", add_pages(r"\1"), 'here is Page 11 and here is Page 78\nthen there is Page 65', flags=re.MULTILINE)

ValueError: invalid literal for int() with base 10: '\\1'

This happens because what is passed to the add_pages function seems to be the literal backreference, not what it references.

Short of extracting all matched numbers into a list, processing them, and putting them back, how would I do this?

Recommended answer

The actual problem is that you are supposed to pass a function as the second argument of re.sub; instead, you are calling the function yourself and passing its return value.

Whenever a match is found, re.sub looks at the second argument. If it is a string, it is used as the replacement template (with backreferences such as \1 expanded); if it is a function, the function is called with the match object and its return value is used as the replacement. In your case, add_pages(r"\1") simply returns r"\1" itself, so the re.sub call is equivalent to this:

print(re.sub(r"(?<=Page )(\d{2})", r"\1", ...))

So it actually replaces each matched string with the same string. That is why it appears to work.
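
To make the order of evaluation concrete, here is a minimal sketch using the question's own string and pattern. The variable names (text, pattern, repl, unchanged, bracketed) are illustrative assumptions, not part of the original post.

```python
import re

text = "here is Page 11 and here is Page 78\nthen there is Page 65"
pattern = r"(?<=Page )(\d{2})"

def add_pages(x):
    return x

# add_pages is called before re.sub runs, so re.sub only ever sees its result.
repl = add_pages(r"\1")
print(repr(repl))  # '\\1' (a two-character string: backslash, then 1)

# A string replacement is a template: \1 expands to the text of group 1,
# so every page number is replaced with itself.
unchanged = re.sub(pattern, repl, text)
print(unchanged == text)  # True

# The same template mechanism with visible decoration around the group.
bracketed = re.sub(pattern, r"[\1]", text)
print(bracketed)
```

The last call prints the text with each page number wrapped in square brackets, which shows that the backreference is resolved inside re.sub, never inside add_pages.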

But in the second case, when you call

add_pages(r"\1")

you are trying to convert the two-character string r"\1" to an integer, which is not possible. That is why it fails.
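
The failure can be reproduced without re.sub at all, which suggests the regex engine is not involved. A small sketch (the message variable is only there for illustration):

```python
message = None
try:
    # r"\1" is just a backslash followed by the digit 1, not a number.
    int(r"\1")
except ValueError as exc:
    message = str(exc)

print(message)  # invalid literal for int() with base 10: '\\1'
```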

The correct way to write it is:

def add_pages(matchObject):
    # re.sub calls this once per match; the return value must be a string
    return str(int(matchObject.group()) + 10)

print(re.sub(r"(?<=Page )(\d{2})", add_pages, ...))

Read more about the group function in the match object section of the Python re documentation.
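
For completeness, here is a self-contained sketch of the function-replacement approach applied to the string from the question. The offset parameter and the use of functools.partial are assumptions added for illustration; the original answer hard-codes + 10.

```python
import re
from functools import partial

def add_pages(match, offset=10):
    # match.group(1) is the captured page number as a string;
    # the replacement returned to re.sub must also be a string.
    return str(int(match.group(1)) + offset)

text = "here is Page 11 and here is Page 78\nthen there is Page 65"
pattern = r"(?<=Page )(\d{2})"

# Pass the function itself (no parentheses); re.sub calls it once per match.
shifted = re.sub(pattern, add_pages, text)
print(shifted)
# here is Page 21 and here is Page 88
# then there is Page 75

# A different offset can be bound ahead of time with functools.partial.
shifted_by_100 = re.sub(pattern, partial(add_pages, offset=100), text)
print(shifted_by_100)
```

Because the pattern has a single capturing group that spans the whole match, match.group() and match.group(1) return the same text here; group(1) is used only to make the link to the backreference \1 explicit.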
