Invalid group reference in Python 2.7+


Question

I am trying to convert all WikiLink-style strings in my web page (created in Django) into HTML links.

I am using the following expression:

import re
expr = r'\s+[A-Z][a-z]+[A-Z][a-z]+\s'
repl=r'<a href="/photos/\1">\1</a>'
mystr = 'this is a string to Test whether WikiLink will work ProPerly'

parser=re.compile(expr)
parser.sub(repl, mystr)

This returns the following string, with a hex value in place of the matched text:

"this is a string to Test whether<a href='/mywiki/\x01>\x01</a>'will work<a href='/mywiki/\x01>\x01</a>'"

Looking at the Python help for re.sub, I tried changing \1 to \g<1>, but that results in an "invalid group reference" error.

Please help me understand how to get this working.

Recommended answer

The problem here is that expr does not contain any capturing groups.

Whatever part of the match you want to show up as \1 needs to be put in parentheses. For example:

>>> expr = r'\s+([A-Z][a-z]+[A-Z][a-z]+)\s'
>>> parser=re.compile(expr)
>>> parser.sub(repl, mystr)
'this is a string to Test whether<a href="/photos/WikiLink">WikiLink</a>will work ProPerly'

The backreference \1 refers to group 1 within the match, which is the part that matched the first parenthesized subexpression. Likewise, \2 is group 2, the part that matched the second parenthesized subexpression, and so on. If you use \1 when the pattern has no group 1, some regexp engines give you an error and others insert a literal '\1' character, a ctrl-A. Python's re module raises the "invalid group reference" error whenever the backslash actually reaches it, which is the case for a raw string and for \g<1>. The ctrl-A appears when the replacement is written without the r prefix, because the string literal '\1' is already a ctrl-A before re ever sees it; its canonical representation is '\x01', which is why you see it that way.
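To see where the '\x01' comes from, here is a minimal sketch that reuses the pattern and test string from the question. The shortened replacement strings and the variable names are illustrative only; the exact replacement the asker ran is not known.

```python
import re

# Pattern from the question: it contains no capturing group.
pattern = re.compile(r'\s+[A-Z][a-z]+[A-Z][a-z]+\s')
text = 'this is a string to Test whether WikiLink will work ProPerly'

# Without the r prefix, '\1' is an octal escape in the string literal,
# so the replacement already contains a ctrl-A before re sees it.
plain_repl = '<\1>'
plain_result = pattern.sub(plain_repl, text)
print(repr(plain_result))

# With the r prefix, the backslash reaches the re module, which rejects
# a reference to a group that the pattern does not define.
try:
    pattern.sub(r'<\1>', text)
    raw_error = None
except re.error as exc:
    raw_error = str(exc)
print(raw_error)
```

The first call substitutes a ctrl-A silently, while the second raises the same error that \g<1> produced.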

Group 0 is the entire match. But that's not what you want in this case, because you don't want the spaces to be part of the substitution.
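The difference between group 0 and group 1 can be sketched as follows, using the corrected pattern. The square brackets in the replacement are only there to make the boundaries of the substituted text visible; in a replacement string, group 0 is written \g<0>.

```python
import re

pattern = re.compile(r'\s+([A-Z][a-z]+[A-Z][a-z]+)\s')
text = 'this is a string to Test whether WikiLink will work ProPerly'

# Group 0 is the whole match, including the surrounding whitespace.
whole = pattern.sub(r'[\g<0>]', text)

# Group 1 is only the parenthesized part, without the whitespace.
inner = pattern.sub(r'[\1]', text)

print(whole)  # this is a string to Test whether[ WikiLink ]will work ProPerly
print(inner)  # this is a string to Test whether[WikiLink]will work ProPerly
```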

The only reason you need the \g<...> syntax is when a simple backreference is ambiguous. For example, if the replacement were 123\1456, there's no way to tell whether that means 123, followed by group 1, followed by 456, or 123 followed by group 1456, or…
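A small sketch of the ambiguity, using a made-up pattern and input that are not part of the original question: \g<1> marks exactly where the group number ends, so digits that follow it are treated as literal text.

```python
import re

# Hypothetical one-group pattern and input, chosen only for illustration.
pattern = re.compile(r'(b+)')
text = 'a-bb-c'

# Explicit form: group 1 followed by literal digits.
explicit = pattern.sub(r'\g<1>0', text)
padded = pattern.sub(r'123\g<1>456', text)
print(explicit)  # a-bb0-c
print(padded)    # a-123bb456-c

# Short form: r'\10' is read as a reference to group 10, which this
# pattern does not define, so re raises an error.
try:
    pattern.sub(r'\10', text)
    ambiguous_error = None
except re.error as exc:
    ambiguous_error = str(exc)
print(ambiguous_error)
```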

Read more about grouping and backreferences.
