Python Regex Sub - Use Match as Dict Key in Substitution

Problem description

I'm translating a program from Perl to Python (3.3). I'm fairly new to Python. In Perl, I can do crafty regex substitutions, such as:

$string =~ s/<(\w+)>/$params->{$1}/g;

This will search through $string, and for each group of word characters enclosed in <>, a substitution from the $params hashref will occur, using the regex match as the hash key.

What is the best (Pythonic) way to concisely replicate this behavior? I've come up with something along these lines:

string = re.sub(r'<(\w+)>', (what here?), string)

It might be nice if I could pass a function that maps regex matches to a dict. Is that possible?

Thanks for the help.

Solution

You can pass a callable to re.sub to tell it what to do with the match object.

s = re.sub(r'<(\w+)>', lambda m: replacement_dict.get(m.group(1)), s)

Using dict.get lets you provide a fallback if the word isn't in the replacement dict, i.e.

lambda m: replacement_dict.get(m.group(1), m.group())
# fall back to leaving the original text in place if we don't have a replacement
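As a minimal, self-contained sketch of the callable approach: the `params` dict, the `template` string, and the `lookup` helper below are made up for illustration and are not from the question. The key is `group(1)` (the captured word without brackets) and the fallback is `group(0)` (the whole match), so unknown placeholders are left as they were.

```python
import re

# Hypothetical data, for illustration only.
params = {"name": "World", "greeting": "Hello"}
template = "<greeting>, <name>! <unknown>"

def lookup(match):
    # group(1): the text captured by (\w+), without the angle brackets.
    # group(0): the entire match, used as the fallback value.
    return params.get(match.group(1), match.group(0))

result = re.sub(r"<(\w+)>", lookup, template)
print(result)  # Hello, World! <unknown>
```

A named function does the same job as the lambda; it is just easier to comment and test.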

I'll note that when using re.sub (and its relatives, e.g. re.split) with a pattern that includes context around the part you actually want to replace, it's often cleaner to use lookaround assertions, so that the surrounding context is not part of the match and doesn't get substituted out along with it. So in this case I'd write your regex like

r'(?<=<)(\w+)(?=>)'

Otherwise you have to splice the brackets back in inside your lambda. To be clear about what I mean, here is an example:

import re

s = "<sometag>this is stuff<othertag>this is other stuff<closetag>"

d = {'othertag': 'blah'}

#this doesn't work because `group` returns the whole match, including non-groups
re.sub(r'<(\w+)>', lambda m: d.get(m.group(), m.group()), s)
Out[23]: '<sometag>this is stuff<othertag>this is other stuff<closetag>'

#this output isn't exactly ideal...
re.sub(r'<(\w+)>', lambda m: d.get(m.group(1), m.group(1)), s)
Out[24]: 'sometagthis is stuffblahthis is other stuffclosetag'

#this works, but is ugly and hard to maintain
re.sub(r'<(\w+)>', lambda m: '<{}>'.format(d.get(m.group(1), m.group(1))), s)
Out[26]: '<sometag>this is stuff<blah>this is other stuff<closetag>'

#lookbehind/lookahead makes this nicer.
re.sub(r'(?<=<)(\w+)(?=>)', lambda m: d.get(m.group(), m.group()), s)
Out[27]: '<sometag>this is stuff<blah>this is other stuff<closetag>'
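To see why the lookaround version can use plain `m.group()`, it may help to compare what each pattern actually matches. The sketch below is illustrative only; the variable names `plain` and `look` are not from the answer.

```python
import re

s = "<othertag>"

# Brackets are part of the pattern, so they are part of the whole match.
plain = re.search(r"<(\w+)>", s)
# Lookbehind/lookahead only assert that the brackets are there;
# they are not consumed, so the whole match is just the word.
look = re.search(r"(?<=<)(\w+)(?=>)", s)

print(plain.group(0), plain.group(1))  # <othertag> othertag
print(look.group(0), look.group(1))    # othertag othertag
```

Note that the lookaround pattern keeps the angle brackets in the output, whereas the Perl substitution in the question replaces the brackets too. If that is the behaviour you want, match the brackets and look up `m.group(1)`.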
