How can I do multiple substitutions using regex?

Question

I can use the code below to create a new file in which every a is replaced with aa, using a regular expression.

import re

with open("notes.txt") as text:
    new_text = re.sub("a", "aa", text.read())
    with open("notes2.txt", "w") as result:
        result.write(new_text)

I was wondering: to change more than one letter in my text, do I have to repeat the line new_text = re.sub("a", "aa", text.read()) for every other letter I want to change, swapping in the appropriate strings each time?

That is, a --> aa, b --> bb and c --> cc.

So do I have to write that line for every letter I want to change, or is there an easier way? Perhaps I could create a "dictionary" of translations. Should I put those letters into an array? If I do, I'm not sure how to call on them.
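
For reference, here is a minimal sketch of the approach the question describes: keep the letter pairs in a dictionary and call re.sub once per key in a loop instead of writing the line out by hand. The helper name sequential_replace and the sample data are illustrative assumptions. In the file example, the text should be read into a string once, because calling text.read() a second time on the same file object returns an empty string.

```python
import re

def sequential_replace(replacements, text):
    # Hypothetical helper: apply one re.sub per key, in dictionary order.
    for old, new in replacements.items():
        # re.escape makes each key match literally; `new` is used as a
        # re.sub template, so plain strings like "aa" are inserted as-is.
        text = re.sub(re.escape(old), new, text)
    return text

replacements = {"a": "aa", "b": "bb", "c": "cc"}
print(sequential_replace(replacements, "abc"))  # aabbcc

# Each pass rescans the previous pass's output, so replacements can cascade:
print(sequential_replace({"a": "b", "b": "c"}, "a"))  # c
```

The second call shows the main weakness of chaining: the "a" is first turned into "b" and then into "c", which is usually not what a translation table intends.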

Recommended answer

The answer proposed by @nhahtdh is valid, but I would argue it is less Pythonic than the canonical example, which is less opaque than his regex manipulations and takes advantage of Python's built-in dictionaries and anonymous (lambda) functions.

A dictionary of translations makes sense in this context. In fact, that's how the Python Cookbook does it, as shown in this example (copied from ActiveState http://code.activestate.com/recipes/81330-single-pass-multiple-replace/ )

import re

def multiple_replace(replacements, text):
    # Create a regular expression from the dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, replacements.keys())))

    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda mo: replacements[mo.group(0)], text)

if __name__ == "__main__":

    text = "Larry Wall is the creator of Perl"

    replacements = {
        "Larry Wall": "Guido van Rossum",
        "creator": "Benevolent Dictator for Life",
        "Perl": "Python",
    }

    print(multiple_replace(replacements, text))

So in your case, you could make a dict replacements = {"a": "aa", "b": "bb", "c": "cc"} and pass it to multiple_replace along with the text you want translated. The function joins all of the dictionary keys, each escaped with re.escape so that it is matched literally, into a single regex alternation. regex.sub then calls the lambda once for every match, and the lambda returns the dictionary value for the matched text.
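
To make the mechanism concrete, here is a stripped-down sketch using the question's mapping. It drops the recipe's capturing group, which is not needed because mo.group(0) already returns the whole match; the variable names and sample strings are illustrative.

```python
import re

# Illustrative mapping from the question: double a, b and c.
replacements = {"a": "aa", "b": "bb", "c": "cc"}

# One alternation of all keys; re.escape makes each key match literally.
pattern = re.compile("|".join(map(re.escape, replacements)))
print(pattern.pattern)  # a|b|c

# re.sub calls the function once per match and inserts its return value.
result = pattern.sub(lambda m: replacements[m.group(0)], "abcd")
print(result)  # aabbccd

# The input is scanned only once, so replacement text is never matched again.
swap = {"a": "b", "b": "c"}
swap_pattern = re.compile("|".join(map(re.escape, swap)))
swapped = swap_pattern.sub(lambda m: swap[m.group(0)], "ab")
print(swapped)  # bc
```

The last example is why the single-pass design matters: "a" becomes "b" and "b" becomes "c" independently, giving "bc" rather than "cc".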

You could use this function while reading from your file, for example:

replacements = {"a": "aa", "b": "bb", "c": "cc"}

with open("notes.txt") as text:
    new_text = multiple_replace(replacements, text.read())
with open("notes2.txt", "w") as result:
    result.write(new_text)

I've actually used this exact method in production, in a case where I needed to translate the months of the year from Czech into English for a web scraping task.

As @nhahtdh pointed out, one downside of this approach is that it assumes the keys are prefix-free: if one key is a prefix of another (for example "a" and "ab"), the alternation may match the shorter key first, in which case the longer key is never replaced.
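
A common workaround, sketched here rather than taken from the original recipe, is to sort the keys longest first before building the alternation, so a longer key is always tried before any key that is a prefix of it. Both function names and the sample data are hypothetical.

```python
import re

def replace_in_key_order(replacements, text):
    # Alternatives are tried left to right, so dictionary order decides
    # which key wins when one key is a prefix of another.
    regex = re.compile("|".join(map(re.escape, replacements)))
    return regex.sub(lambda mo: replacements[mo.group(0)], text)

def replace_longest_first(replacements, text):
    # Sort keys by length, longest first, so "ab" is tried before "a".
    keys = sorted(replacements, key=len, reverse=True)
    regex = re.compile("|".join(map(re.escape, keys)))
    return regex.sub(lambda mo: replacements[mo.group(0)], text)

replacements = {"a": "1", "ab": "2"}
print(replace_in_key_order(replacements, "ab a"))   # 1b 1
print(replace_longest_first(replacements, "ab a"))  # 2 1
```

With insertion order, "a" shadows "ab" and leaves a stray "b"; sorting by length restores the intended match.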
