在Python中有效地执行多个字符串替换 [英] Efficiently carry out multiple string replacements in Python

查看:511
本文介绍了在Python中有效地执行多个字符串替换的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果我想进行多个字符串替换,最有效的方法是什么?

If I would like to carry out multiple string replacements, what is the most efficient way to carry this out?

在旅行中遇到的那种情况的示例如下:

An example of the kind of situation I have encountered in my travels is as follows:

>>> strings = ['a', 'list', 'of', 'strings']
>>> [s.replace('a', '')...replace('u', '')  for s in strings if len(s) > 2]
['a', 'lst', 'of', 'strngs']

推荐答案

您给出的特定示例(删除单个字符)非常适合字符串的translate方法,用单个字符替换单个字符也是如此.如果输入字符串是Unicode字符串,则与上述两种替换"一样,使用translate方法也可以用多个字符串替换单个字符(如果您需要处理字节字符串,则不需要这样) ,但).

The specific example you give (deleting single characters) is perfect for the translate method of strings, as is substitution of single characters with single characters. If the input string is a Unicode one, then, as well as the two above kinds of "substitution", substitution of single characters with multiple character strings is also fine with the translate method (not if you need to work on byte strings, though).

如果您需要替换多个字符的子字符串,那么我也建议使用正则表达式-尽管不是@gnibbler的答案所推荐的方式;相反,我将从r'onestring|another|yetanother|orthis'构建正则表达式(加入要用竖线替换的子字符串-当然,如果它们包含特殊字符,请确保也将re.escape换成子字符串),并编写一个基于替换函数的简单根据命令.

If you need to replace substrings of multiple characters, then I would also recommend using a regular expression -- though not in the way @gnibbler's answer recommends; rather, I'd build the regex from r'onestring|another|yetanother|orthis' (join the substrings you want to replace with vertical bars -- be sure to also re.escape them if they contain special characters, of course) and write a simple substituting-function based on a dict.

由于我不知道这两段中的哪一部分符合您的实际需求,因此我目前不提供很多代码,但是(当我稍后回到家再次检查SO时,-) '将很高兴根据您对问题的编辑进行编辑,以根据需要添加代码示例(比对此答案的注释更有用;-).

I'm not going to offer a lot of code at this time since I don't know which of the two paragraphs applies to your actual needs, but (when I later come back home and check SO again;-) I'll be glad to edit to add a code example as necessary depending on your edits to your question (more useful than comments to this answer;-).

编辑:OP在评论中说他想要一个更笼统的"答案(没有弄清楚这是什么意思),然后在Q的编辑中他说他想研究权衡" 全部之间的各个代码段之间都使用单字符子字符串(并检查其存在,而不是按最初的请求进行替换-当然是完全不同的语义).

Edit: in a comment the OP says he wants a "more general" answer (without clarifying what that means) then in an edit of his Q he says he wants to study the "tradeoffs" between various snippets all of which use single-character substrings (and check presence thereof, rather than replacing as originally requested -- completely different semantics, of course).

鉴于这种彻底的混乱,我只能说权衡取舍"(从性能角度考虑),我喜欢使用python -mtimeit -s'setup things here' 'statements to check'(确保要检查的语句没有副作用,以避免扭曲时间测量,因为timeit隐式循环以提供准确的时序测量结果.

Given this utter and complete confusion all I can say is that to "check tradeoffs" (performance-wise) I like to use python -mtimeit -s'setup things here' 'statements to check' (making sure the statements to check have no side effects to avoid distorting the time measurements, since timeit implicitly loops to provide accurate timing measurements).

一个一般性的答案(没有任何权衡,并且涉及多个字符的子字符串,因此完全与他的Q的编辑相反,但与他的评论相符-这两个完全矛盾,当然都见面):

A general answer (without any tradeoffs, and involving multiple-character substrings, so completely contrary to his Q's edit but consonant to his comments -- the two being entirely contradictory it is of course impossible to meet both):

import re

class Replacer(object):

  def __init__(self, **replacements):
    self.replacements = replacements
    self.locator = re.compile('|'.join(re.escape(s) for s in replacements))

  def _doreplace(self, mo):
    return self.replacements[mo.group()]

  def replace(self, s):
    return self.locator.sub(self._doreplace, s)

示例用法:

r = Replacer(zap='zop', zip='zup')
print r.replace('allazapollezipzapzippopzip')

如果某些要替换的子字符串是Python关键字,则需要以不太直接的方式将它们传递给tad,例如以下内容:

If some of the substrings to be replaced are Python keywords, they need to be passed in a tad less directly, e.g., the following:

r = Replacer(abc='xyz', def='yyt', ghi='zzq')

将会失败,因为def是关键字,因此您需要例如:

would fail because def is a keyword, so you need e.g.:

r = Replacer(abc='xyz', ghi='zzq', **{'def': 'yyt'})

之类的.

我发现这对于类(而不是过程编程)来说是一个很好的用法,因为RE定位要替换的子字符串,表达用其替换内容的dict以及执行替换的方法,实际上是保持在一起",而类实例正是在Python中执行这种保持在一起"的正确方法.一个闭包工厂也可以工作(因为replace方法实际上是实例的唯一部分,需要在外部"可见),但是它的方式可能不太清晰,更难调试:

I find this a good use for a class (rather than procedural programming) because the RE to locate the substrings to replace, the dict expressing what to replace them with, and the method performing the replacement, really cry out to be "kept all together", and a class instance is just the right way to perform such a "keeping together" in Python. A closure factory would also work (since the replace method is really the only part of the instance that needs to be visible "outside") but in a possibly less-clear, harder to debug way:

def make_replacer(**replacements):
  locator = re.compile('|'.join(re.escape(s) for s in replacements))

  def _doreplace(mo):
    return replacements[mo.group()]

  def replace(s):
    return locator.sub(_doreplace, s)

  return replace

r = make_replacer(zap='zop', zip='zup')
print r('allazapollezipzapzippopzip')

唯一的真正优势可能是访问自由变量"(timeit进行检查,因为基准案例"被认为对使用它的应用程序具有重要意义并具有代表性). >,locator_doreplace)在这种情况下可能比在普通的基于类的方法中访问限定名称(self.replacements等)的速度稍快(是否如此,将取决于Python中的实现).使用,则需要在重要基准上与timeit核对!).

The only real advantage might be a very modestly better performance (needs to be checked with timeit on "benchmark cases" considered significant and representative for the app using it) as the access to the "free variables" (replacements, locator, _doreplace) in this case might be minutely faster than access to the qualified names (self.replacements etc) in the normal, class-based approach (whether this is the case will depend on the Python implementation in use, whence the need to check with timeit on significant benchmarks!).

这篇关于在Python中有效地执行多个字符串替换的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆