Fastest Python method for search and replace on a large string

Question

I'm looking for the fastest way to replace a large number of sub-strings inside a very large string. Here are two examples I've used.

findall() feels simpler and more elegant, but it takes an astounding amount of time.

finditer() blazes through a large file, but I'm not sure this is the right way to do it.

Here's some sample code. Note that the actual text I'm interested in is a single string around 10MB in size, and there's a huge difference in these two methods.

import re

def findall_replace(text, reg, rep):
    # Slow: every str.replace() call works over the whole string again.
    output = text
    for match in reg.findall(text):
        # Accumulate into output so earlier replacements are not discarded.
        output = output.replace(match, rep)
    return output

def finditer_replace(text, reg, rep):
    # Single pass: copy the text between matches, then the replacement.
    cursor_pos = 0
    output = ''
    for match in reg.finditer(text):
        output += text[cursor_pos:match.start(1)] + rep
        cursor_pos = match.end(1)
    # Append whatever follows the last match.
    output += text[cursor_pos:]
    return output

reg = re.compile(r'(dog)')
rep = 'cat'
text = 'dog cat dog cat dog cat'

finditer_replace(text, reg, rep)

findall_replace(text, reg, rep)

UPDATE: Added a re.sub method to the tests:

def sub_replace(reg, rep, text):
    output = re.sub(reg, rep, text)
    return output

Results

re.sub() - 0:00:00.031000
finditer() - 0:00:00.109000
findall() - 0:01:17.260000
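
The post does not show how these timings were taken. A minimal sketch of one way to compare the two fast variants with the standard timeit module follows; the sample text, the best_time helper, and the list-based finditer_replace are assumptions for illustration, not the asker's harness.

```python
import re
import timeit

reg = re.compile(r'(dog)')
rep = 'cat'
# Hypothetical stand-in for the real 10 MB input.
text = 'dog cat ' * 50_000

def sub_replace(reg, rep, text):
    return reg.sub(rep, text)

def finditer_replace(text, reg, rep):
    # Collect slices and replacements, then join once at the end.
    cursor_pos = 0
    pieces = []
    for match in reg.finditer(text):
        pieces.append(text[cursor_pos:match.start(1)])
        pieces.append(rep)
        cursor_pos = match.end(1)
    pieces.append(text[cursor_pos:])
    return ''.join(pieces)

def best_time(func, *args, repeat=3):
    """Return the fastest of `repeat` single runs, in seconds."""
    return min(timeit.repeat(lambda: func(*args), number=1, repeat=repeat))

sub_seconds = best_time(sub_replace, reg, rep, text)
iter_seconds = best_time(finditer_replace, text, reg, rep)
print(f're.sub()    {sub_seconds:.6f} s')
print(f'finditer()  {iter_seconds:.6f} s')
```

Absolute numbers depend on the machine and the input, so only the relative ordering is meaningful.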

Recommended answer

The standard method is to use the built-in:

re.sub(reg, rep, text)
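
As a small usage sketch with the sample data from the question: re.sub accepts either a pattern string or a compiled pattern, and the compiled pattern exposes the same operation as a method. The variable names here are illustrative only.

```python
import re

reg = re.compile(r'(dog)')
text = 'dog cat dog cat dog cat'

# Module-level call; the first argument may be a compiled pattern.
result_a = re.sub(reg, 'cat', text)

# Equivalent method call on the compiled pattern.
result_b = reg.sub('cat', text)

# subn() additionally reports how many replacements were made.
result_c, count = reg.subn('cat', text)

print(result_a)  # cat cat cat cat cat cat
print(count)     # 3
```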

Incidentally, the reason for the performance difference between your versions is that each replacement in your first version (the findall() loop) causes the entire string to be recopied. Copies are fast, but when you're copying 10 MB at a go, enough copies will become slow.
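
To make that cost concrete, here is a rough sketch that tallies how many characters the findall-style loop has to work over. The function name and the small sample input are assumptions, and the tally is a back-of-the-envelope estimate of the work done, not a measurement.

```python
import re

def findall_replace_counted(text, reg, rep):
    """findall-style loop that also tallies the characters str.replace() works over."""
    output = text
    chars_processed = 0
    for match in reg.findall(text):
        # Each call handles the full string again, once per match.
        chars_processed += len(output)
        output = output.replace(match, rep)
    return output, chars_processed

reg = re.compile(r'(dog)')
text = 'dog cat ' * 1000  # 8,000 characters containing 1,000 matches

output, chars_processed = findall_replace_counted(text, reg, 'cat')
single_pass = reg.sub('cat', text)

print(output == single_pass)  # True
print(len(text))              # 8000
print(chars_processed)        # 8000000
```

The loop's work grows with the number of matches times the length of the string, while re.sub walks the string once.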
