Python3.x: quickest way to replace characters in very large string

Problem description

Let's say I have the following extremely large string in Python 3.x, several GB in size and more than 10 billion characters long:

string1 = "XYZYXZZXYZZXYZYXYXZYXZYXZYZYZXY.....YY"

Given its length, it already takes several GB of RAM just to load.

I would like to write a function that will replace every X with A, Y with B, and Z with C. My goal is to make this as quick as possible. Naturally, this should be efficient as well (e.g. there may be some RAM trade-offs I'm not sure about).

The most obvious solution for me is to chain calls to str.replace():

def replace_characters(input_string):
    new_string = input_string.replace("X", "A").replace("Y", "B").replace("Z", "C")
    return new_string

foo = replace_characters(string1)
print(foo)

which outputs

ABCBACCABCCABCBABACBACBACBCBCAB...BB

I worry this is not the most efficient approach, as I'm calling three functions in a row on such a large data structure, and each call scans the whole string and returns a new copy.

What is the most efficient solution for a string this large?

Solution

A more memory-efficient method, which avoids generating so many temporary strings along the way, is to use str.translate.

>>> string1 = "XYZYXZZXYZZXYZYXYXZYXZYXZYZYZXY"
>>> string1.translate({ord("X"): "A", ord("Y"): "B", ord("Z"): "C"})
'ABCBACCABCCABCBABACBACBACBCBCAB'

This allocates just one new string (an extra-large one in your case), instead of one per replace() call.
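
As a minimal sketch of how this could be wrapped into the function from the question, the translation table can be built once with str.maketrans and reused. The second function is an assumption that goes beyond the original answer: if the data is pure ASCII and can be held as bytes rather than str, bytes.translate with a table from bytes.maketrans does the same job on one byte per character. The names TABLE, BYTES_TABLE and replace_characters_bytes are made up for illustration.

```python
# Build the translation table once. str.maketrans maps each character in the
# first argument to the character at the same position in the second.
TABLE = str.maketrans("XYZ", "ABC")


def replace_characters(input_string: str) -> str:
    """Replace X->A, Y->B, Z->C in a single pass, allocating one new string."""
    return input_string.translate(TABLE)


# Assumption (not from the original answer): the data is pure ASCII and is
# available as bytes, e.g. read from a file opened in binary mode.
BYTES_TABLE = bytes.maketrans(b"XYZ", b"ABC")


def replace_characters_bytes(data: bytes) -> bytes:
    """Same replacement on a bytes object; still allocates one new object."""
    return data.translate(BYTES_TABLE)


sample = "XYZYXZZXYZZXYZYXYXZYXZYXZYZYZXY"
result = replace_characters(sample)
print(result)  # ABCBACCABCCABCBABACBACBACBCBCAB
```

Which variant is faster on a multi-GB input depends on the Python build and the data, so it is worth timing both on a representative slice before committing to one.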
