Fastest way to find and replace a specific line in a large text file with Python


Question

I have a numbers.txt file that consists of several hundred thousand lines, each one made up of two unique numbers separated by a : sign:

407597693:1604722326.2426915
510905857:1604722326.2696202
76792361:1604722331.120079
112854912:1604722333.4496727
470822611:1604722335.283259

My goal is to locate the line with the number 407597693 on the left side and change the number on the right side by adding 3600 to it. After that, I have to rewrite numbers.txt with the change. I must perform the same operation (with a different number each time) on the same file as fast as possible.
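
The per-line change described above can be sketched as follows. This is a minimal illustration, assuming every line has exactly the form left:right and that the right side parses as a float. The function name bump_line and its parameters are hypothetical and do not appear in the original post.

```python
def bump_line(line, target, delta=3600):
    """Return the line with delta added to its right-hand number
    when the left-hand side equals target; otherwise return it unchanged."""
    # partition splits on the first ":" only; sep is "" if no ":" is present
    left, sep, right = line.partition(":")
    if sep and left == str(target):
        return f"{left}:{float(right) + delta}"
    return line
```

Comparing the whole left-hand side, rather than testing whether the number occurs somewhere in the line, avoids touching a line whose key merely contains the same digits.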

I have made it work using with open file operations and a for loop over each line: search for the needed number, modify the line, and then rewrite the whole file. However, running this operation repeatedly takes about 0.2-0.5 seconds each time, which adds up and slows my program down considerably.

Here is the code I am using:

import os
import time

number = 407597693

# Read the whole file into memory.
with open("numbers.txt", "r+") as library:
    file = library.read()

if (str(number) + ":") in file:
    lines = file.splitlines()
    # Write every line to a temporary file, replacing the matching one.
    with open("numbers_temp.txt", "a+") as library_temp:
        for line in lines:
            if (str(number) + ":") in line:
                library_temp.write(
                    "\n" + str(number) + ":" + str(time.time() + 3600)
                )
            else:
                library_temp.write("\n" + line)

        # Read the temporary file back and overwrite the original with it.
        library_temp.seek(0)
        new_file = library_temp.read()

        with open("numbers.txt", "w+") as library_2:
            library_2.write(new_file)

    os.remove("numbers_temp.txt")

I would really appreciate any input on how to speed up this process. Many thanks in advance!

Recommended answer

I assume the whole file fits in memory. Using a regex should be faster:

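import re
import time

number = 407597693
with open("numbers.txt", "r") as f:
    data = f.read()
# Alternative: add 3600 to the value already on the matching line.
# data = re.sub(rf"^({number}):(.*)$", lambda x: f"{x.group(1)}:{float(x.group(2)) + 3600}", data, flags=re.MULTILINE)
# Replace the matching line with a timestamp one hour from now. The ":" in the
# pattern keeps a longer number that merely starts with the same digits from matching.
data = re.sub("^" + str(number) + ":.*$", str(number) + ":" + str(int(time.time()) + 3600), data, flags=re.MULTILINE)
with open("numbers.txt", "w") as f:
    f.write(data)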
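
The commented-out variant in the answer, which adds 3600 to the value already stored on the line, could be wrapped into a reusable function along these lines. This is a sketch rather than the answerer's exact code: the name add_to_line, its parameters, and the returned count are assumptions added for illustration. It still reads and rewrites the whole file on every call.

```python
import re


def add_to_line(path, number, delta=3600):
    """Add delta to the value on the line whose left side equals number.

    Returns how many lines were changed. The file is read and rewritten in full.
    """
    # Anchor at the line start and require ":" right after the number,
    # so that a longer number sharing the same leading digits does not match.
    pattern = re.compile(rf"^({re.escape(str(number))}):(.*)$", flags=re.MULTILINE)

    def bump(match):
        return f"{match.group(1)}:{float(match.group(2)) + delta}"

    with open(path, "r") as f:
        data = f.read()
    data, count = pattern.subn(bump, data)
    if count:  # skip the write when nothing matched
        with open(path, "w") as f:
            f.write(data)
    return count
```

Calling add_to_line("numbers.txt", 407597693) would then update that one line in place. Skipping the write when nothing matched saves the cost of rewriting an unchanged file.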
