Diff two big files in Python


Question

I have two big text files, nearly 2 GB each. I need something like diff f1.txt f2.txt. Is there a fast way to do this in Python? The standard difflib module is too slow. I assume a faster way exists, because difflib is implemented entirely in Python.

Recommended answer

How about using difflib in a way that lets your script handle big files? Don't load the files into memory; instead, iterate over the lines of both files and diff them in chunks, for example 100 lines at a time.

import difflib
import sys
from itertools import islice

CHUNK = 100  # number of lines compared per chunk

d = difflib.Differ()

with open('bigfile1') as f1, open('bigfile2') as f2:
    while True:
        # Read the next chunk of lines from each file; the files are
        # never loaded into memory as a whole.
        b1 = list(islice(f1, CHUNK))
        b2 = list(islice(f2, CHUNK))
        if not b1 and not b2:
            break  # both files are exhausted
        # Differ.compare expects sequences of lines, not joined strings.
        sys.stdout.writelines(d.compare(b1, b2))
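
The chunked approach above can also be wrapped in a reusable generator. The following is only a sketch under stated assumptions, not part of the original answer: the function name diff_in_chunks and its chunk_size parameter are hypothetical. It adds one shortcut, namely comparing the two chunks for equality first and calling difflib only when they differ, which may help when most of the two files is identical.

```python
import difflib
from itertools import islice


def diff_in_chunks(path1, path2, chunk_size=100):
    """Yield Differ-style diff lines for two files, one chunk at a time.

    Hypothetical helper for illustration; the name and signature are
    assumptions, not part of difflib.
    """
    differ = difflib.Differ()
    with open(path1) as f1, open(path2) as f2:
        while True:
            # Pull at most chunk_size lines from each file.
            b1 = list(islice(f1, chunk_size))
            b2 = list(islice(f2, chunk_size))
            if not b1 and not b2:
                break  # both files are exhausted
            if b1 == b2:
                continue  # identical chunk: skip the expensive difflib call
            yield from differ.compare(b1, b2)
```

It could be used as sys.stdout.writelines(diff_in_chunks('bigfile1', 'bigfile2')). One limitation applies to any fixed-size chunking: chunks are paired by line position, so a line inserted or deleted in one file shifts everything after it, and later chunks will report differences even where the content matches. The approach fits best when the files differ mainly by in-place changes.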
