How to diff two files using a Python generator


Problem description

I have a 100 GB file containing the numbers 1 to 1000000000000, one per line. Some of the numbers are missing, for example 5, 11, and 19919. My machine has 8 GB of RAM.

How can I find the missing numbers?

My idea is to take another file produced by for i in range(1,1000000000000) and read the lines one by one using a generator. Can we use the yield statement for this?

Can you help me write the code?

My code is below. It reads each file into memory as a list. Can this code be used in production?

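def difference(a, b):
    # Load every line of file a into memory as a set.
    with open(a, 'r') as f:
        aunique = set(f.readlines())

    # Load every line of file b into memory as a set.
    with open(b, 'r') as f:
        bunique = set(f.readlines())

    # Append the lines found only in b to the output file 'c'.
    with open('c', 'a+') as f:
        for line in bunique - aunique:
            f.write(line)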

Recommended answer

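You can iterate over all the numbers generated by range and compare each one to the next number read from the file. If they differ, the number is missing, so output it; if they match, read the next number from the file for the next comparison. This relies on the file being in ascending order, and because range and the file object are both lazy, only one line is held in memory at a time: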

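with open('numbers') as f:
    next_number = 0  # next number expected from the file; 0 forces the first read
    for n in range(1000000000001):
        if n == next_number:
            # n is present: advance to the next number in the file
            # (falls back to 0 once the file is exhausted)
            next_number = int(next(f, 0))
        else:
            # n is not the next number in the file, so it is missing
            print(n)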

Demo (assuming target numbers from 1 to 10 instead): https://repl.it/repls/WaterloggedUntimelyCoding
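
Since the question asks specifically about yield, the same comparison can also be written as a generator function. The following is only a sketch under stated assumptions, not part of the original answer: the name missing_numbers and its upper parameter are hypothetical, the input is assumed to hold one integer per line in strictly ascending order, and blank lines are skipped.

```python
from typing import Iterable, Iterator


def missing_numbers(lines: Iterable[str], upper: int) -> Iterator[int]:
    """Yield every integer in 1..upper that does not appear in `lines`.

    Hypothetical helper: assumes one integer per line, in strictly
    ascending order. Blank lines are skipped.
    """
    expected = 1  # smallest number not yet accounted for
    for line in lines:
        text = line.strip()
        if not text:
            continue
        present = int(text)
        # Every number between the last one seen and this one is missing.
        yield from range(expected, min(present, upper + 1))
        expected = max(expected, present + 1)
    # Numbers after the last line of the file are missing as well.
    yield from range(expected, upper + 1)
```

Passing the open file object as lines, for example missing_numbers(f, 1000000000000) inside a with open('numbers') as f block, should keep memory use constant, because the file is read lazily one line at a time and the missing numbers are produced one by one instead of being collected in a list or set.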
