Compare multiple files in Python


Question


I have a set of directories, each containing n files. I need to compare each of those files (within one directory) and find out if there is any difference between them. I tried filecmp and difflib, but they only support two files.


Is there anything else I can do to compare/diff the files?

The files contain hostnames:

--------------------------------
Example :- Dir -> Server.1
                    |-> file1
                    |-> file2
                    |-> file3


file1 <- host1 
         host2
         host3

file2 <- host1 
         host2 
         host3 
         host4

file3 <- host1 
         host2 
         host3
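For a small directory like the one above, a direct pairwise comparison also works. Here is a minimal sketch (the `diff_report` helper is hypothetical, not from the question) that applies `filecmp`, which only handles two files at a time, to every pair via `itertools.combinations`:

```python
import filecmp
import os
from itertools import combinations

def diff_report(dirname):
    '''Return the pairs of files in dirname whose contents differ.'''
    files = [os.path.join(dirname, f) for f in os.listdir(dirname)
             if os.path.isfile(os.path.join(dirname, f))]
    differing = []
    for a, b in combinations(files, 2):
        # shallow=False compares actual contents, not just os.stat() info
        if not filecmp.cmp(a, b, shallow=False):
            differing.append((a, b))
    return differing
```

An empty result means all files in the directory are identical; this does n*(n-1)/2 comparisons, so it scales poorly compared to the hashing answer below.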

Answer

I thought I'd share how combining an md5 hash comparison with os.walk() can help you ferret out all the duplicates in a directory tree. The larger the number of directories and files gets, the more helpful it might be to first sort files by size, to rule out any files that can't be duplicates because they are of different sizes. Hope this helps.

import os
from hashlib import md5

nonDirFiles = []

def crawler(dirname):
    '''Walks directory 'dirname' and fills the global list
    nonDirFiles with the paths that are files, not directories.'''
    global nonDirFiles
    for root, dirs, files in os.walk(dirname):
        for f in files:
            nonDirFiles.append(os.path.join(root, f))

def startCrawl():
    x = input("Enter Dir: ")
    print('Scanning directory "%s"....' % x)
    crawler(x)

def findDupes():
    '''Hash every collected file; a hash seen twice means a duplicate.'''
    dupes = []
    hashes = {}
    for fileName in nonDirFiles:
        print('Scanning file "%s"...' % fileName)
        with open(fileName, 'rb') as f:  # read bytes so hashing is encoding-safe
            hashValue = md5(f.read()).digest()
        if hashValue in hashes:
            dupes.append(fileName)
        else:
            hashes[hashValue] = fileName
    return dupes

if __name__ == "__main__":
    startCrawl()
    dupes = findDupes()
    print("These files are duplicates:")
    for d in dupes:
        print(d)
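The size pre-filter mentioned in the answer can be sketched as follows (the `find_dupes_with_size_filter` helper is hypothetical, not part of the original answer): group files by size first, and only hash groups with more than one member, since files of different sizes can never be duplicates.

```python
import os
from collections import defaultdict
from hashlib import md5

def find_dupes_with_size_filter(paths):
    '''Group candidate files by size, then hash only files that
    share a size with at least one other file.'''
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    dupes = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # unique size: cannot have a duplicate, skip hashing
        hashes = {}
        for p in same_size:
            with open(p, 'rb') as f:
                h = md5(f.read()).digest()
            if h in hashes:
                dupes.append(p)
            else:
                hashes[h] = p
    return dupes
```

This avoids reading most files at all when sizes vary widely, which is where the speedup over hash-everything comes from.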
