Compare multiple files in Python
Problem description
I have a set of directories, each containing n files. I need to compare all of the files within one directory and find out whether there are any differences between them. I tried filecmp and difflib, but they only support comparing two files.
Is there anything else I can use to compare/diff the files?
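One workaround for the two-file limit, sketched here as an assumption rather than something taken from the original thread, is to run `filecmp.cmp` over every unordered pair of files in the directory. The function name `compare_all` is hypothetical. With n files this makes n*(n-1)/2 comparisons, so it is only practical for small directories.

```python
import filecmp
import itertools
import os


def compare_all(directory):
    """Hypothetical helper: compare every pair of regular files in `directory`.

    Returns a list of (name_a, name_b) pairs whose contents differ.
    """
    names = sorted(
        n for n in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, n))
    )
    differing = []
    # combinations() yields each unordered pair exactly once.
    for a, b in itertools.combinations(names, 2):
        path_a = os.path.join(directory, a)
        path_b = os.path.join(directory, b)
        # shallow=False compares file contents instead of trusting os.stat() signatures.
        if not filecmp.cmp(path_a, path_b, shallow=False):
            differing.append((a, b))
    return differing
```

For the Server.1 example, this would report that file2 differs from both file1 and file3, while file1 and file3 match.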
These files contain hostnames.
--------------------------------
Example: Dir -> Server.1
                |-> file1
                |-> file2
                |-> file3

file1 <- host1
         host2
         host3
file2 <- host1
         host2
         host3
         host4
file3 <- host1
         host2
         host3
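Because each file is a list of hostnames, another option (an assumption, not part of the original thread) is to treat every file as a set of lines and report which hostnames each file is missing relative to the union of all files. The function name `host_differences` is hypothetical, and it assumes one hostname per line.

```python
import os


def host_differences(directory):
    """Hypothetical helper: for each file, list the hostnames it lacks.

    Assumes one hostname per line; blank lines and surrounding
    whitespace are ignored. Files missing nothing are omitted.
    """
    hosts = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path) as fh:
                hosts[name] = {line.strip() for line in fh if line.strip()}
    # Union of every hostname seen in any file of the directory.
    everything = set().union(*hosts.values())
    return {name: sorted(everything - seen)
            for name, seen in hosts.items() if everything - seen}
```

On the example above, this would return `{"file1": ["host4"], "file3": ["host4"]}`, which pinpoints the difference instead of only flagging that the files differ.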
Recommended answer
I thought I'd share how combining an MD5 hash comparison with os.walk() (os.path.walk() in Python 2) can help you ferret out all the duplicates in a directory tree. The more directories and files there are, the more it may help to first group files by size, which rules out any files that can't be duplicates because their sizes differ. Hope this helps.
```python
import os
from hashlib import md5


def startCrawl():
    '''Ask for a directory, walk its tree, and return the paths of
    everything that is a regular file rather than a directory.'''
    x = input("Enter Dir: ")
    print('Scanning directory "%s"....' % x)
    nonDirFiles = []
    # os.walk() replaces the Python 2-only os.path.walk() callback API.
    for dirname, _subdirs, fnames in os.walk(x):
        for f in fnames:
            path = os.path.join(dirname, f)
            if os.path.isfile(path):
                nonDirFiles.append(path)
    return nonDirFiles


def findDupes(nonDirFiles):
    '''Return every file whose MD5 digest matches a file seen earlier.'''
    dupes = []
    hashes = {}  # digest -> first file seen with that digest
    for fileName in nonDirFiles:
        print('Scanning file "%s"...' % fileName)
        # Binary mode, so the digest covers the exact bytes on disk.
        with open(fileName, 'rb') as f:
            hashValue = md5(f.read()).digest()
        if hashValue in hashes:
            dupes.append(fileName)
        else:
            hashes[hashValue] = fileName
    return dupes


if __name__ == "__main__":
    dupes = findDupes(startCrawl())
    print("These files are duplicates:")
    for d in dupes:
        print(d)
```
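The answer recommends grouping by size first, but its code hashes every file and reads each one into memory whole. A minimal Python 3 sketch of the size-first idea follows; the names `md5_of_file` and `find_duplicate_groups` and the 64 KiB chunk size are illustrative assumptions. Only files that share a size with at least one other file are hashed, and hashing reads in chunks so large files are not loaded at once.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 16):
    """Hypothetical helper: MD5 hex digest of a file, read in chunks."""
    hasher = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicate_groups(root):
    """Hypothetical helper: return sorted lists of paths with identical contents."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[md5_of_file(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Returning groups rather than a flat list keeps each duplicate together with the file it matches, which the original `dupes` list does not record.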