Determine whether any files have been added, removed, or modified in a directory


Problem description

I'm trying to write a Python script that gets the md5sum of all files in a directory (on Linux), which I believe I have done in the code below.

I want to be able to run this to make sure that no files within the directory have changed, and that no files have been added or deleted.

The problem is that if I change a file in the directory and then change it back, I get a different result from the function below, even though the file's contents are the same as before.

Can anyone explain this, and let me know if you can think of a workaround?

import hashlib
import os
import tarfile

def get_dir_md5(dir_path):
    """Build a tar file of the directory and return its md5 sum"""
    temp_tar_path = 'tests.tar'
    t = tarfile.TarFile(temp_tar_path, mode='w')
    t.add(dir_path)
    t.close()

    m = hashlib.md5()
    with open(temp_tar_path, 'rb') as f:
        m.update(f.read())
    ret_str = m.hexdigest()

    # delete tar file
    os.remove(temp_tar_path)
    return ret_str

Edit: As these fine folks have answered, it looks like tar includes header information such as the modification date. Would using zip, or another format, work any differently?

Are there any other workarounds?

Recommended answer

As the other answers mentioned, two tar files can differ even when their contents are the same, either because of tar metadata changes or because of file order changes. You should run the checksum on the file data directly, sorting the directory listings so that files are always visited in the same order. If you want to include some metadata in the checksum, include it manually.
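To make the tar explanation concrete, here is a minimal sketch that archives a temporary directory twice, changing nothing but one file's modification time in between. The helper names `tar_md5` and `demo_mtime_changes_tar_hash` are made up for this illustration, and the archive is built in memory rather than on disk as in the question.

```python
import hashlib
import io
import os
import tarfile
import tempfile


def tar_md5(dir_path):
    """Return the MD5 of an in-memory tar archive of dir_path."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(dir_path, arcname=".")
    return hashlib.md5(buf.getvalue()).hexdigest()


def demo_mtime_changes_tar_hash():
    """Hash a directory before and after changing only a file's mtime."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "a.txt")
        with open(path, "w") as f:
            f.write("same contents")

        # Pin the timestamps, hash, then move them forward by 100 seconds.
        os.utime(path, (1_000_000_000, 1_000_000_000))
        before = tar_md5(tmp)
        os.utime(path, (1_000_000_100, 1_000_000_100))
        after = tar_md5(tmp)
    return before, after


if __name__ == "__main__":
    before, after = demo_mtime_changes_tar_hash()
    print(before == after)  # the file data is identical, yet the hashes differ
```

The file's bytes never change here; only the mtime stored in the tar header does. The zip format also records a modification time for each entry, so switching archive formats would probably not help on its own.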

Untested example using os.walk:

import hashlib
import os
import struct

def get_dir_md5(dir_root):
    """Return an md5 sum computed over the contents of every file under dir_root"""

    digest = hashlib.md5()
    for dirpath, dirnames, filenames in os.walk(dir_root, topdown=True):

        # Sort in place so the traversal order is the same on every run
        dirnames.sort(key=os.path.normcase)
        filenames.sort(key=os.path.normcase)

        for filename in filenames:
            filepath = os.path.join(dirpath, filename)

            # If some metadata is required, add it to the checksum

            # 1) filename (good idea)
            # relpath = os.path.normcase(os.path.relpath(filepath, dir_root))
            # digest.update(relpath.encode('utf-8'))

            # 2) mtime (possibly a bad idea)
            # st = os.stat(filepath)
            # digest.update(struct.pack('d', st.st_mtime))

            # 3) size (good idea perhaps)
            # digest.update(struct.pack('q', st.st_size))

            # Read in chunks so large files are never loaded into memory at once
            with open(filepath, 'rb') as f:
                for chunk in iter(lambda: f.read(65536), b''):
                    digest.update(chunk)

    return digest.hexdigest()
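A single directory-wide checksum tells you that something changed, but not what. If you also want to know which files were added, removed, or modified, one possible variation is to keep a per-file manifest and compare two snapshots. This is only a sketch: `file_md5`, `build_manifest`, and `diff_manifests` are hypothetical names introduced here and are not part of the answer above.

```python
import hashlib
import os


def file_md5(path, chunk_size=65536):
    """Return the MD5 hex digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dir_root):
    """Map each file's path (relative to dir_root) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(dir_root):
        for filename in filenames:
            filepath = os.path.join(dirpath, filename)
            manifest[os.path.relpath(filepath, dir_root)] = file_md5(filepath)
    return manifest


def diff_manifests(old, new):
    """Return sorted lists (added, removed, modified) between two manifests."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, modified
```

Because the manifest is a dictionary keyed by relative path, traversal order does not matter, and a file that is changed and then changed back compares equal. To compare across runs, the first manifest would need to be saved somewhere, for example as JSON.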
