Get time of last commit for Git repository files via Python?

Question

I have a Git repository with several thousand files, and would like to get the date and time of the last commit for each individual file. Can this be done using Python (e.g., by using something like os.path.getmtime(path))?

Recommended answer

An interesting question. Below is a quick and dirty implementation. I've used multiprocessing.Pool.imap_unordered() to start the git subprocesses in parallel because it's convenient.

#!/usr/bin/env python
# vim:fileencoding=utf-8:ft=python
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# Last modified: 2015-05-24 12:28:45 +0200
#
# To the extent possible under law, Roland Smith has waived all
# copyright and related or neighboring rights to gitdates.py. This
# work is published from the Netherlands. See
# http://creativecommons.org/publicdomain/zero/1.0/

"""For each file in a directory managed by git, get the short hash and
date of the most recent commit of that file."""

from __future__ import print_function
from multiprocessing import Pool
import os
import subprocess
import sys
import time

# Suppress annoying command prompts on ms-windows.
startupinfo = None
if os.name == 'nt':
    startupinfo = subprocess.STARTUPINFO()
    startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW


def main():
    """
    Entry point for gitdates.
    """
    checkfor(['git', '--version'])
    # Get a list of all files
    allfiles = []
    # Get a list of excluded files.
    if '.git' not in os.listdir('.'):
        print('This directory is not managed by git.')
        sys.exit(0)
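    exargs = ['git', 'ls-files', '-i', '-o', '--exclude-standard']
    # Decode the output so the paths compare equal to str file names on
    # Python 3; git prints them relative to the current directory.
    exc = set(subprocess.check_output(exargs, startupinfo=startupinfo)
              .decode().splitlines())
    for root, dirs, files in os.walk('.'):
        for d in ['.git', '__pycache__']:
            try:
                dirs.remove(d)
            except ValueError:
                pass
        # Strip the leading './' and use '/' separators, as git does.
        tmp = [os.path.join(root, f) for f in files
               if os.path.join(root, f)[2:].replace(os.sep, '/') not in exc]
        allfiles += tmp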
    # Gather the files' data using a Pool.
    p = Pool()
    filedata = [res for res in p.imap_unordered(filecheck, allfiles)
                if res is not None]
    p.close()
    # Sort the data (latest modified first) and print it
    filedata.sort(key=lambda a: a[2], reverse=True)
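    # The dates come from time.gmtime(), so label them explicitly as UTC;
    # '%Z' is not guaranteed to print UTC for such values.
    dfmt = '%Y-%m-%d %H:%M:%S UTC'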
    for name, tag, date in filedata:
        print('{}|{}|{}'.format(name, tag, time.strftime(dfmt, date)))


def checkfor(args, rv=0):
    """
    Make sure that a program necessary for using this script is available.
    Calls sys.exit when this is not the case.

    Arguments:
        args: String or list of strings of commands. A single string may
            not contain spaces.
        rv: Expected return value from evoking the command.
    """
    if isinstance(args, str):
        if ' ' in args:
            raise ValueError('no spaces in single command allowed')
        args = [args]
    try:
        with open(os.devnull, 'w') as bb:
            rc = subprocess.call(args, stdout=bb, stderr=bb,
                                 startupinfo=startupinfo)
        if rc != rv:
            raise OSError
    except OSError as oops:
        outs = "Required program '{}' not found: {}."
        print(outs.format(args[0], oops.strerror))
        sys.exit(1)


def filecheck(fname):
    """
    Start a git process to get file info: the abbreviated commit hash and
    the author date of the most recent commit that touched the file.

    Arguments:
        fname: Name of the file to check.

    Returns:
        A 3-tuple containing the file name, latest short hash and latest
        commit date (a time.struct_time in UTC), or None if git reports an
        error or has no commit for the file.
    """
    # '--' keeps git from mistaking a file name for a revision or option.
    args = ['git', '--no-pager', 'log', '-1', '--format=%h|%at', '--', fname]
    try:
        b = subprocess.check_output(args, startupinfo=startupinfo)
        data = b.decode()[:-1]
        h, t = data.split('|')
        out = (fname[2:], h, time.gmtime(float(t)))
    except (subprocess.CalledProcessError, ValueError):
        return None
    return out


if __name__ == '__main__':
    main()

Example output:

serve-git|8d92934|2012-08-31 21:21:38 +0200
setres|8d92934|2012-08-31 21:21:38 +0200
mydec|e711e27|2008-04-09 21:26:05 +0200
sync-iaudio|8d92934|2012-08-31 21:21:38 +0200
tarenc|8d92934|2012-08-31 21:21:38 +0200
keypress.sh|a5c0fb5|2009-09-29 00:00:51 +0200
tolower|8d92934|2012-08-31 21:21:38 +0200
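
The `%at` placeholder yields seconds since the epoch, so the dates can also be rendered with a numeric UTC offset like the one in the output above. The following is only a minimal Python 3 sketch: the helper name `fmt_epoch` and the fixed `offset_hours` parameter are assumptions made for illustration and are not part of gitdates.py.

```python
from datetime import datetime, timedelta, timezone


def fmt_epoch(seconds, offset_hours=0):
    """Format epoch seconds as 'YYYY-MM-DD HH:MM:SS +hhmm'.

    offset_hours is a hypothetical fixed offset from UTC; a real script
    would more likely take the offset from git itself.
    """
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(seconds, tz).strftime('%Y-%m-%d %H:%M:%S %z')


if __name__ == '__main__':
    print(fmt_epoch(0))
    print(fmt_epoch(0, offset_hours=2))
```

Because the time zone is passed to `datetime.fromtimestamp` explicitly, the result does not depend on the local zone of the machine running the script.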

Edit: Updated to use os.devnull (which also works on ms-windows) instead of /dev/null.

Edit2: Used startupinfo to suppress command prompts popping up on ms-windows.

Edit3: Used __future__ to make this compatible with both Python 2 and 3. Tested with 2.7.9 and 3.4.3. Now also available on github.
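
For a repository with several thousand files, starting one git process per file can be slow. A different technique, not the one used in the script above, is to read the whole history once with `git log --name-only` and keep the first timestamp seen for each path, since the log lists the newest commits first. The following is a rough sketch under stated assumptions: the function names `parse_log` and `last_commit_times` and the `\x01` marker are hypothetical choices, and the paths it reports are the ones git prints (relative to the repository root, including files that were later deleted or renamed).

```python
import subprocess

# Hypothetical marker byte; git emits it through the %x01 placeholder so
# that timestamp lines cannot be confused with file names.
MARK = '\x01'


def parse_log(text):
    """Map each path to the author timestamp of the first commit listed
    for it, i.e. the most recent one in git's default log order."""
    latest = {}
    stamp = None
    for line in text.split('\n'):
        if line.startswith(MARK):
            stamp = int(line[1:])
        elif line and stamp is not None:
            # Keep only the first (newest) timestamp seen for a path.
            latest.setdefault(line, stamp)
    return latest


def last_commit_times(repo='.'):
    """Run a single 'git log' over the whole history of repo."""
    args = ['git', '-c', 'core.quotepath=off', '--no-pager', 'log',
            '--format=%x01%at', '--name-only']
    out = subprocess.check_output(args, cwd=repo)
    return parse_log(out.decode('utf-8', 'replace'))


if __name__ == '__main__':
    for path, stamp in sorted(last_commit_times().items()):
        print('{}|{}'.format(path, stamp))
```

This trades the per-file accuracy of `git log -1 -- file` for speed: merge commits list no files by default, so a file changed only in a merge would show an older timestamp.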
