List last commit dates for a large number of files, quickly


Problem description

I would like to list the last commit date for a large number of files in a git repository.

For the sake of concreteness, let us assume that I want to get the last commit dates of all *.txt files inside a particular subdirectory. There are tens of thousands of files in the repository in total, and the number of relevant *.txt files is in the ballpark of several hundreds. There are already thousands of commits in the repository.

I tried three different approaches.


Solution 1. A related question has an answer based on git log. However, running it once per file, as below, is very slow:

find . -name '*.txt' |
    xargs -n1 git log --format=format:%ai -n1 --all --

In my test case, it took several minutes – far too slow for my purposes.


Solution 2. Something like this would be much faster, less than one second:

git log --format=format:%ai --name-only .

However, then I would have to write a script that post-processes the output. Moreover, the above command prints out lots of information that is never needed: irrelevant files and old commits.
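The post-processing step mentioned above could look roughly like the sketch below. It is only an illustration, not something from the original question: the function name newest_date_per_file is made up, and the "@" prefix on date lines is an assumption added so that awk can tell commit lines from file names (a file whose name starts with "@" would confuse it). Because git log prints the newest commit first, the first time a file name appears is taken as its last commit.

```shell
# Hypothetical helper: keep only the first (newest) date seen for each file.
# Input: "@<date>" lines, each followed by the file names touched by that commit.
newest_date_per_file() {
    awk '
        /^@/ { date = substr($0, 2); next }     # commit marker line: remember its date
        $0 != "" && !($0 in seen) {             # first mention of a file = newest commit
            seen[$0] = 1
            print date "\t" $0
        }
    '
}

# Possible usage inside a repository (the "@" marker is added via the format string):
#   git log --format=format:@%ai --name-only . | newest_date_per_file
```

This still reads the whole history, so it addresses the unwanted output but not the cost of walking old commits.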


Solution 3. I also tried something like this, in order to get rid of the irrelevant files:

git log --format=format:%ai --name-only `find . -name '*.txt'`

However, this turned out to be slower than solution 2. (There was a factor of 3 difference in the running time.) Moreover, it still prints old commits that are no longer needed.


Question. Am I missing something? Is there a fast and convenient approach? Preferably something that works not only right now but also in future, when we have a much larger number of commits?

Solution

Try this.

In git, each commit references a tree object which has pointers to the state of each file (the files being blob objects).

So, what you want to do is write a program that starts with the set of files you are interested in and begins at the HEAD commit (its SHA1 is obtained via git rev-parse HEAD). It checks whether any of the "files of interest" are modified in that commit's tree (the tree is named by the "tree" field in the output of git cat-file commit [SHA1]); note that you'll have to descend into the subtree for each directory. If a file is modified (meaning its blob has a different SHA1 hash from the one it had in the "previous" revision), the program removes it from the interest set and prints the appropriate information. Then it continues to each parent of the current commit. This continues until the set of interest is empty.

If you want maximal speed, use the git C API. If you don't need that much speed, you can use git cat-file tree [SHA1 hash] (or, easier, git ls-tree [SHA1 hash] [files]), which performs the minimal amount of work needed to read a particular tree object (it's part of the plumbing layer).
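As a rough sketch of the walk described above, the two functions below use git rev-list and git ls-tree. The names tree_snapshots and last_change_dates and the "@" date marker are assumptions made for illustration. To keep it short, the sketch follows only the first parent of each commit, which simplifies the "each parent" step, and it starts one git process per commit, so it shows the idea rather than the fastest possible implementation.

```shell
# Hypothetical driver: for each commit (newest first, first parent only), print
# "@<date>" followed by the git ls-tree entries of the files given as arguments.
tree_snapshots() {
    git rev-list --first-parent HEAD | while read -r commit; do
        printf '@%s\n' "$(git show -s --format=%ai "$commit")"
        git ls-tree -r "$commit" "$@"
    done
}

# Hypothetical filter: reads those blocks and prints "<date>TAB<path>" for the
# newest commit in which each file's blob differs from the next older commit.
last_change_dates() {
    awk -F '\t' '
        function settle(    p) {
            # Compare the older block just read against the blobs recorded at HEAD.
            for (p in head)
                if ((older[p] "") != (head[p] "")) {   # forced string comparison
                    print newer_date "\t" p             # changed in the newer commit
                    delete head[p]
                    left--
                }
            split("", older)                            # portable way to clear an array
        }
        /^@/ {
            if (blocks > 1) settle()
            if (blocks >= 1 && left == 0) { done = 1; exit }   # interest set is empty
            newer_date = date
            date = substr($0, 2)
            blocks++
            next
        }
        NF == 2 {
            split($1, meta, " ")                        # meta[3] is the blob SHA-1
            if (blocks == 1) { head[$2] = meta[3]; left++ }
            else older[$2] = meta[3]
        }
        END {
            if (done) exit
            if (blocks > 1) settle()
            for (p in head) print date "\t" p           # unchanged since the oldest commit read
        }
    '
}

# Possible usage inside a repository:
#   tree_snapshots $(git ls-files '*.txt') | last_change_dates
```

The filter stops reading as soon as every file has been accounted for, which is what keeps the cost tied to how far back the oldest relevant change is rather than to the total number of commits. The order of the output lines is not specified.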

It's questionable how well this will continue to work in the future. If forward compatibility is the bigger concern, you can move up a level from git cat-file; but as you already discovered, git log is comparatively slow, as it's part of the porcelain, not the plumbing.

See here for a pretty good resource on how git's object model works.
