Evaluation time for `git describe` in smudge filter

Problem description

After a successful conversion of an old SVN repository into Git with svn2git, I have been tasked with reproducing the $Revision$ keyword expansion (or a close approximation of it).

So I ...

  • added a svn-r annotated tag for SVN's rev0

  • in .git/attributes added

    * filter=revsion
    

  • in .git/config added

    [filter "revsion"]
        smudge = /bin/sed -e 's/\\$Revision\\$/$Revision: '$(GIT_EXEC_PATH=/usr/lib/git-core/ /usr/bin/git describe --match svn-r)'$/g'
        clean = /bin/sed -e 's/\\$Revision: [^$]*\\$/$Revision$/g'
    

... and it works, but is doing the wrong thing.

Whenever I do a checkout, it expands $Revision$ to the git describe of the previous HEAD (the commit from before the checkout). So when I am on master~1 and run git checkout master, I get the expansion for master~1, not for master.

Just to make sure that the early evaluation was not the fault of the $(...) in .git/config, I also tried moving this code into its own script, but to no avail.
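For reference, here is roughly what such a standalone script might look like. This is a minimal Python 3 sketch: the file name `revfilter.py`, the function names, and the `smudge`/`clean` mode argument are all hypothetical and are not part of the original setup. It mirrors the two sed expressions, and it shows why moving the code did not help. `git describe` is evaluated while the filter runs, so it reports whatever HEAD names at that instant.

```python
#!/usr/bin/env python3
"""Hypothetical standalone version of the sed-based revsion filter."""
import re
import subprocess
import sys

BARE = re.compile(rb'\$Revision\$')             # form stored in the repository
EXPANDED = re.compile(rb'\$Revision: [^$]*\$')  # form left in the work-tree


def smudge(data, description):
    """Expand every bare $Revision$ into $Revision: <description>$."""
    # A function replacement keeps re from interpreting backslashes
    # that might appear in the description.
    return BARE.sub(lambda m: b'$Revision: ' + description + b'$', data)


def clean(data):
    """Collapse every expanded keyword back to the bare $Revision$."""
    return EXPANDED.sub(b'$Revision$', data)


def describe():
    # Evaluated while the filter runs: during `git checkout` HEAD still
    # names the *previous* commit, which is exactly the reported problem.
    out = subprocess.check_output(['git', 'describe', '--match', 'svn-r'])
    return out.strip()


if __name__ == '__main__':
    mode = sys.argv[1] if len(sys.argv) > 1 else ''
    if mode == 'clean':
        sys.stdout.buffer.write(clean(sys.stdin.buffer.read()))
    elif mode == 'smudge':
        sys.stdout.buffer.write(smudge(sys.stdin.buffer.read(), describe()))
    else:
        print('usage: revfilter.py smudge|clean', file=sys.stderr)
```

It could be registered with something like `smudge = python3 /path/to/revfilter.py smudge` and `clean = python3 /path/to/revfilter.py clean` in the `[filter "revsion"]` section, where the path is a placeholder. Either way, the script runs at the same moment the sed one-liner did.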

Hence my question: is there a way to make a git describe that runs inside a smudge filter describe the commit being checked out, rather than the previous HEAD?

Solution

TL;DR: a (tested) solution

Try this post-checkout hook (now tested, albeit lightly; I put it in my scripts repository on GitHub as well):

#! /usr/bin/env python

"""
post-checkout hook to re-smudge files
"""

from __future__ import print_function

import collections
import os
import subprocess
import sys

def run(cmd):
    """
    Run command and collect its stdout.  If it exits nonzero,
    die a la subprocess.check_call().  (stderr is not captured.)
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    stdout, _ = proc.communicate()
    status = proc.wait()
    if status != 0:
        raise subprocess.CalledProcessError(status, cmd)
    return stdout

def git_ls_files(*args):
    """
    Run git ls-files with given arguments, plus -z; break up
    returned byte string into list of files.  Note, in Py3k this
    will be a list of byte-strings!
    """
    output = run(['git', 'ls-files', '-z'] + list(args))
    # -z produces NUL termination, not NUL separation: discard last entry
    return output.split(b'\0')[:-1]

def recheckout(files):
    """
    Force Git to re-extract the given files from the index.
    Since Git insists on doing nothing when the files exist
    in the work-tree, we first *remove* them.

    To avoid blowing up on very long argument lists, do these
    1000 files at a time or up to 10k bytes of argument at a
    time, whichever occurs first.  Note that we may go over
    the 10k limit by the length of whatever file is long, so
    it's a sloppy limit and we don't need to be very accurate.
    """
    files = collections.deque(files)
    while files:
        todo = [b'git', b'checkout', b'--']
        # should add 1 to account for b'\0' between arguments in os exec:
        # argbytes = reduce(lambda x, y: x + len(y) + 1, todo, 0)
        # but let's just be very sloppy here
        argbytes = 0
        while files and len(todo) < 1000 and argbytes < 10000:
            path = files.popleft()
            todo.append(path)
            argbytes += len(path) + 1
            os.remove(path)
        # files is now empty, or todo has reached its limit:
        # run the git checkout command
        run(todo)

def warn_about(files):
    """
    Make a note to the user that some file(s) have not been
    re-checked-out as they are modified in the work-tree.
    """
    if len(files) == 0:
        return
    print("Note: the following files have been carried over and may")
    print("not match what you would expect for a clean checkout:")
    # If this is py3k, each path is a bytes and we need a string.
    if type(b'') == type(''):
        printable = lambda path: path
    else:
        printable = lambda path: path.decode('utf-8', 'replace')
    for path in files:
        print('\t{}'.format(printable(path)))

def main():
    """
    Run, as called by git post-checkout hook.  We get three arguments
    that are very simple, so no need for argparse.

    We only want to do something when:
     - the flag argument, arg 3, is 1
     - the two other arguments differ

    What we do is re-checkout the *unmodified* files, to
    force them to re-run through any defined .gitattributes
    filter.
    """
    argv = sys.argv[1:]
    if len(argv) != 3:
        return 'error: hook must be called with three arguments'
    if argv[2] != '1':
        return 0
    if argv[0] == argv[1]:
        return 0
    allfiles = git_ls_files()
    modfiles = git_ls_files('-m')
    unmodified = set(allfiles) - set(modfiles)
    recheckout(unmodified)
    warn_about(modfiles)
    return 0

if __name__ == '__main__':
    try:
        sys.exit(main())
    except KeyboardInterrupt:
        sys.exit('\nInterrupted')

To improve performance, you can modify it to operate only on files that are likely to use $Revision$ (your attribute defines this as "all files" so I used that here).
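One cheap way to narrow the set is a content check. The smudge filter in the question only rewrites `$Revision$`, so a file containing no `$Revision` marker comes out byte-for-byte identical on re-checkout and can be skipped. The sketch below depends on that assumption. `wants_resmudge`, `read_file` and `select_files` are hypothetical helpers, and the `read` parameter exists only so the selection can be exercised without a real work-tree.

```python
# Hypothetical helpers for narrowing the set of files the hook re-checks-out.
MARKER = b'$Revision'


def wants_resmudge(data):
    """True if the content contains a bare or an expanded $Revision keyword."""
    # Both "$Revision$" and "$Revision: ...$" begin with this marker.
    return MARKER in data


def read_file(path):
    """Read a work-tree file as bytes (paths from git ls-files -z are bytes)."""
    with open(path, 'rb') as f:
        return f.read()


def select_files(paths, read=read_file):
    """Keep only the paths whose current content mentions $Revision."""
    return [p for p in paths if wants_resmudge(read(p))]
```

In `main()`, `recheckout(unmodified)` would then become `recheckout(select_files(unmodified))`. If the filter ever did more than expand `$Revision$`, this shortcut would no longer be safe.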

Long

I thought about this problem a bit this morning. As you have observed, it is simply that git checkout has not yet updated the HEAD reference at the time it is populating the index and work-tree while changing commits. Ultimately, it seems too annoying to attempt to compute what git checkout is about to set HEAD to. You might instead use a post-checkout hook.

It's not clear yet whether this should be something to use instead of the smudge filter, or in addition to the smudge filter, but I think in addition to is correct. You almost certainly still want the clean filter to operate as usual.

In any case, a post-checkout hook gets:

... three parameters: the ref of the previous HEAD, the ref of the new HEAD (which may or may not have changed), and a flag indicating whether the checkout was a branch checkout (changing branches, flag=1) or a file checkout (retrieving a file from the index, flag=0). This hook cannot affect the outcome of git checkout.

(There is a bug in git checkout and/or the documentation here. The last sentence says "cannot affect the outcome", but that is not true in two ways:

  • The exit status of the hook becomes the exit status of git checkout. This makes the checkout appear to have failed if the exit status of the hook is nonzero.
  • The hook can overwrite work-tree files.

It's the last that I intend to use here.)

It is also run after git clone, unless the --no-checkout (-n) option is used. The first parameter given to the hook is the null-ref, the second the ref of the new HEAD and the flag is always 1. Likewise for git worktree add unless --no-checkout is used.

This hook can be used to perform repository validity checks, auto-display differences from the previous HEAD if different, or set working dir metadata properties.
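As a small illustration of those three parameters, the sketch below classifies a post-checkout invocation. The helper name is hypothetical. It also assumes the null-ref is an all-zero hash ID. Checking for "all zeros" rather than exactly 40 of them is intended to cover SHA-256 repositories as well.

```python
def classify_post_checkout(prev_head, new_head, flag):
    """Describe a post-checkout invocation from its three arguments."""
    if flag == '0':
        # File checkout: HEAD did not move, so both IDs are the same.
        return 'file checkout'
    if set(prev_head) == {'0'}:
        # Null-ref as the first argument: git clone or git worktree add.
        return 'initial checkout'
    if prev_head == new_head:
        # e.g. switching to another branch that points at the same commit.
        return 'branch checkout, same commit'
    return 'branch checkout, new commit'
```

The hook above only acts when the flag is 1 and the IDs differ, which corresponds to the "initial checkout" and "branch checkout, new commit" cases. Skipping "same commit" is safe because $Revision$ would expand identically.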

Your goal is to make the smudge filter run after HEAD is updated. The source code for builtin/checkout.c shows the order for the "change commits" case. git checkout first populates the index and work-tree. It then updates the HEAD ref. Only after that does it run the post-checkout hook, passing the two hash IDs (the first one will be the special null hash in some cases) and the flag set to 1.

File checkouts, which by definition don't change commits, run the hook with the flag set to 0. The two hash IDs will always match, which is why the flag test is almost certainly unnecessary.

Doing the file checkouts will re-run the smudge filter. Since HEAD has now been updated, $Revision$ will expand the way you want. The obvious drawback is that every work-tree file must be updated twice! There is another issue: Git does nothing for a file that already exists, unchanged, in the work-tree. The Python code above works around this by first removing the supposedly-unmodified files, which forces git checkout to re-extract them from the index into the work-tree.
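The ordering problem and the hook's workaround can be captured in a toy model. This is not Git code. `simulate_checkout` is a hypothetical function that only mimics the sequence described above for builtin/checkout.c: populate the work-tree, then update HEAD, then run post-checkout. Its smudge step reads HEAD at the moment it runs, as `git describe` does inside the real filter.

```python
def simulate_checkout(old_head, new_head, clean_blob, run_hook):
    """Toy model of a commit-changing checkout with a HEAD-reading smudge."""
    refs = {'HEAD': old_head}

    def smudge(blob):
        # Like `git describe` in the filter: asks what HEAD is *now*.
        return blob.replace('$Revision$', '$Revision: %s$' % refs['HEAD'])

    worktree = smudge(clean_blob)       # 1. populate index and work-tree
    refs['HEAD'] = new_head             # 2. only now is HEAD updated
    if run_hook:                        # 3. post-checkout hook runs
        worktree = smudge(clean_blob)   #    re-extraction smudges a second time
    return worktree
```

Without the hook, checking out master from master~1 leaves the master~1 expansion in place. With the hook, the second smudge sees the new HEAD, at the cost of writing the file twice.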
