Can I read a tree directly into a working directory, bypassing the index?

Problem description

I'm exploring Git internals, and I'm wondering whether there is a Git command that reads a tree directly into the working tree without using the index. For example, I've created a tree:

$ echo 'f1 content' | git hash-object -w --stdin
a1deaae8f9ac984a5bfd0e8eecfbafaf4a90a3d0

$ echo 'f2 content' | git hash-object -w --stdin
9b96e21cb748285ebec53daec4afb2bdcb9a360a

$ printf '%s %s %s\t%s\n' \
> 100644 blob a1deaae8f9ac984a5bfd0e8eecfbafaf4a90a3d0 f1.txt \
> 100644 blob 9b96e21cb748285ebec53daec4afb2bdcb9a360a f2.txt |
> git mktree
e05d9daa03229f7a7f6456d3d091d0e685e6a9db

Now I want to read the tree e05d9daa03229f7a7f6456d3d091d0e685e6a9db, with its two files f1.txt and f2.txt, directly into a working directory. I know I can use the following combination:

$ git read-tree e05d9daa03229f7a7f6456d3d091d0e685e6a9db
$ git checkout-index -a

But I'm wondering if there's a single command to do that.

Solution

The short answer is "no": all Git operations that read a complete tree do so into an index.

The phrase "an index", as opposed to "the index", is your main escape hatch, and it makes the long answer a qualified "yes". You can avoid using the index by using an index: that is, some alternative index instead of "the" index. You make this other index take the place of "the" index by putting the alternative index's path name in the environment variable GIT_INDEX_FILE. And in some cases, you can bypass the index entirely, by ... well, read on. :-)
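As a rough illustration of the alternative-index idea, here is a minimal Python sketch that wraps the question's read-tree plus checkout-index combination and points GIT_INDEX_FILE at a throwaway file. The function name export_tree and its parameters are made up for this example, and it assumes git is on PATH; it is one possible arrangement, not something the answer itself prescribes.

```python
import os
import subprocess
import tempfile


def export_tree(tree_ish, dest_dir, repo_dir="."):
    """Write the files of tree_ish into dest_dir using a temporary index.

    Hypothetical helper: the repository's own index file is never written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with tempfile.TemporaryDirectory() as scratch:
        env = dict(os.environ)
        # For these two commands only, "the" index is this throwaway file.
        env["GIT_INDEX_FILE"] = os.path.join(scratch, "index")
        subprocess.run(["git", "read-tree", tree_ish],
                       cwd=repo_dir, env=env, check=True)
        # The prefix must end in a path separator so files land inside dest_dir.
        prefix = os.path.join(os.path.abspath(dest_dir), "")
        subprocess.run(["git", "checkout-index", "-a", "--prefix=" + prefix],
                       cwd=repo_dir, env=env, check=True)
```

Calling export_tree("e05d9daa03229f7a7f6456d3d091d0e685e6a9db", "out") from inside the repository should then leave f1.txt and f2.txt under out/, while the regular index stays as it was.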

There are, I think, two main reasons that Git "wants" to read a series of trees from a commit into an index before copying files to a work-tree. The first has to do with resolving full path names: within a tree object, inside a Git repository, each stored sub-object (sub-tree or blob) has a mode (which is 40000 for a sub-tree) [1], a hash, and a name. The name is not the full path name of the object, though: it's just one name component, such as the bar part of foo/bar/baz.txt.

By walking linearly through each tree, recursing into each sub-tree, Git can build up an index in which each stored name is a full path name. That is, we kick off the tree extraction with, in pseudo-code:

build_index('', top_level_tree_hash)

where build_index does this (in pseudo-Python):

def build_index(path_so_far, tree_hash):
    tree = get_object('tree', tree_hash)
    for mode, hash, name in tree:
        if mode == MODE_TREE:
            build_index(path_so_far + name + '/', hash)
        else:
            cache_this_object(path_so_far + name, mode, hash)

When the recursion finishes, the cache aspect of the index holds the full path name, mode, and hash of every non-tree object, and is ready to be extracted.

Without the index, though, if you just have a tree to read, you have no idea what the leading path-name components up to this point should be. We need the recursion above to maintain our path-names for us.
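To make the recursion concrete, here is a small runnable version of the pseudo-code above. It is only a sketch: the OBJECTS dictionary is a made-up in-memory stand-in for the object database, the hash strings are placeholders, and a plain dict plays the role of the index.

```python
MODE_TREE = "40000"

# Hypothetical object store: tree "hash" -> list of (mode, hash, name) entries.
OBJECTS = {
    "T_root": [("100644", "B_readme", "README"), (MODE_TREE, "T_foo", "foo")],
    "T_foo": [(MODE_TREE, "T_bar", "bar")],
    "T_bar": [("100644", "B_baz", "baz.txt")],
}


def build_index(path_so_far, tree_hash, index=None):
    """Recursively map full path names to (mode, hash) for non-tree entries."""
    if index is None:
        index = {}
    for mode, obj_hash, name in OBJECTS[tree_hash]:
        if mode == MODE_TREE:
            # Each tree knows only its own entry names; the caller supplies the prefix.
            build_index(path_so_far + name + "/", obj_hash, index)
        else:
            index[path_so_far + name] = (mode, obj_hash)
    return index


index = build_index("", "T_root")
for path, (mode, obj_hash) in sorted(index.items()):
    print(mode, obj_hash, path)
# prints:
# 100644 B_readme README
# 100644 B_baz foo/bar/baz.txt
```

Note that the entry for baz.txt gets its foo/bar/ prefix only because the recursion carried path_so_far down from the top-level tree.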

The second reason Git "wants" to read into an index has to do with the end-of-line (EOL) and filtering (smudge and clean filter) processing that is done on blob objects representing files. (Entries representing symlinks and gitlinks need neither EOL hackery nor smudge filtering.) Git normally defers this processing to the point where the file is copied from the index to the work-tree. At this point, Git has the full path name of the file (because it's stored that way in the index [2]) and the hash ID. It looks up the appropriate EOL or filtering settings in the appropriate .gitattributes file(s), in the work-tree and/or index and/or globally. Work-tree files, if present, override index-only files, and attribute files "more local" to the directory holding the file override those higher up in the directory hierarchy. This, again, is much easier to achieve if Git has the entire index and work-tree in place as it does this. It can find the correct EOL and filter attributes easily, and apply them to the blob contents during the copy from the index-stored hash to the work-tree location determined by the index-stored path name.

The upshot of all of this is that to extract files "the easy way", Git needs an index, which acts as "the" index at least for the duration of the command it's running. But if you have one particular file whose path name you know in advance, and are willing to risk EOL/filtering a bit (or forgo them entirely), you can use git cat-file -p or git show to extract the blob contents:

git cat-file (-p | --textconv | --filters) $commithash:$fullpath

for instance. When using --textconv or --filters, you must provide a path, so if all you have is a raw hash you must use:

git cat-file $filteropt --path=$path $rawhash

(where $filteropt is one of the above --textconv or --filters options).

If you want the contents unfiltered, none of the above caveats apply at all. Leave out --textconv and --filters and use plain git cat-file -p, which does not need a path name at all. Anything acceptable to git rev-parse that locates a blob object will do, and:

git cat-file -p $hash > $path

suffices to extract the raw blob contents, writing them to $path.
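The same unfiltered extraction can be scripted. The following is a minimal sketch that assumes git is on PATH; extract_blob is a hypothetical helper name, not part of Git.

```python
import subprocess


def extract_blob(blob_ish, dest_path, repo_dir="."):
    """Write the raw, unfiltered contents of a blob to dest_path.

    blob_ish is anything git rev-parse resolves to a blob, such as a raw
    hash or a commit:path expression. dest_path is relative to the calling
    process's current directory, not to repo_dir.
    """
    result = subprocess.run(
        ["git", "cat-file", "-p", blob_ish],
        cwd=repo_dir, capture_output=True, check=True,
    )
    # Binary mode keeps the bytes exactly as stored in the object database.
    with open(dest_path, "wb") as fh:
        fh.write(result.stdout)
```

With the blobs from the question, extract_blob("a1deaae8f9ac984a5bfd0e8eecfbafaf4a90a3d0", "f1.txt") would recreate f1.txt without involving any index, at the cost of skipping all EOL and smudge processing.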


[1] The repository object's type is implied by the mode and is later matched against the underlying repository object's actual type. If we ignore symlinks and gitlinks, there are just two blob/file modes (100644, 100755) and one sub-tree mode (40000). A symlink (mode 120000) is also represented by a blob object, while a gitlink (mode 160000) records a commit hash rather than a blob. Neither is a tree, so if the mode is 40000 we recurse and fetch another tree object; otherwise we have hit a leaf and write the hash, which had best represent a blob (or, for a gitlink, a commit), into the cache.

[2] Path names in the index get compressed, so this is not entirely true. There are several index formats, though, so it's particularly complicated. It's best to think of each index/cache entry as representing a <full_path_name, flags, hash_ID, cache_statistics> tuple.
