git copy file, as opposed to `git mv`


Problem description

I realize that git works by diff'ing the contents of files. I have some files that I want to copy. To absolutely prevent git from ever getting confused, is there some git command that can be used to copy the files to a different directory (not mv, but cp), and stage the files as well?

Solution

The short answer is just "no". But there is more to know; it just requires some background. (And as JDB suggests in a comment, I'll mention why git mv exists as a convenience.)

Slightly longer: you're right that Git will diff files, but you may be wrong about when Git does these file-diffs.

Git's internal storage model proposes that each commit is an independent snapshot of all the files in that commit. The version of each file that goes into the new commit, i.e., the data in the snapshot for that path, is whatever is in the index under that path at the time you run git commit. [1]

The actual implementation, to the first level, is that each snapshotted file is captured in compressed form as a blob object in the Git database. The blob object is quite independent of every previous and subsequent version of that file, except for one special case: if a file's data is unchanged in the new commit, Git re-uses the old blob. So when you make two commits in a row, each of which holds 100 files, and only one file is changed, the second commit re-uses 99 previous blobs, and need only snapshot one actual file into a new blob. [2]
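The blob re-use can be observed directly. The following is a minimal sketch, assuming a throwaway repository created with mktemp; the file names a.txt and b.txt and the identity settings are invented for illustration.

```shell
#!/bin/sh
# Sketch: an unchanged file keeps the same blob across two commits.
# Repository location, file names, and identity are illustrative assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "unchanged" > a.txt
echo "version 1" > b.txt
git add a.txt b.txt
git commit -q -m "first"

echo "version 2" > b.txt        # only b.txt changes
git add b.txt
git commit -q -m "second"

# <commit>:<path> names the blob stored for that path in that commit.
a_old=$(git rev-parse HEAD~1:a.txt)
a_new=$(git rev-parse HEAD:a.txt)
b_old=$(git rev-parse HEAD~1:b.txt)
b_new=$(git rev-parse HEAD:b.txt)

echo "a.txt: $a_old -> $a_new"
echo "b.txt: $b_old -> $b_new"
```

The two hashes printed for a.txt should be identical, while b.txt gets a new blob in the second commit.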

Hence the fact that Git will diff files doesn't enter into making commits at all. No commit depends on a previous commit, other than to store the previous commit's hash ID (and perhaps to re-use exactly-matching blobs, but that's a side effect of them exactly matching, rather than a fancy computation at the time you run git commit).

Now, all these independent blob objects do eventually take up an exorbitant amount of space. At this point, Git can "pack" objects into a .pack file. It will compare each object to some selected set of other objects—they may be earlier or later in history, and have the same file name or different file names, and in theory Git could even compress a commit object against a blob object or vice versa (though in practice it doesn't)—and try to find some way to represent many blobs using less disk space. But the result is still, at least logically, a series of independent objects, retrieved completely intact in their original form using their hash IDs. So even though the amount of disk space used goes down (we hope!) at this point, all of the objects are exactly the same as before.
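As a rough illustration that packing changes storage but not content, the sketch below reads a blob before and after `git gc`. The repository, file name, and identity are assumptions made up for the example.

```shell
#!/bin/sh
# Sketch: packing changes how objects are stored, not what they contain.
# Repository location, file name, and identity are illustrative assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "some content" > file.txt
git add file.txt
git commit -q -m "add file"

blob=$(git rev-parse HEAD:file.txt)
before=$(git cat-file -p "$blob")

git gc --quiet                  # packs reachable loose objects into a .pack file

after=$(git cat-file -p "$blob")
pack_count=$(find .git/objects/pack -name '*.pack' | wc -l | tr -d ' ')

echo "pack files: $pack_count"
[ "$before" = "$after" ] && echo "blob $blob reads back identically"
```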

So when does Git compare files? The answer is: Only when you ask it to. The "ask time" is when you run git diff, either directly:

git diff commit1 commit2

or indirectly:

git show commit  # roughly, `git diff commit^@ commit`
git log -p       # runs `git show commit`, more or less, on each commit

There are a bunch of subtleties about this—in particular, git show will produce what Git calls combined diffs when run on merge commits, while git log -p normally just skips right over the diffs for merge commits—but these, along with some other important cases, are when Git runs git diff.

It's when Git runs git diff that you can (sometimes) ask it to find, or not to find, copies. The -C flag, also spelled --find-copies=<number>, asks Git to find copies. The --find-copies-harder flag (which the Git documentation calls "computationally expensive") looks harder for copies than the plain -C flag. The -B (break inappropriate pairings) option affects -C. The -M aka --find-renames=<number> option also affects -C. The git merge command can be told to adjust its level of rename detection, but—at least currently—cannot be told to find copies, nor break inappropriate pairings.
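In practice, then, copying a file is just an ordinary `cp` followed by `git add`, and the copy is only "discovered" later at diff time. A minimal sketch, assuming a throwaway repository and the invented file names orig.txt and copy.txt:

```shell
#!/bin/sh
# Sketch: copy with plain cp + git add, then ask git diff to look for copies.
# Repository location, file names, and identity are illustrative assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'line one\nline two\nline three\n' > orig.txt
git add orig.txt
git commit -q -m "add orig.txt"

cp orig.txt copy.txt            # an ordinary copy; there is no `git cp`
git add copy.txt
git commit -q -m "copy orig.txt to copy.txt"

# Without copy detection the new file is simply reported as added.
plain=$(git diff --name-status HEAD~1 HEAD)
# orig.txt was not modified in this commit, so it is only considered as a
# copy source when --find-copies-harder is given.
harder=$(git diff -C --find-copies-harder --name-status HEAD~1 HEAD)

echo "plain:  $plain"
echo "harder: $harder"
```

The first diff should list copy.txt as an added file (status A); the second should report it as a copy of orig.txt (status C with a similarity score). Nothing about the commit itself differs between the two views.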

(One command, git blame, does somewhat different copy-finding and the above does not entirely apply to it.)


[1] If you run git commit --include <paths> or git commit --only <paths> or git commit <paths> or git commit -a, think of these as modifying the index before running git commit. In the special case of --only, Git uses a temporary index, which is a little bit complicated, but it still commits from an index; it just uses the special temporary one instead of the normal one. To make the temporary index, Git copies all the files from the HEAD commit, then overlays those with the --only files you listed. For the other cases, Git just copies the work-tree files into the regular index, then goes on to make the commit from the index as usual.

[2] In fact, the actual snapshotting, storing the blob into the repository, happens during git add. This secretly makes git commit much faster, since you don't normally notice the extra time it takes to run git add before you fire up git commit.


Why git mv exists

What git mv old new does is, very roughly:

mv old new
git add new
git add old

The first step is obvious enough: we need to rename the work-tree version of the file. The second step is similar: we need to put the index version of the file into place. The third, though, is weird: why should we "add" a file we just removed? Well, git add doesn't always add a file: in this case it detects that the file is in the index but is no longer in the work-tree, and so removes it from the index.

We could also spell that third step as:

git rm --cached old

All we're really doing is taking the old name out of the index.

But there's an issue here, which is why I said "very roughly". The index has a copy of each file that will be committed the next time you run git commit. That copy might not match the one in the work-tree. In fact, it might not even match the one in HEAD, if there is one in HEAD at all.

For instance, after:

echo I am a foo > foo
git add foo

the file foo exists in the work-tree and in the index. The work-tree contents and the index contents match. But now let's change the work-tree version:

echo I am a bar > foo

Now the index and work-tree differ. Suppose we want to move the underlying file from foo to bar, but, for some strange reason [3], we want to keep the index contents unchanged. If we run:

mv foo bar
git add bar

we'll get I am a bar inside the new index file. If we then remove the old version of foo from the index, we lose the I am a foo version entirely.

So, git mv foo bar doesn't really move-and-add-twice, or move-add-and-remove. Instead, it renames the work-tree file and renames the in-index copy. If the index copy of the original file differs from the work-tree file, the renamed index copy still differs from the renamed work-tree copy.
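The difference is easy to check with the same foo/bar example. This is a minimal sketch in a throwaway repository (created with mktemp, an assumption for the example); `git show :bar` prints the index copy of bar.

```shell
#!/bin/sh
# Sketch: git mv renames the index entry as-is, so staged content survives.
# The repository location is an illustrative assumption.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "I am a foo" > foo
git add foo                     # index copy: "I am a foo"
echo "I am a bar" > foo         # work-tree copy now differs from the index

git mv foo bar

index_copy=$(git show :bar)     # :<path> reads the version stored in the index
worktree_copy=$(cat bar)

echo "index:     $index_copy"
echo "work-tree: $worktree_copy"
```

After the rename, the index copy of bar should still read "I am a foo" while the work-tree file reads "I am a bar".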

It's very difficult to do this without a front end command like git mv. [4] Of course, if you plan to git add everything, you don't need all of this stuff in the first place. And, it's worth noting that if git cp existed, it probably should also copy the index version, not the work-tree version, when making the index copy. So git cp really should exist. There also should be a git mv --after option, a la Mercurial's hg mv --after. Both should exist, but currently don't. (There's less call for either of these, though, than there is for straight git mv, in my opinion.)


[3] For this example, it's kind of silly and pointless. But if you use git add -p to carefully prepare a patch for an intermediate commit, and then decide that along with the patch, you would like to rename the file, it's definitely handy to be able to do that without messing up your carefully-patched-together intermediate version.

[4] It's not impossible: git ls-files --stage will get you the information you need from the index as it is right now, and git update-index allows you to make arbitrary changes to the index. You can combine these two, and some complex shell scripting or programming in a nicer language, to build something that implements git mv --after and git cp.
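As a rough sketch of that idea, the hypothetical shell function git_cp below (not a real Git command) copies a single file in the work-tree and duplicates its index entry under the new name, using `git ls-files --stage` to read the entry and `git update-index --add --cacheinfo` to write it. It assumes one ordinary, non-conflicted file and does no checking of the destination.

```shell
#!/bin/sh
# Hypothetical helper, not part of Git: copy SRC to DST in the work-tree and
# give DST the same index entry (mode and blob) that SRC currently has.
git_cp() {
    src=$1
    dst=$2
    # `git ls-files --stage` prints: <mode> <blob-hash> <stage>TAB<path>
    entry=$(git ls-files --stage -- "$src")
    if [ -z "$entry" ]; then
        echo "git_cp: $src is not in the index" >&2
        return 1
    fi
    mode=${entry%% *}           # first space-separated field
    rest=${entry#* }
    hash=${rest%% *}            # second space-separated field
    cp -- "$src" "$dst" &&
        git update-index --add --cacheinfo "$mode,$hash,$dst"
}

# Usage in a throwaway repository (location is an illustrative assumption).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "I am a foo" > foo
git add foo
echo "I am a bar" > foo         # unstaged change in the work-tree

git_cp foo copy

echo "index copy:     $(git show :copy)"
echo "work-tree copy: $(cat copy)"
```

The index entry for copy should carry the staged "I am a foo" content while the work-tree file carries the unstaged "I am a bar" content, mirroring what git mv does for a rename.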
