Why did git push so much data?


Problem description

I'm wondering what git does when it pushes changes, and why it occasionally seems to push far more data than the changes I've made. I changed two files, adding around 100 lines of code; less than 2 KB of text, I'd imagine.

When I pushed that to origin, git turned it into over 47 MB of data:

git push -u origin foo
Counting objects: 9195, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (6624/6624), done.
Writing objects: 100% (9195/9195), 47.08 MiB | 1.15 MiB/s, done.
Total 9195 (delta 5411), reused 6059 (delta 2357)
remote: Analyzing objects... (9195/9195) (50599 ms)
remote: Storing packfile... done (5560 ms)
remote: Storing index... done (15597 ms)
To <<redacted>>
 * [new branch]      foo -> foo
Branch foo set up to track remote branch foo from origin.

When I diff my changes (origin/master..HEAD), only the two files and the one commit I made show up. Where did the 47 MB of data come from?

I saw this: When I do "git push", what do the statistics mean? (Total, delta, etc.) and this: Predict how much data will be pushed in a git push, but neither really told me what's going on. Why would the pack / bundle be so huge?

Solution

I just realized that there is a very realistic scenario that can result in an unusually big push.

Which objects does push send? Those that do not yet exist on the server, or rather, those it did not detect as existing. How does it check whether an object exists? At the beginning of a push, the server sends the references (branches and tags) it has. So, for example, suppose the two sides have the following commits:

  CLIENT                                     SERVER
 (foo) -----------> aaaaa1
                      |
 (origin/master) -> aaaaa0                (master) -> aaaaa0
                      |                                 |
                     ...                               ...

Then the client will get something like refs/heads/master aaaaa0, and find that it has to send only what is new in commit aaaaa1.

But if somebody has pushed anything to the remote master, the situation is different:

  CLIENT                                     SERVER
 (foo) -----------> aaaaa1                      (master) --> aaaaa2
                      |                                       /
 (origin/master) -> aaaaa0                                 aaaaa0
                      |                                      |
                     ...                                    ...

Here, the client gets refs/heads/master aaaaa2, but it does not know anything about aaaaa2, so it cannot deduce that aaaaa0 exists on the server. So, in this simple case of only two branches, the whole history is sent instead of only the incremental part.

This is unlikely to happen in a mature, actively developed project, which has tags and many branches, some of which become stale and are not updated, so the server still advertises references the client already has. Users might send a bit more than necessary, but the difference is not as big as in your case and goes unnoticed. In very small teams, though, it can happen more often, and the difference can be significant.

To avoid it, you could run git fetch before pushing. Then, in my example, the aaaaa2 commit would already exist on the client, and git push origin foo would know that it should not send aaaaa0 and older history.
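
The three situations described above can be illustrated with a toy model. The following Python sketch is not git's implementation: it tracks commits only (real git also sends trees and blobs), the graph and the function names `ancestors` and `commits_to_send` are hypothetical, and the "..." history from the diagrams is collapsed into a single commit called `old`. It assumes, as the answer describes, that the client can only exclude history reachable from advertised tips it already has locally.

```python
from typing import Dict, List, Set

# Toy commit graph: commit id -> list of parent ids (hypothetical data).
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, tips: Set[str]) -> Set[str]:
    """Return every commit reachable from the given tips, tips included."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph[commit])
    return seen


def commits_to_send(client: Graph, pushed_tip: str, advertised: Set[str]) -> Set[str]:
    """Commits the client would send under the simplified model.

    Only advertised tips that exist in the client's own graph can serve as
    exclusion points; unknown tips tell the client nothing.
    """
    known_tips = {tip for tip in advertised if tip in client}
    return ancestors(client, {pushed_tip}) - ancestors(client, known_tips)


# Client history: old <- aaaaa0 <- aaaaa1 (branch foo points at aaaaa1).
client: Graph = {"old": [], "aaaaa0": ["old"], "aaaaa1": ["aaaaa0"]}

# Case 1: server master is still at aaaaa0, which the client knows.
print(sorted(commits_to_send(client, "aaaaa1", {"aaaaa0"})))
# ['aaaaa1']

# Case 2: server master moved to aaaaa2, which the client has never seen.
print(sorted(commits_to_send(client, "aaaaa1", {"aaaaa2"})))
# ['aaaaa0', 'aaaaa1', 'old']

# Case 3: after a fetch, the client has aaaaa2 and knows its parent is aaaaa0.
client_after_fetch: Graph = dict(client, aaaaa2=["aaaaa0"])
print(sorted(commits_to_send(client_after_fetch, "aaaaa1", {"aaaaa2"})))
# ['aaaaa1']
```

In case 2 the set of usable exclusion points is empty, so everything reachable from foo is sent; fetching first (case 3) restores the small, incremental push.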

Read here for the push implementation in the protocol.
