How does Git determine what objects need to be sent between repositories?


Question



      I have looked here but couldn't quite figure out the things I was wondering about: how does git push or git pull figure out what commit objects are missing at the other side?

      Let's say we have a repository with the following commits: (letters stand in for SHA-1 IDs, d is refs/heads/master)

      a -> b -> c -> d
      

      The remote, in contrast, has these:

      a -> e -> f -> g
      

      According to the Git documentation, the remote would tell us that its refs/heads/master is at g, but since we don't know that commit, that doesn't actually tell us anything. How is that enough to figure out the missing data?


      In the other direction, the documentation says:

      At this point, the fetch-pack process looks at what objects it has and responds with the objects that it needs by sending "want" and then the SHA-1 it wants. It sends all the objects it already has with "have" and then the SHA-1. At the end of this list, it writes "done" to initiate the upload-pack process to begin sending the packfile of the data it needs:

      This explains how the remote would determine what data to send, but wouldn't it hurt pull performance on repositories with many objects? If not, what does the text actually mean?


      Apparently data transfer works very differently depending on the direction (push vs. pull). What challenges does this design choice address, how does it address them, and how should I understand the descriptions in the documentation?
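The "want", "have" and "done" lines described in the quote are sent as pkt-lines. Each line is prefixed with four hex digits giving its total length, and that count includes the four digits themselves. "0000" is a flush packet. The following Python sketch builds a simplified fetch request. The helper names are hypothetical. It deliberately omits the capability list that the real protocol attaches to the first want line, and it also omits the batching of have lines.

```python
# Simplified sketch of pkt-line framing in Git's pack protocol.
# Capabilities and batching of "have" lines are intentionally omitted.

FLUSH_PKT = b"0000"  # zero-length packet that ends a section of the request


def pkt_line(payload: str) -> bytes:
    """Prefix the payload with its total length (payload + 4 prefix bytes) as 4 hex digits."""
    data = payload.encode("ascii")
    return f"{len(data) + 4:04x}".encode("ascii") + data


def fetch_request(wants: list[str], haves: list[str]) -> bytes:
    """Build a simplified request: want lines, a flush, have lines, then done."""
    parts = [pkt_line(f"want {sha}\n") for sha in wants]
    parts.append(FLUSH_PKT)
    parts += [pkt_line(f"have {sha}\n") for sha in haves]
    parts.append(pkt_line("done\n"))
    return b"".join(parts)
```

For example, pkt_line("done\n") yields b"0009done\n". A want line carrying a 40-character SHA-1 is 50 bytes long in total, so it starts with "0032".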

      Solution

      The magic is in the IDs. A commit ID is made up of many things, but basically it's a SHA-1 hash of the following:

      • Content (the full snapshot, recorded as a tree ID, not just the diff)
      • Author
      • Date
      • Log message
      • Parent IDs

      Change any of these and you need to create a new commit with a new ID. Note that the parent IDs are included.

      What does this mean for Git? It means that if I tell you I have commit "ABC123" and you have commit "ABC123", we know we have the same commit with the same content, same author, same date, same message and same parents. Those parents have the same IDs, so they have the same content, same author, same date, same message and same parents. And so on. If the IDs match, the commits must have the same history; there's no need to check further down the line. This is one of Git's great strengths. It is woven deeply into Git's design, and you cannot understand Git without it.
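To make the ID argument concrete, here is a minimal Python sketch of how an object ID is computed. It assumes the standard uncompressed object format, which is SHA-1 over "<type> <size>\0" followed by the object body. The content enters a commit as the ID of a tree object. Each parent ID is a line of the commit body, so changing a parent changes the commit ID. The helper names and the identity string are hypothetical.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over the header '<type> <size>\\0' followed by the raw object body."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str], ident: str, message: str) -> bytes:
    """Serialize a commit object. For brevity the same (hypothetical) identity is
    used for the author and committer lines; real commits store both separately."""
    lines = [f"tree {tree}"]                   # ID of the full content snapshot
    lines += [f"parent {p}" for p in parents]  # parent IDs are part of the hashed body
    lines.append(f"author {ident}")            # name, email, timestamp, timezone
    lines.append(f"committer {ident}")
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


EMPTY_TREE = git_object_id("tree", b"")  # the well-known empty tree ID
IDENT = "A U Thor <author@example.com> 1112911993 -0700"  # hypothetical identity

root = git_object_id("commit", commit_body(EMPTY_TREE, [], IDENT, "first\n"))
child = git_object_id("commit", commit_body(EMPTY_TREE, [root], IDENT, "second\n"))
# Same tree, identity and message, but a different parent, so a different ID.
other = git_object_id("commit", commit_body(EMPTY_TREE, ["0" * 40], IDENT, "second\n"))
```

The empty tree ID (4b825dc642cb6eb9a060e54bf8d69288fbee4904) and the empty blob ID (e69de29bb2d1d6434b8b29ae775ad8c2e48c5391) can be cross-checked with git hash-object.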

      A pull is a fetch plus a merge. git pull origin master is roughly git fetch origin followed by git merge origin/master with master checked out (or a rebase onto origin/master with --rebase). A fetch looks something like this...

      remote @ http://example.com/project.git
      
                        F - G [bugfix]
                       /
      A - B - C - D - E - J [master]
                           \
                            H - I [feature]
      
      local
      origin = http://example.com/project.git
      
                        F - G [origin/bugfix]
                       /
      A - B - C - D - E [origin/master] [master]
      

      • [local] Hey remote, what branches do you have?
      • [remote] I have bugfix at G.
      • [local] I also have bugfix at G! Done. What else?
      • [remote] I have feature at I.
      • [local] I have neither feature nor I. What's the parent of I?
      • [remote] I's parent is H.
      • [local] I don't have H. What's H's parent?
      • [remote] H's parent is J.
      • [local] I don't have J. What's J's parent?
      • [remote] J's parent is E.
      • [local] I have E! Send me J, H and I, please.
      • [remote] Ok, here they come.
      • [local] (adds J, H and I to the repo and puts origin/feature on I) Ok, what else do you have?
      • [remote] I have master at J.
      • [local] I have master at E, and you already sent me J. (moves origin/master to J) What else?
      • [remote] That's it!
      • [local] Kthxbi

      And now local looks like this...

      local
      origin = http://example.com/project.git
      
                        F - G [origin/bugfix]
                       /
      A - B - C - D - E [master] - J [origin/master]
                                    \
                                     H - I [origin/feature]
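The dialogue can be modeled as a walk over parent links. For each remote branch, keep asking for parents until every path reaches a commit the local side already has. Then record the remote tip as origin/<branch>. This Python sketch covers commit objects only; a real transfer also includes trees and blobs. The function names are hypothetical and do not come from Git's own code.

```python
# Commit graph on the remote from the example (commits only).
REMOTE_PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],
    "F": ["E"], "G": ["F"],   # bugfix
    "J": ["E"],               # master
    "H": ["J"], "I": ["H"],   # feature
}
REMOTE_REFS = {"bugfix": "G", "feature": "I", "master": "J"}


def missing_commits(remote_parents, local_has, tip):
    """Ask for parents of `tip` until every path reaches a commit we already have."""
    missing, stack = [], [tip]
    while stack:
        sha = stack.pop()
        if sha in local_has or sha in missing:
            continue                       # known commit: its whole history is known too
        missing.append(sha)                # "I don't have X. What's X's parent?"
        stack.extend(remote_parents[sha])
    return missing


def fetch(remote_parents, remote_refs, local_has):
    """Copy missing commits into local_has and point origin/<branch> at each remote tip."""
    tracking = {}
    for branch, tip in remote_refs.items():
        new = missing_commits(remote_parents, local_has, tip)
        local_has.update(new)              # "Send me J, H and I, please."
        tracking[f"origin/{branch}"] = tip
    return tracking
```

Starting from a local set of A through G, fetching feature finds I, H and J missing and stops at E. Master then needs nothing, because J has already arrived.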
      

      Then, with master checked out, it runs git merge origin/master to finish the pull, which fast-forwards master to J.

      A push is similar, except the process goes in reverse (local sends commits to the remote), and by default it will only fast-forward. A push that would discard commits on the remote branch is rejected unless forced.
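Whether a ref can be fast-forwarded comes down to an ancestry check: the old tip must be reachable from the new tip. Git exposes this check on the command line as git merge-base --is-ancestor. The Python sketch below is a hypothetical stand-alone version. It assumes both histories are visible in a single parent map, which is a simplification. In the question's push scenario, the pushing side would not actually have g locally.

```python
def is_ancestor(parents, ancestor, descendant):
    """True if `ancestor` can be reached from `descendant` by following parent links."""
    stack, seen = [descendant], set()
    while stack:
        sha = stack.pop()
        if sha == ancestor:
            return True
        if sha not in seen:
            seen.add(sha)
            stack.extend(parents.get(sha, []))
    return False


def fast_forward_ok(parents, old_tip, new_tip):
    """A ref can move from old_tip to new_tip without dropping commits
    only if old_tip is an ancestor of new_tip."""
    return is_ancestor(parents, old_tip, new_tip)


# The answer's pull: master at E, origin/master at J, and J's parent is E.
PULL = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "J": ["E"]}
# The question's histories: local a -> b -> c -> d, remote a -> e -> f -> g.
PUSH = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"],
        "e": ["a"], "f": ["e"], "g": ["f"]}
```

Moving master from E to J is a fast-forward. Moving the remote's master from g to d is not, because g is not an ancestor of d. A default push of d would therefore be rejected.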

      This is what Pro Git refers to as "the dumb protocol", which is used when your remote is a simple HTTP server. The Smart Protocol is used more often; it is far less chatty and has many optimizations. But you can see how either can be very efficient. There's no need to communicate the whole history; the two sides just need to exchange 20-byte hash keys until they find a common ancestor.
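Here is a sketch of where the want/have exchange saves work, using the question's two histories. The model is simplified: the names are hypothetical, it handles commits only, and it leaves out batching and multi_ack. The client offers commit IDs, walking back from its tips. As soon as the server recognizes one, that commit is common. By the hash argument, its whole history is common too, so the client stops offering its ancestors. The server then sends what the wants reach, minus what the common commits already reach. In this model, that is why the client never has to list every object it has.

```python
from collections import deque


def reachable(parents, tips):
    """All commits reachable from `tips` by following parent links."""
    seen, stack = set(), list(tips)
    while stack:
        sha = stack.pop()
        if sha not in seen:
            seen.add(sha)
            stack.extend(parents.get(sha, []))
    return seen


def negotiate(client_parents, client_tips, server_parents, wants):
    """Return (common commits, haves offered in order, commits the server must send)."""
    common, offered = set(), []
    queue, queued = deque(client_tips), set(client_tips)
    while queue:
        sha = queue.popleft()
        offered.append(sha)                    # client: "have <sha>"
        if sha in server_parents:              # server recognizes it: a common commit
            common.add(sha)
            continue                           # its ancestors are implied, stop here
        for parent in client_parents.get(sha, []):
            if parent not in queued:
                queued.add(parent)
                queue.append(parent)
    # Server side: everything the wants reach, minus what the common commits reach.
    to_send = reachable(server_parents, wants) - reachable(server_parents, common)
    return common, offered, to_send


LOCAL = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
REMOTE = {"a": [], "e": ["a"], "f": ["e"], "g": ["f"]}
common, offered, to_send = negotiate(LOCAL, ["d"], REMOTE, wants=["g"])
```

With the question's histories, the client offers d, c, b and a. Only a is recognized, so the server sends e, f and g.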

      Here are some sources and further reading.

