Could Git Smart API thin pack calculation ever consider reusing common sub-trees?


Problem description

Q: When Git pushes refs that share no history with the remote over the Smart Protocol, can it take into account root trees or sub-trees that local and origin already have in common when building the thin pack to send?

tl;dr

Consider this (uncommon) situation when working with and pushing to a remote Git repository.

  • I have a local repository where the local master points to a tree with 1110 descendant sub-trees a[0-9]/b[0-9]/c[0-9].
  • Remote origin/master is current with the local master commit, i.e. the histories are identical. It uses the ssh protocol.
  • For whatever reason, I create a local branch squashed. I set that branch to a new, single root commit with the same content/tree as master. This can be done with git commit-tree. So this branch has a single commit and no commits in common with master, but the root tree hash is identical: it points to the same tree object as master and origin/master. For this discussion it does not matter that this is a single squashed commit; any history rewritten back to the root commit, with no common history, will do.
  • git push origin HEAD # push squashed
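The setup in the list above can be reproduced with a throwaway repository. This is only a sketch: the temporary directory, the reduced 2x2x2 layout (instead of 10x10x10), the file names and the demo identity are my own assumptions, and a local bare repository stands in for the ssh remote.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo"
git config user.email "demo@example.invalid"

# Reduced stand-in for the a[0-9]/b[0-9]/c[0-9] layout (assumption: 2x2x2).
for a in 0 1; do for b in 0 1; do for c in 0 1; do
  mkdir -p "a$a/b$b/c$c"
  echo "$a$b$c" > "a$a/b$b/c$c/file.txt"
done; done; done
git add .
git commit -q -m "first"
echo more > a0/extra.txt
git add .
git commit -q -m "second"
git remote add origin "$work/origin.git"
git push -q origin master

# A new parentless commit that reuses master's root tree, and a branch pointing at it.
squashed_commit=$(git commit-tree -m "squashed" "master^{tree}")
git branch squashed "$squashed_commit"

master_tree=$(git rev-parse "master^{tree}")
squashed_tree=$(git rev-parse "squashed^{tree}")
# merge-base exits non-zero when the two commits have no common ancestor.
if git merge-base master squashed >/dev/null; then
  shared_history=yes
else
  shared_history=no
fi
echo "master tree:    $master_tree"
echo "squashed tree:  $squashed_tree"
echo "shared history: $shared_history"
```

The two tree hashes printed at the end should match while no merge base exists, which is exactly the situation the question is about.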

From observing the performance of this with a large repository, and the number of objects sent, I suspect that push, send-pack and receive-pack, together with the associated thin-pack negotiation over the Smart Protocol, do something like:

  • Confirm that the commit being pushed (squashed) has no common history with any commit origin currently has.
  • Remain oblivious to the fact that squashed points to a tree that is not only in origin, but is the tree for a current HEAD ref.
  • Pack and send everything.

In this case the trees are identical. If a subsequent change is made in squashed (either an additional commit, or a new squash that changes a file in a0), 2 trees (/ and a0) would have changed, and the other 1109 would be unchanged. The root tree has changed, which means a next-level search would be required to see whether it is worth searching for further common sub-trees. This might require a heuristic: without comparing all sub-trees down to the leaves, it is not possible to infer the number of descendant trees in common from the trees at any particular depth.
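The 2-changed / 1109-unchanged arithmetic can be checked locally. The sketch below assumes one file per c-directory (file.txt) and a new file directly under a0; both names are illustrative, not from the original setup. It compares the path-plus-hash listing of every sub-tree before and after the change.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo"
git config user.email "demo@example.invalid"

# Full a[0-9]/b[0-9]/c[0-9] layout: 10 + 100 + 1000 = 1110 sub-trees below the root.
for a in 0 1 2 3 4 5 6 7 8 9; do
  for b in 0 1 2 3 4 5 6 7 8 9; do
    for c in 0 1 2 3 4 5 6 7 8 9; do
      mkdir -p "a$a/b$b/c$c"
      echo "$a$b$c" > "a$a/b$b/c$c/file.txt"
    done
  done
done
git add .
git commit -q -m "full tree"
# -r -d lists every sub-tree entry recursively (the root tree itself is not listed).
git ls-tree -r -d HEAD | LC_ALL=C sort > "$work/before.txt"

echo changed > a0/new-file.txt   # hypothetical change to a file directly in a0
git add .
git commit -q -m "touch a0"
git ls-tree -r -d HEAD | LC_ALL=C sort > "$work/after.txt"

total_subtrees=$(wc -l < "$work/before.txt" | tr -d ' ')
# Lines present in both listings are sub-trees whose path and hash are unchanged.
unchanged_subtrees=$(LC_ALL=C comm -12 "$work/before.txt" "$work/after.txt" | wc -l | tr -d ' ')
echo "sub-trees below root: $total_subtrees"
echo "unchanged sub-trees:  $unchanged_subtrees"
```

Only a0 differs among the listed sub-trees; the root tree, which ls-tree does not list as an entry, is the second changed tree.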

Of course if there are multiple commits in the nothing-in-common history being pushed, this negotiation would need to be repeated for each commit.

Does it sound reasonable that the Smart API could consider already-held common sub-trees, or at the very least, the root-tree, as it considers each commit? Or should Git already be doing this and there is something wrong with my client or server?

git version 2.8.2

Solution

Checking git's source and trying it with git daemon and GIT_TRACE_PACKET says you're correct about what it's doing: git negotiates at the commit level only. If the history isn't shared, git won't detect the shared content.
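For anyone who wants to repeat that check, here is a rough sketch. It uses a local bare repository rather than git daemon, and the directory layout, file names and trace file location are assumptions of mine. GIT_TRACE_PACKET is pointed at an absolute path so the packet trace lands in a file.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.invalid"
mkdir -p a0/b0/c0
echo data > a0/b0/c0/file.txt
git add .
git commit -q -m "first"
git remote add origin "$work/origin.git"
git push -q origin master

# Parentless commit sharing master's tree, as in the question.
git branch squashed "$(git commit-tree -m "squashed" "master^{tree}")"

# Commits the push has to introduce: only the new root commit.
new_commits=$(git rev-list squashed --not origin/master | wc -l | tr -d ' ')
# Objects enumerated by the same kind of revision walk. With disjoint history
# this may include trees and blobs that origin already holds (not asserted here).
walk_objects=$(git rev-list --objects squashed --not origin/master | wc -l | tr -d ' ')
echo "new commits:  $new_commits"
echo "walk objects: $walk_objects"

# Record the pkt-line conversation of the push in a file.
GIT_TRACE_PACKET="$work/trace.log" git push -q origin squashed
remote_tip=$(git -C "$work/origin.git" rev-parse refs/heads/squashed)
```

Reading trace.log afterwards shows the ref advertisement and the update command; nothing in that exchange names trees, only commit ids attached to refs.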

"Does it sound reasonable that the Smart API could consider already-held common sub-trees, or at the very least, the root-tree, as it considers each commit?"

If the already-held common subtrees can't be identified by already-held common commits, then to identify those subtrees it'd have to send their ids.

The thing is, for anything short of a complete readout, I can construct a plausible-sounding corner case that sends an arbitrarily-large amount of redundant data -- but sending every existing subtree id every time to avoid that possibility is clearly a huge loss. Don't forget that round-trip latency is horrendously expensive. So, at what point do you become likely to be spending more time negotiating when considering added overhead across all fetches, in the aggregate? If you're going to argue that some particular alternate method would save time overall, you're going to have to show up with hard data on actual production traffic.
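To get a rough feel for what a complete readout would cost, one can count the tree objects reachable from all refs and multiply by the length of a hex object id. This is only a lower-bound sketch under my own assumptions: it ignores pkt-line framing and any compression, and the tiny two-commit repository is purely illustrative.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo"
git config user.email "demo@example.invalid"
mkdir -p a0/b0
echo one > a0/b0/file.txt
git add .
git commit -q -m "one"
echo two > a0/b0/file.txt
git add .
git commit -q -m "two"

# Every reachable object id, reduced to its type, then count the trees.
tree_count=$(git rev-list --objects --all | cut -d' ' -f1 \
  | git cat-file --batch-check='%(objecttype)' | grep -c '^tree$')
# Length of one hex object id in this repository (40 for SHA-1).
head_id=$(git rev-parse HEAD)
id_len=${#head_id}
id_bytes=$((tree_count * id_len))
echo "trees: $tree_count, id bytes at minimum: $id_bytes"
```

Each commit here rewrites the root, a0 and a0/b0 trees, so the count grows with history length as well as directory count, which is why advertising every tree id on every exchange scales badly.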

Also remember that you can construct packs yourself. It's not hard: you feed object IDs to git pack-objects pack and drop the output into .git/objects/pack. Congratulations, you've just fetched exactly those objects into that repo.
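A minimal sketch of that approach for the squashed case, assuming direct filesystem access to the receiving repository (a local bare repo here; the paths and demo identity are my own). Since origin already has the tree, the only object it lacks is the new commit, so the pack contains just that one id.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.invalid"
mkdir -p a0
echo data > a0/file.txt
git add .
git commit -q -m "first"
git push -q "$work/origin.git" master

# The parentless commit; its tree and blobs already exist in origin.git.
squashed=$(git commit-tree -m "squashed" "master^{tree}")

# Feed exactly one object id to pack-objects and write the pack (and its index)
# straight into the receiving repository's pack directory.
pack_hash=$(echo "$squashed" | git pack-objects -q "$work/origin.git/objects/pack/pack")

# The receiving repository can now resolve the commit and walk everything below it.
packed_type=$(git -C "$work/origin.git" cat-file -t "$squashed")
reachable=$(git -C "$work/origin.git" rev-list --objects "$squashed" | wc -l | tr -d ' ')
echo "pack-$pack_hash.pack holds the commit; $reachable objects reachable from it"
```

No ref is created by this; pointing a branch at the commit on the receiving side is a separate step.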
