Explanation of GitHub forks and how they store files


Problem description




I am wondering what happens when a project is forked on GitHub.

For example, when I fork a project, does GitHub make a copy of all of that code on its servers, or does it just create a link to it?

That leads to another question: since git hashes all files, if you add the same file again it does not need to store the file contents a second time, because the hash is already in the system. Is that correct?

Does GitHub work like this too? If I happen to upload the exact same piece of code as another user, does GitHub essentially just create a link to that file when it receives it, since it would have the same hash, or does it save all of the contents again separately?

Any enlightenment would be great, thanks!

Solution

GitHub has exactly the same semantics as git, with a web-based GUI wrapped around it.

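Storage: "Git stores each revision of a file as a unique blob object"
So each distinct version of a file is stored as its own blob, named by the SHA-1 hash of its content. Two files with identical content produce the same hash, so within one repository that content is stored only once; a changed file gets a new hash and therefore a new blob.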
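
To make the hashing concrete, here is a minimal sketch of how a blob ID is derived, assuming the default SHA-1 object format. The function name `git_blob_id` is made up for this illustration; the scheme itself (a `blob <size>` header, a NUL byte, then the raw content) is what `git hash-object` computes for a file.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git would assign to a blob with this content.

    Hypothetical helper for illustration; it mirrors `git hash-object`
    for repositories using the SHA-1 object format.
    """
    # Git hashes a short header plus the content, not the content alone.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


a = git_blob_id(b"hello world\n")
b = git_blob_id(b"hello world\n")   # same bytes, e.g. the same file under another path
c = git_blob_id(b"hello world!\n")  # one character changed

print(a == b)  # True: identical content, identical blob ID
print(a == c)  # False: any change yields a different blob
```

Note that the file name plays no part in the ID, which is why the same content added under two paths resolves to one blob.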

As for GitHub, a fork is essentially a clone. In git terms, a new fork is a separate repository on their servers that keeps a reference to its origin. Git semantics require no storage links between the two, because git by nature can track remotes. Each fork knows its upstream.

When you say "if I happen to upload the exact same piece of code as another user", the term "upload" is a bit vague in the git sense. If you are working in the same repository and git records a new object for a file you commit, its content differed from what was already stored; byte-identical content maps to the existing blob. If you mean working on a clone or fork of another repo, the same applies inside that repo, and git itself makes no links on the filesystem to the other repo.
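
The two cases above can be sketched with a toy model. This is only an illustration of plain git semantics under stated assumptions, not a description of how GitHub stores forks: `ToyRepo`, its `objects` dictionary and its `fork` method are all hypothetical names invented here.

```python
import hashlib


class ToyRepo:
    """Hypothetical in-memory stand-in for one repository's object store."""

    def __init__(self, upstream=None):
        self.objects = {}         # object ID -> content
        self.upstream = upstream  # remembered like a remote, not a storage link

    def add(self, content: bytes) -> str:
        # Same ID scheme git uses for blobs in SHA-1 repositories.
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        oid = hashlib.sha1(header + content).hexdigest()
        self.objects.setdefault(oid, content)  # no-op if the blob already exists
        return oid

    def fork(self) -> "ToyRepo":
        # Modeled as a clone: the child gets its own copy of the store.
        child = ToyRepo(upstream=self)
        child.objects = dict(self.objects)
        return child


original = ToyRepo()
first = original.add(b"print('hi')\n")
second = original.add(b"print('hi')\n")  # identical content added again
assert first == second and len(original.objects) == 1

fork = original.fork()
fork.add(b"only in the fork\n")
assert len(fork.objects) == 2 and len(original.objects) == 1
assert fork.upstream is original
```

In this model, deduplication happens inside each store, while the fork only remembers where it came from. Whether a hosting service shares storage between forks behind the scenes is a separate question that this sketch does not try to answer.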

I can't claim to have any intimate knowledge of what optimizations GitHub might be making under the hood on their internal systems. They could be doing intermediate custom operations to save disk space. But anything they do would be transparent to you and would not matter much, since it should always operate under the expected git semantics.

A developer at GitHub wrote a blog post about how they do their own git workflow internally. While it doesn't address your question about how they run the service itself, I think this quote from the conclusion is pretty informative:

Git itself is fairly complex to understand, making the workflow that you use with it more complex than necessary is simply adding more mental overhead to everybody’s day. I would always advocate using the simplest possible system that will work for your team and doing so until it doesn’t work anymore and then adding complexity only as absolutely needed.

What I take away from that is that they acknowledge how complex git is by itself, so most likely they take the lightest touch possible in wrapping it to provide the service, and let git do what it does best natively.
