What is the difference between "git push" and "git exile push"?


Question

I have a git repository and I am instructed to perform the following sequence of actions:

  1. Copy a given set of files from a folder to the above mentioned git repository (the "source folder" is not a part of the repository).
  2. Execute git add .
  3. Execute git exile push folder_name/
  4. Execute git commit -m 'Commit message'

Now I want to understand what I am actually doing. To be more specific, the first two steps are clear to me (I change something in the repo and then I add these changes to the "staging area", so they are ready for git commit). However, the last two steps (3 and 4) are confusing and I have the following questions about them:

  1. Usually we commit and then push. Why do we do it differently here (first push and then commit)?
  2. Instead of git push we use git exile push. What is the difference between these two? Where does it push to? What does it push?

I heard that it has something to do with large files. Instead of using them "explicitly" we work with their "references" (or "links" to them). But what does it exactly mean?

ADDED

I assume that git exile push takes big files, copies their content to a location that is suited for holding larger files, and then replaces the content of the original files with links to their copies. So, in other words, the content of the files will be replaced by the link to the copy of their content. After that, git exile push executes git add. So, it changes the files, it adds them to the staging area, and the only thing that I need to do is git commit.

Is my interpretation correct and complete?

Solution

git exile is not part of Git. It's pretty clear from ElpieKay's link that it is similar in some ways to Git-LFS (which is also not part of Git), and that it is what you described in your "added" section:

I assume that git exile push takes big files, copies their content to a location that is suited for holding larger files, and then replaces the content of the original files with links to their copies. So, in other words, the content of the files will be replaced by the link to the copy of their content.

This is correct in terms of goals, but not in terms of underlying mechanism.

For Git-LFS the goal is based on file size, and Git-LFS has a lot of code in it that makes this work. For Git-Exile (which I have not used or examined in fine detail; I only did a quick eyeball of the code) the goal is based on "binary-ness" rather than size, and you must choose which files to claim are binary by name-pattern. That is, you might say *.jpg and/or *.exe are to be treated as binary.

Now let's take on the details.

Your work-tree, your commits, and your branch names

You already know that Git's commits store files ("snapshots"). If you don't already know this, go read something that describes how that part works. To keep things small-ish, Git stores the files in a special, Git-only form that only Git can deal with. You need to have the files in a non-Gitty form so that you can work with them. So Git copies the files out of the snapshot into a work-tree, which is the area where you do your work.

But now consider this rather stark fact: Commits are entirely read-only. You can never change the contents of any existing commit. You can read them out any time you like. You can make a new (and different) commit, leaving the existing commits alone. You can't change a commit, ever.

Each commit is identified by a big, ugly, apparently-random hash ID like e3a80781f5932f5fea12a49eb06f3ade4ed8945c (this is a commit in the Git repository for Git itself). These IDs are basically unusable by humans, so we pick some important commit, such as the most recent commit on a branch, and give it a name like master. The name-to-commit-hash mapping will change over time: every time we add a new commit to a repository, Git will assign it a new, unique hash ID. If we just added that new commit to the master branch, Git will store the new ID into the name master, so that the name always identifies the latest commit!

Each commit, once made, is fixed forever. It also stores the hash ID of the previous commit (and stores that forever since nothing can change the commit). So using the most recent commit, which we find by the name master, we can work backwards to find an earlier commit:

      <-C   <--master

We just follow the arrow (the hash ID) coming out of commit C to find the earlier commit:

  <-B <-C   <--master

Now there's an arrow (a parent hash ID, really) coming out of B too, so we find the earlier commit:

A <-B <-C   <--master

and in our tiny example repository, there are only three commits: A is the first one ever made, so it has no parent arrow / hash-ID, and we know we can stop chasing parent links.
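To make the backwards walk concrete, here is a minimal Python sketch of the three-commit repository. It is a toy model, not how Git stores anything: the dictionaries and the history function are made up for illustration, and the single letters stand in for real hash IDs.

```python
# Toy model: each commit records its parent's ID; a branch name records the tip.
commits = {
    "A": None,   # first commit ever made: no parent
    "B": "A",
    "C": "B",
}
branches = {"master": "C"}


def history(branch):
    """Follow parent links from the branch tip back to the root commit."""
    result = []
    commit = branches[branch]
    while commit is not None:
        result.append(commit)
        commit = commits[commit]   # step to the parent
    return result


print(history("master"))  # ['C', 'B', 'A']
```

The loop stops at A because A has no parent, which is exactly the "stop chasing parent links" condition.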

The work-tree is pretty straightforward, but it's not a commit, and a commit is not the work-tree. Git can extract a commit into the work-tree, and—eventually, sort of—save a work-tree into a new commit, but to do so, Git insists on going through its index. Other version control systems don't have an index, or if they do have something that works like the index, they keep it completely hidden and you don't have to know about it. Git goes the opposite direction.

The index

This all means that whenever you work with Git, you must be aware of, and use, what Git calls the index, or—depending on who is doing the calling and what they want to emphasize—the staging area or the cache. These are three names for one single thing. That one thing is so important that it winds up with these three names! Well, that, or the first one, "index", is such a terrible name... :-) Seriously, though, the index is constantly getting in your face and making you understand that it stands between you and your commits.

To put it as simply as possible, Git's index contains the files that will go into the next commit you make. This means that the index starts out holding all the files that are in the current commit.

When you run git commit, Git packages up whatever is in the index right now, and makes a new commit from those files, however they appear in the index right now. The index might have different stuff in it later, but at the time you run git commit, Git takes what's in it, packages it up, and makes a new commit.

The new commit points back to the current commit. So if we have our simple three-commit repository as above:

A--B--C   <-- master (HEAD)

and we make a new commit D while our HEAD is attached to branch master so that the current commit is commit C, the new commit will point back to C, and Git will make the name master point to D:

A--B--C--D   <-- master (HEAD)

and that's how branches grow.
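The same idea as a rough Python sketch: committing freezes whatever is in the index and moves the branch name. Everything here (the commit function, the dictionaries, the way the toy ID is computed) is an assumption for illustration and does not match Git's real object format.

```python
import hashlib

commits = {}                 # commit ID -> (parent ID, snapshot of files)
branches = {"master": None}  # branch name -> ID of the tip commit
index = {}                   # path -> content staged for the next commit


def commit(branch, message):
    """Freeze whatever is in the index right now and advance the branch name."""
    parent = branches[branch]
    snapshot = dict(index)   # a copy, so later index changes cannot touch it
    payload = repr((parent, sorted(snapshot.items()), message)).encode()
    commit_id = hashlib.sha1(payload).hexdigest()   # toy ID, not Git's format
    commits[commit_id] = (parent, snapshot)
    branches[branch] = commit_id                    # the name now points here
    return commit_id


index["README.txt"] = "version 1"
c = commit("master", "first")
index["README.txt"] = "version 2"   # staged over the old index entry
d = commit("master", "second")
```

After the second call, d points back to c, master names d, and the snapshot saved in c still holds "version 1".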

So how do you get files into the index?

Since this index-aka-staging-area is so important, you need to know how to get files into the index. Sure, it starts out with files from the current commit, courtesy of git checkout, but then what?

The what part is mostly git add. Running:

git add README.txt

tells Git to package up the contents of README.txt from your work-tree, turn it into special Git-only format, and stuff that into the index under the name README.txt.

This means that the file-flow, in Git, goes like this:

    commit  —>  index  <—>  work-tree

Using git checkout, you copy files from some commit—usually the current commit—into the index, where they keep their special Git-only format but now become write-able; and then from the index to the work-tree, where they turn into normal format. Using git add, you copy files from the work-tree into the index, overwriting the copy that was there before and turning the file back into the special Git-only format.

Eventually, you run git commit to package up the index into a commit. The commit saves whatever is in the index, which is already converted into a Git-only format, so this part is really easy. Git just makes sure that the file sticks around forever as part of the commit, i.e., that a future git add that overwrites the index version doesn't overwrite or throw out the committed version. The underlying mechanism used for Git-only format (hashing with "garbage collection") makes this trivial.
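Here is a small Python sketch of why hashing makes this easy, assuming a repository that uses the default SHA-1 object format. The blob ID is the SHA-1 of a short "blob <size>" header, a NUL byte, and the file's bytes. The objects dictionary and the store function are stand-ins I made up for Git's object database, and garbage collection is not modeled.

```python
import hashlib


def blob_id(data: bytes) -> str:
    # Hash ID for file content: SHA-1 over "blob <size>", a NUL byte, then the bytes.
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


objects = {}  # toy object store: hash ID -> content


def store(data: bytes) -> str:
    oid = blob_id(data)
    objects[oid] = data   # same content -> same key; new content -> new key
    return oid


v1 = store(b"first draft\n")    # what an earlier git add put in the index
v2 = store(b"second draft\n")   # a later git add of the edited file
```

Because the key is derived from the content, adding the edited file creates a second object under a different key and leaves the first one untouched.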

Smudge and clean filters

There's an interesting point hidden in all of the above: Git has to copy files from the index into the work-tree, expanding out the Git-only format to normal format. And, Git has to copy files from the work-tree into the index, compressing them down into Git-only format. What if we did something sneaky during the copying?

Git provides its own internal filters here, such as doing CR-LF line endings instead of LF-only line endings, or expanding $Id$ to contain a hash ID. These filters mean that what's in the index and what's in the work-tree no longer actually match up. The index version of the file isn't just a compressed version of the work-tree file. It's a modified version, or a replacement version.

This is how both Git-LFS and Git-Exile work. They add filters that operate during the "extract from index to work-tree" step, and that operate during the "compress from work-tree into index" step. These filters, rather than just swapping CRLF and LF-only endings or expanding or compressing away $Id$ strings, actually swap the entire file contents.

During git add, the large or binary file never goes into the index at all. The LFS or Exile filter saves the real file somewhere else, and puts a link into Git instead. Git calls this a clean filter: it cleans up the icky work-tree file into a nice clean index version.

During git checkout, the large or binary file isn't in the index, but the LFS or Exile filter takes the link, finds the real file somewhere else, and puts that file into the work-tree for you. Git calls this a smudge filter: it takes the nice clean committed version out of the index and dirties it up to make the icky work-tree file.
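The round trip can be sketched in a few lines of Python. This is only an illustration of the idea: the "link:" format, the external_store dictionary, and the function bodies are my assumptions, not the actual pointer format or storage used by Git-LFS or Git-Exile.

```python
import hashlib

external_store = {}  # stands in for wherever the real content is kept


def clean(worktree_bytes: bytes) -> bytes:
    """Work-tree -> index: stash the real content, hand Git a small link."""
    key = hashlib.sha256(worktree_bytes).hexdigest()
    external_store[key] = worktree_bytes
    return b"link:" + key.encode() + b"\n"


def smudge(index_bytes: bytes) -> bytes:
    """Index -> work-tree: look the link up and return the real content."""
    key = index_bytes.decode().strip().split(":", 1)[1]
    return external_store[key]


picture = bytes(range(256)) * 40   # pretend this is a large binary file
link = clean(picture)              # what ends up in the index and in commits
restored = smudge(link)            # what checkout writes to the work-tree
```

Git itself only ever sees the short link; the real bytes live in the external store and reappear when the smudge side runs.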

The mechanism for invoking smudge and clean filters is that you put file name glob patterns into a .gitattributes file, with a filter= directive. This is described in the gitattributes documentation under the filter section. Git-LFS works by filtering every file, checking the file size, using Git's long running filter process trick to reduce overhead. Git-Exile works by matching just the interesting files, using the much simpler per-file filter method.
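As a rough picture of the pattern-matching side, here is a Python approximation of how a path gets paired with a filter name. The filter name "exile" and the attributes table are assumptions for illustration, and fnmatchcase only approximates Git's own pattern rules.

```python
from fnmatch import fnmatchcase

# Hypothetical attribute table, mirroring .gitattributes lines such as
#   *.jpg filter=exile
# (the filter name "exile" is an assumption, not taken from Git-Exile).
attributes = [("*.jpg", "exile"), ("*.exe", "exile")]


def filter_for(path):
    """Return the filter name for a path, or None if no pattern matches."""
    name = path.rsplit("/", 1)[-1]   # a pattern without a slash matches the basename
    chosen = None
    for pattern, filter_name in attributes:
        if fnmatchcase(name, pattern):
            chosen = filter_name     # a later matching line overrides an earlier one
    return chosen
```

With this table, filter_for("photos/cat.jpg") selects the filter while filter_for("README.txt") returns None, so ordinary text files go through Git untouched.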

When (and where) should the moved files be saved?

Usually we commit and then push. Why do we do it differently here (first push and then commit)?

With Git-LFS, the large files that aren't in the index are sent to the Large File Server right away. With Git-Exile, the large files are stuffed into a secondary repository (if I read the code and description correctly).

The git exile push step pushes the moved files to the associated secondary repository. You don't necessarily have to do this first; it's just a good idea in case someone grabs your linking objects before you get a chance to do it. (That someone could even be you. The work-tree files are still there, but if you invoke your smudge filter on the index entry that has only the link, it will look for the moved files.)
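The ordering concern can be shown with a toy Python model. I am assuming that the clean filter first stashes the real file somewhere local and that git exile push copies it to the shared secondary repository; both stores and both functions below are made up for illustration.

```python
local_store = {}    # moved files kept on this machine after git add
remote_store = {}   # stands in for the shared secondary repository


def exile_push():
    """Copy every locally stashed file to the shared secondary store."""
    remote_store.update(local_store)


def smudge_elsewhere(link):
    """A checkout on another machine: only the remote store is visible."""
    return remote_store.get(link)   # None means the real file cannot be found


local_store["link-1"] = b"big binary content"   # the clean filter ran during add
before = smudge_elsewhere("link-1")             # link is shared, file not pushed
exile_push()
after = smudge_elsewhere("link-1")              # now the real file can be found
```

Here before is None and after is the real content, which is the whole argument for pushing the moved files before anyone can get hold of a commit that contains only the links.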

Summary

Now we can see how this is right in terms of idea, but wrong in terms of execution:

I assume that git exile push takes big files, copies their content to a location that is suited for holding larger files, and then replaces the content of the original files with links to their copies. So, in other words, the content of the files will be replaced by the link to the copy of their content.

The replacement actually happens at git add time! The replacement of the link-only version, in the other direction, happens during git checkout.
