Why is the staging directory also called the Index / Git Index?


Question


I am confused by the naming of the staging directory (the Git index) in Git.

Does "index" have a special meaning here? Why not simply call it a cache or a temp directory, which would be easier to understand?

To me, an index is something that helps us search for things faster, like an index in a DBMS. How does that relate to a staging area?

I did some Google searching but still don't have a clear idea. Reference link: Git Index

Solution

An article by Git's main maintainer, Junio C. Hamano, is instructive for grasping the difference between "cache" and "index":
(emphasis mine)

When Linus started writing git, his aim was to allow him to reproduce each and every intermediate state produced by his original "tarball and patches" workflow he used before the BitKeeper days.
Starting from a 2.6.12 tarball, he queues patch-1, patch-2,... so 2.6.12 itself, 2.6.12 with patch-1 applied, 2.6.12 with both patch-1 and patch-2 applied, become three versions.

But this won't obviously scale if you have to shuffle hundreds of patches a day. So he invented "directory cache"; as a concept, this roughly corresponds to "tree" objects in today's git: a collection of records, each of which is a compact representation of what a whole directory structure contains.
The way to build it was to "add the contents to the cache, or update the contents in the cache".

The control directory to host the collection of such version control records was named ".dircache" (this was renamed to ".git" after some time).
There was a file called ".dircache/index", and the contents of this file was read and manipulated in a set of variables in C that were named after a noun, "cache".
Back then, the concept of what we today call the index, a buffer area to build up the collection of contents you intend to write out as a tree object, was called "cache".
Everybody talked about "cache" and "index" interchangeably, as the file that records what is in the "cache" was named "index". It was (and it still is) an index to allow you to find the contents in the cache by giving it a pathname.

As more and more people started using git without having to read its code at all, the use of the word "index" has become more prevalent for obvious reasons.
As something that is on the filesystem, it is much more visible than the variable name in the C source code.
Eventually, we stopped using "cache" as a noun to name what we call "the index" today when explaining the use of git as the end-user.
The word "cache" however is still used as a noun when we want to talk about the internal data structure in the context of discussing git implementation (e.g. "Let's make it possible for programs to work with more than one cache at the same time").

At the end user level, "cache" is only used as an adjective these days; "cached", meaning "contents cached in the index, not the contents in the work tree".
We could have called it "indexed", but "cached contents" was an already established phrase from very early days to mean that exact concept, and we did not need another word that meant the same thing.

[...] In the earlier days, there was a distinction between "adding a new file to the index" and "updating a file that is already in the index with new contents".
[...] Modern (and medieval) versions of git uses "git add" for both. We could have been just honest and called the act of updating-or-adding-to-the-index "add", but some people in "git training" industry started teaching the index as "the staging area for the next commit", and as an inevitable consequence, a verb "to stage" started to appear in many documentation to mean "the act of adding contents to the index".
I sometimes use this verb myself, but that is only when I suspect that the audience might have learned git first from these new people. Strictly speaking this is a redundant and fairly recent word in git vocabulary.
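
To make the quoted ideas concrete, here is a minimal Python sketch of the index as "a lookup from pathname to cached content". It is a toy model under stated assumptions, not Git's real on-disk index format: the class ToyIndex and its methods add, lookup and modified_paths are hypothetical names invented for illustration. The one part taken from Git itself is blob_id, which computes the object ID Git assigns to file content in a SHA-1 repository (the SHA-1 of a "blob <size>\0" header followed by the bytes).

```python
import hashlib
from typing import Dict, List, Optional


def blob_id(data: bytes) -> str:
    """Object ID Git gives a blob with this content (SHA-1 repositories)."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class ToyIndex:
    """Hypothetical in-memory stand-in for the index: pathname -> blob ID."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def add(self, path: str, data: bytes) -> str:
        """Add a new path or update an existing one, as "git add" does for both."""
        new_id = blob_id(data)
        old_id = self.entries.get(path)
        self.entries[path] = new_id
        if old_id is None:
            return "added"
        return "unchanged" if old_id == new_id else "updated"

    def lookup(self, path: str) -> Optional[str]:
        """Find the cached content (by ID) for a pathname, or None if absent."""
        return self.entries.get(path)

    def modified_paths(self, worktree: Dict[str, bytes]) -> List[str]:
        """Paths whose work-tree content differs from the cached content."""
        return sorted(
            path
            for path, cached in self.entries.items()
            if path in worktree and blob_id(worktree[path]) != cached
        )


# The work tree is modeled as a plain dict of pathname -> file bytes.
index = ToyIndex()
worktree = {"README": b"hello\n"}

print(index.add("README", worktree["README"]))  # added
worktree["README"] = b"hello, world\n"          # edit the file, do not re-add yet
print(index.modified_paths(worktree))           # ['README']
print(index.add("README", worktree["README"]))  # updated
```

In this model, "cached" content is whatever the index currently records for a path, which may lag behind the work tree until the path is added again. A single add operation covers both "adding a new file" and "updating a file that is already in the index", which is the behavior the article attributes to "git add".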
