Git objects SHA-1 are file contents or file names?


Problem description

I am confused with how a file's actual contents are stored in .git.

For example, Version 1 is the actual text content of test.txt. When I commit it (first commit) to the repo, Git returns a SHA-1 for that file, which is located in .git\objects\0c\15af113a95643d7c244332b0e0b287184cd049.

When I open the file 15af113a95643d7c244332b0e0b287184cd049 in a text editor, it's all garbage, something like this:

x+)JMU074f040031QÐKÏ,ÉLÏË/Je¨}ºõw[Éœ„ÇR­ ñ·Î}úyGª*±8#³¨,1%>9?¯$5¯D¯¤¢„áôÏ3%³þú>š~}Ž÷*ë²-¶ç¡êÊòR"KâKòãs+‹sô

But I'm not sure whether this garbage represents an encrypted form of the text Version 1, or whether it is what the SHA-1 15af113a95643d7c244332b0e0b287184cd049 represents.

Solution

The correct answer to the question in the subject line:

Git objects SHA-1 are file contents or file names?

is probably "neither", since you were referring to the contents of the loose object file, rather than the original file—and even if you were referring to the original file, that's still not quite right.

A loose object, in Git, is a plain file. The name of the file is constructed from the object's hash ID. The object's hash ID, in turn, is constructed by computing a hash of the object's contents with a prefix header attached.

The prefixed header depends on the object type. There are four types: blob, commit, tag, and tree. The header is a zero-terminated byte string: the type name as an ASCII (or, equivalently, UTF-8) byte string, followed by a space, followed by the decimal representation of the size of the object in bytes, followed by an ASCII NUL (b'\x00' in Python, if you prefer modern Python notation, or '\0' if you prefer C).

After the header come the actual object contents. So, for a file containing the byte string b'hello\n', the data to be hashed consist of b'blob 6\0hello\n':

$ echo 'hello' | git hash-object -t blob --stdin
ce013625030ba8dba906f756967f9e9ca394464a
$ python3
[...]
>>> import hashlib
>>> s = b'blob 6\0hello\n'
>>> hashlib.sha1(s).hexdigest()
'ce013625030ba8dba906f756967f9e9ca394464a'
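
The same computation can be wrapped in a small helper that works for any of the four object types. This is only a sketch of the header-plus-contents rule described above; the function name git_object_id is made up for illustration and is not part of Git or of any library.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 hex digest Git assigns to an object (illustrative helper)."""
    # Header: type name, a space, the decimal size in bytes, then a NUL byte.
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    # The hash covers the header followed by the raw object contents.
    return hashlib.sha1(header + body).hexdigest()

print(git_object_id("blob", b"hello\n"))
# ce013625030ba8dba906f756967f9e9ca394464a
```

Note that the file name never enters the computation: two files with identical contents get the same blob ID.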

Hence, the file name that would be used to store this file is (derived from) ce013625030ba8dba906f756967f9e9ca394464a. As a loose object, it becomes .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a.
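
The split into a two-character directory and a 38-character file name can be sketched as follows. The helper name loose_object_path and its git_dir parameter are hypothetical, chosen only for this example.

```python
from pathlib import PurePosixPath

def loose_object_path(hash_id: str, git_dir: str = ".git") -> PurePosixPath:
    """Map a 40-digit hash ID to its loose-object path (illustrative helper)."""
    # The first two hex digits name the directory, the remaining 38 the file.
    return PurePosixPath(git_dir) / "objects" / hash_id[:2] / hash_id[2:]

print(loose_object_path("ce013625030ba8dba906f756967f9e9ca394464a"))
# .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

Read in reverse, the path in the question (.git\objects\0c\15af113a95643d7c244332b0e0b287184cd049) belongs to the object whose full hash ID is 0c15af113a95643d7c244332b0e0b287184cd049.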

The contents of that file, however, are the zlib-compressed form of b'blob 6\0hello\n', apparently at level=1: Python's default is currently 6, and the result does not match at that level. It's not clear whether Git's zlib deflate exactly matches Python's in general, but using level 1 did work here, which is consistent with the documented default of 1 (best speed) for core.looseCompression:

$ echo 'hello' | git hash-object -w -t blob --stdin
ce013625030ba8dba906f756967f9e9ca394464a
$ vis .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
x\^AK\M-J\M-IOR0c\M-HH\M-M\M-I\M-I\M-g\^B\000\^]\M-E\^D\^T$

(note that the final $ is the shell prompt again; now back to Python3)

>>> import zlib
>>> zlib.compress(s, 1)
b'x\x01K\xca\xc9OR0c\xc8H\xcd\xc9\xc9\xe7\x02\x00\x1d\xc5\x04\x14'
>>> import vis
>>> print(vis.vis(zlib.compress(s, 1)))
x\^AK\M-J\M-IOR0c\M-HH\M-M\M-I\M-I\M-g\^B\^@\^]\M-E\^D\^T

where vis.py is:

def vischr(byte):
    "encode characters the way vis(1) does by default"
    if byte in b' \t\n':
        return chr(byte)
    # control chars: \^X; del: \^?
    if byte < 32 or byte == 127:
        return r'\^' + chr(byte ^ 64)
    # printable characters, 32..126
    if byte < 128:
        return chr(byte)
    # meta characters: prefix with \M^ or \M-
    byte -= 128
    if byte < 32 or byte == 127:
        return r'\M^' + chr(byte ^ 64)
    return r'\M-' + chr(byte)

def vis(bytestr):
    "same as vis(1)"
    return ''.join(vischr(c) for c in bytestr)

(vis produces an invertible but printable encoding of binary files; it was my 1993-ish answer to problems with cat -v).
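
Going the other way, the "garbage" in a loose object file can be turned back into readable data by decompressing it and splitting at the first NUL. The sketch below uses the exact bytes shown in the zlib.compress output above; the function name parse_loose_object is made up for illustration.

```python
import hashlib
import zlib

def parse_loose_object(raw: bytes):
    """Decompress loose-object bytes; return (type, body, hash ID)."""
    data = zlib.decompress(raw)
    # Everything before the first NUL is the header, e.g. b"blob 6".
    header, _, body = data.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    # The hash ID is the SHA-1 of the uncompressed header plus body.
    return obj_type, body, hashlib.sha1(data).hexdigest()

raw = b'x\x01K\xca\xc9OR0c\xc8H\xcd\xc9\xc9\xe7\x02\x00\x1d\xc5\x04\x14'
print(parse_loose_object(raw))
# ('blob', b'hello\n', 'ce013625030ba8dba906f756967f9e9ca394464a')
```

For a real repository, raw would instead be the bytes read from the loose object file opened in binary mode. The data is compressed, not encrypted, so no key is involved.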

Note that the names of files stored in a Git repository (under a commit) appear only as path name components stored in individual tree objects. Computing the hash ID of a tree object is nontrivial; I have Python code that does this in my public "scripts" repository under githash.py.
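
To give a feel for where file names end up, here is a deliberately simplified sketch of a tree object's contents. It is not the githash.py code mentioned above. It assumes a flat tree containing only regular, non-executable files (mode 100644), where each entry is the mode, a space, the name, a NUL, and the 20-byte binary form of the entry's hash ID, with entries sorted by name. Subdirectories, which need a special sorting rule, are left out, and the function names are made up for illustration.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over 'type size NUL' followed by the body (illustrative helper)."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def flat_tree_body(files: dict) -> bytes:
    """Build tree contents for {name: blob bytes}. Assumes regular files only."""
    body = b""
    # Assumption: byte-wise sort by name, which suffices when no entry is a directory.
    for name in sorted(files, key=lambda n: n.encode("utf-8")):
        blob_id = git_object_id("blob", files[name])
        # Entry: mode, space, name, NUL, then the 20 raw bytes of the blob's hash.
        body += b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(blob_id)
    return body

body = flat_tree_body({"test.txt": b"hello\n"})
print(body[:16])                     # the mode and file name are stored in the tree
print(git_object_id("tree", body))   # hash ID of this one-entry tree
```

The blob's hash depends only on its contents, while the name test.txt appears only inside the tree object's data.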
