Git is moving to a new hashing algorithm, SHA-256, but why did the Git community settle on SHA-256?

Question

I just learned from this HN post that Git is moving to a new hashing algorithm (from SHA-1 to SHA-256).

I wanted to know what makes SHA-256 the best fit for Git's use case. Are there strong technical reasons, or is SHA-256's popularity a strong factor? (I am guessing.) Looking at the https://en.wikipedia.org/wiki/Comparison_of_cryptographic_hash_functions page, I see there are many modern and older alternatives. Some of them are at least as performant as SHA-256, and stronger (for example, https://crypto.stackexchange.com/q/26336).

Recommended answer

I presented this move in "Why doesn't Git use more modern SHA?" in Aug. 2018.

The reasons were discussed here by Brian M. Carlson:

I've implemented and tested the following algorithms, all of which are 256-bit (in alphabetical order):

  • BLAKE2b (libb2)
  • BLAKE2bp (libb2)
  • KangarooTwelve (imported from the Keccak Code Package)
  • SHA-256 (OpenSSL)
  • SHA-512/256 (OpenSSL)
  • SHA3-256 (OpenSSL)
  • SHAKE128 (OpenSSL)
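
To make "all of which are 256-bit" concrete, here is a minimal sketch using Python's standard hashlib rather than the libb2/OpenSSL builds tested in the quote. It covers only the candidates hashlib is guaranteed to ship (BLAKE2b, SHA-256, SHA3-256, SHAKE128); BLAKE2bp, KangarooTwelve and SHA-512/256 are left out for that reason. The input bytes are arbitrary.

```python
import hashlib

data = b"hello git"  # arbitrary sample input

# Each entry yields a 256-bit (32-byte) digest of the same input.
digests = {
    "BLAKE2b-256": hashlib.blake2b(data, digest_size=32).digest(),
    "SHA-256": hashlib.sha256(data).digest(),
    "SHA3-256": hashlib.sha3_256(data).digest(),
    # SHAKE128 is an extendable-output function, so the output
    # length (32 bytes) has to be requested explicitly.
    "SHAKE128": hashlib.shake_128(data).digest(32),
}

for name, digest in digests.items():
    print(f"{name:12} {len(digest) * 8} bits  {digest.hex()}")
```

All four produce names of the same length, so the choice between them rests on the other criteria discussed below (availability, analysis, performance), not on output size.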

I also rejected some other candidates.
I couldn't find any reference or implementation of SHA256×16, so I didn't implement it.
I didn't consider SHAKE256 because it is nearly identical to SHA3-256 in almost all characteristics (including performance).

SHA-256 and SHA-512/256

These are the 32-bit and 64-bit SHA-2 algorithms that are 256 bits in size.

I noted the following benefits:

  • Both algorithms are well known and heavily analyzed.
  • Both algorithms provide 256-bit preimage resistance.

The algorithms with the greatest implementation availability are SHA-256, SHA3-256, BLAKE2b, and SHAKE128.

In terms of command-line availability, BLAKE2b, SHA-256, SHA-512/256, and SHA3-256 should be available in the near future on a reasonably small Debian, Ubuntu, or Fedora install.

As far as security, the most conservative choices appear to be SHA-256, SHA-512/256, and SHA3-256.

The performance winners are BLAKE2b unaccelerated and SHA-256 accelerated.

The suggested conclusion was based on:

Popularity

Other things being equal we should be biased towards whatever's in the widest use & recommended for new projects.

The only widely deployed HW acceleration is for the SHA-1 and SHA-256 from the SHA-2 family, but notably nothing from the newer SHA-3 family (released in 2015).

Similar to "popularity" it seems better to bias things towards a hash that's been out there for a while, i.e. it would be too early to pick SHA-3.

The hash transitioning plan, once implemented, also makes it easier to switch to something else in the future, so we shouldn't be in a rush to pick some newer hash because we'll need to keep it forever; we can always do another transition in another 10-15 years.

Result: commit 0ed8d8d, Git v2.19.0-rc0, Aug 4, 2018.

SHA-256 has a number of advantages:

  • It has been around for a while, is widely used, and is supported by just about every single crypto library (OpenSSL, mbedTLS, CryptoNG, SecureTransport, etc).

When you compare against SHA1DC, most vectorized SHA-256 implementations are indeed faster, even without acceleration.

If we're doing signatures with OpenPGP (or even, I suppose, CMS), we're going to be using SHA-2, so it doesn't make sense to have our security depend on two separate algorithms when either one of them alone could break the security when we could just depend on one.

So SHA-256 it is.

The idea remains: Any notion of SHA1 is being removed from Git codebase and replaced by a generic "hash" variable.
Tomorrow, that hash will be SHA2, but the code will support other hashes in the future.

As Linus Torvalds put it (emphasis mine):

Honestly, the number of particles in the observable universe is on the order of 2**256. It's a really really big number.

Don't make the code base more complex than it needs to be.
Make an informed technical decision, and say "256 bits is a lot".

The difference between engineering and theory is that engineering makes trade-offs.
Good software is well engineered, not theorized.

Also, I would suggest that git default to "abbrev-commit=40", so that nobody actually sees the new bits by default.
So the perl scripts etc that use "[0-9a-f]{40}" as a hash pattern would just silently continue to work.

Because backwards compatibility is important (*)

(*) And 2**160 is still a big big number, and hasn't really been a practical problem, and SHA1DC is likely a good hash for the next decade or longer.
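
The backwards-compatibility point about "[0-9a-f]{40}" patterns can be sketched as follows. This is only an illustration of the quoted concern: the anchored regex and the sample inputs are assumptions, and the hex strings come straight from hashlib, not from a real repository.

```python
import hashlib
import re

# Anchored pattern of the kind the quote mentions: exactly 40 hex digits.
SHA1_HEX = re.compile(r"^[0-9a-f]{40}$")

sha1_name = hashlib.sha1(b"example").hexdigest()      # 40 hex digits
sha256_name = hashlib.sha256(b"example").hexdigest()  # 64 hex digits

print(bool(SHA1_HEX.match(sha1_name)))         # full SHA-1 name
print(bool(SHA1_HEX.match(sha256_name)))       # full SHA-256 name
print(bool(SHA1_HEX.match(sha256_name[:40])))  # SHA-256 name cut to 40 digits
```

A script built around such a pattern rejects a full 64-digit name but keeps working when names are abbreviated to 40 digits, which is the effect the suggested "abbrev-commit=40" default aims for.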

(SHA1DC, for "Detecting(?) Collision", was discussed in early 2017, after the shattered.io collision attack: see commit 28dc98e, Git v2.13.0-rc0, March 2017, from Jeff King, and "Hash collision in git")
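
To put the 2**160 and 2**256 figures from the quote in perspective, here is a rough sketch of the birthday-bound approximation for purely accidental collisions. It assumes an ideal hash and says nothing about deliberate attacks such as the shattered.io one, which exploit weaknesses in SHA-1 itself. The object count is a made-up, deliberately oversized number.

```python
def accidental_collision_probability(num_objects: int, hash_bits: int) -> float:
    """Birthday-bound approximation p ~ n(n-1) / 2^(bits+1), valid while p is small."""
    return num_objects * (num_objects - 1) / 2 ** (hash_bits + 1)

n = 10**12  # hypothetical object count, far beyond typical repositories

p160 = accidental_collision_probability(n, 160)
p256 = accidental_collision_probability(n, 256)

print(f"160-bit names: p ~ {p160:.1e}")
print(f"256-bit names: p ~ {p256:.1e}")
```

Under these assumptions both probabilities are negligible, which matches the remark that 2**160 "hasn't really been a practical problem"; the case for moving away from SHA-1 comes from cryptanalysis, not from the size of the output space.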

See Documentation/technical/hash-function-transition.txt

The transition to SHA-256 can be done one local repository at a time.

a. Requiring no action by any other party.
b. A SHA-256 repository can communicate with SHA-1 Git servers (push/fetch).
c. Users can use SHA-1 and SHA-256 identifiers for objects interchangeably (see "Object names on the command line", below).
d. New signed objects make use of a stronger hash function than SHA-1 for their security guarantees.
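
As a rough illustration of how one piece of content ends up with both a SHA-1 and a SHA-256 identifier, the sketch below hashes a blob the way Git names objects: the digest of a "blob <size>\0" header followed by the content. The helper name git_blob_id is made up for this example, and it assumes SHA-256 repositories keep the same framing for blobs and only swap the hash function.

```python
import hashlib

def git_blob_id(content: bytes, algo: str = "sha1") -> str:
    """Name a blob by hashing 'blob <size>\\0' + content (hypothetical helper)."""
    header = b"blob %d\0" % len(content)
    return hashlib.new(algo, header + content).hexdigest()

content = b"hello\n"
print("sha1  ", git_blob_id(content, "sha1"))
print("sha256", git_blob_id(content, "sha256"))
```

Other object types (trees, commits) embed object names in their content, so converting them involves more than rehashing the same bytes; this sketch covers blobs only.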


With Git 2.27 (Q2 2020), git fast-import gains --rewrite-submodules-from/to=<name>:<file>:

See commit 1bdca81, commit d9db599, commit 11d8ef3, commit abe0cc5, commit ddddf8d, commit 42d4e1d, commit e02a714, commit efa7ae3, commit 3c9331a, commit 8b8f718, commit cfe3917, commit bf154a8, commit 8dca7f3, commit 6946e52, commit 8bd5a29, commit 1f5f8f3, commit 192b517, commit 9412759, commit 61e2a70, commit dadacf1, commit 768e30e, commit 2078991 (22 Feb 2020) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit f8cb64e, 27 Mar 2020)

fast-import: add options for rewriting submodules

Signed-off-by: brian m. carlson

When converting a repository using submodules from one hash algorithm to another, it is necessary to rewrite the submodules from the old algorithm to the new algorithm, since only references to submodules, not their contents, are written to the fast-export stream.
Without rewriting the submodules, fast-import fails with an "Invalid dataref" error when encountering a submodule in another algorithm.

Add a pair of options, --rewrite-submodules-from and --rewrite-submodules-to, that take a list of marks produced by fast-export and fast-import, respectively, when processing the submodule.
Use these marks to map the submodule commits from the old algorithm to the new algorithm.

We read marks into two corresponding struct mark_set objects and then perform a mapping from the old to the new using a hash table. This lets us reuse the same mark parsing code that is used elsewhere and allows us to efficiently read and match marks based on their ID, since mark files need not be sorted.
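
The mapping described above can be sketched in a few lines. This is not Git's C implementation: it assumes marks files with one ":<id> <object-id>" entry per line, uses plain dictionaries in place of struct mark_set and the khash table, and the object IDs are shortened placeholders.

```python
def parse_marks(text: str) -> dict:
    """Parse marks-file text into {mark_id: object_id}; line order does not matter."""
    marks = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        mark, oid = line.split()
        marks[int(mark.lstrip(":"))] = oid
    return marks

def build_rewrite_map(from_marks: dict, to_marks: dict) -> dict:
    """Map old object IDs to new ones by joining the two mark sets on mark ID."""
    return {old: to_marks[m] for m, old in from_marks.items() if m in to_marks}

# Placeholder data: "old-*" stands for SHA-1 names, "new-*" for SHA-256 names.
from_text = ":2 old-bbbb\n:1 old-aaaa\n"   # as written by fast-export
to_text = ":1 new-aaaa\n:2 new-bbbb\n"     # as written by fast-import

rewrite = build_rewrite_map(parse_marks(from_text), parse_marks(to_text))
print(rewrite)
```

Keying both sets by mark ID is what makes sorted input unnecessary: the join is a hash lookup per mark, not a merge of two ordered lists.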

Note that because we're using a khash table for the object IDs, and this table copies values of struct object_id instead of taking references to them, it's necessary to zero the struct object_id values that we use to insert and look up in the table. Otherwise, we would end up with SHA-1 values that don't match because of whatever stack garbage might be left in the unused area.
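
The zeroing requirement can be simulated outside C. In the sketch below a 32-byte buffer stands in for the storage inside struct object_id, a 20-byte SHA-1 value fills only the front of it, and a Python dict keyed on the whole buffer plays the role of the khash table. All names and sizes are assumptions made for the illustration, not Git's actual definitions.

```python
import hashlib

BUF_SIZE = 32    # assumed buffer size: room for a SHA-256 value
SHA1_SIZE = 20   # a SHA-1 value uses only the first 20 bytes

def make_key(sha1: bytes, leftover: bytes, zero_first: bool) -> bytes:
    """Build a table key from a buffer that may contain leftover bytes."""
    buf = bytearray(leftover)        # simulate an uninitialized stack buffer
    if zero_first:
        buf[:] = bytes(BUF_SIZE)     # zero the whole buffer before use
    buf[:SHA1_SIZE] = sha1           # copy in the 20 meaningful bytes
    return bytes(buf)

oid = hashlib.sha1(b"submodule commit").digest()
garbage_a = b"\xaa" * BUF_SIZE
garbage_b = b"\x55" * BUF_SIZE

# Without zeroing, the same SHA-1 yields two different keys.
table = {make_key(oid, garbage_a, zero_first=False): "new id"}
print(make_key(oid, garbage_b, zero_first=False) in table)

# With zeroing, insertion and lookup agree regardless of leftover bytes.
table = {make_key(oid, garbage_a, zero_first=True): "new id"}
print(make_key(oid, garbage_b, zero_first=True) in table)
```

The first lookup misses and the second hits, which mirrors the failure mode the commit message warns about when the table copies and compares the full struct.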

The git fast-import documentation now includes:

Submodule Rewriting

--rewrite-submodules-from=<name>:<file>
--rewrite-submodules-to=<name>:<file>

Rewrite the object IDs for the submodule specified by <name> from the values used in the from <file> to those used in the to <file>.
The from marks should have been created by git fast-export, and the to marks should have been created by git fast-import when importing that same submodule.

<name> may be any arbitrary string not containing a colon character, but the same value must be used with both options when specifying corresponding marks.
Multiple submodules may be specified with different values for <name>. It is an error not to use these options in corresponding pairs.

These options are primarily useful when converting a repository from one hash algorithm to another; without them, fast-import will fail if it encounters a submodule because it has no way of writing the object ID into the new hash algorithm.

And:

commit: use expected signature header for SHA-256

Signed-off-by: brian m. carlson

The transition plan anticipates that we will allow signatures using multiple algorithms in a single commit.
In order to do so, we need to use a different header per algorithm so that it will be obvious over which data to compute the signature.

The transition plan specifies that we should use "gpgsig-sha256", so wire up the commit code such that it can write and parse the current algorithm, and it can remove the headers for any algorithm when creating a new commit.
Add tests to ensure that we write using the right header and that git fsck doesn't reject these commits.
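
A simplified sketch of the per-algorithm header idea follows. The "gpgsig-sha256" name comes from the commit message above; using plain "gpgsig" for SHA-1 is stated here as an assumption about the existing header, and the parsing is a toy version (headers up to the first blank line, continuation lines starting with a space), not Git's commit code.

```python
# Assumed header name per hash algorithm.
SIGNATURE_HEADERS = {"sha1": "gpgsig", "sha256": "gpgsig-sha256"}

def strip_signature_headers(commit: str) -> str:
    """Drop signature headers of every algorithm, as done before re-signing."""
    headers, sep, body = commit.partition("\n\n")
    kept = []
    skipping = False
    for line in headers.split("\n"):
        if line.startswith(" "):          # continuation of the previous header
            if not skipping:
                kept.append(line)
            continue
        name = line.split(" ", 1)[0]
        skipping = name in SIGNATURE_HEADERS.values()
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + sep + body

# Placeholder commit text; the signature payload is not a real signature.
sample = (
    "tree 1234\n"
    "author A U Thor <author@example.com> 0 +0000\n"
    "gpgsig-sha256 -----BEGIN PGP SIGNATURE-----\n"
    " placeholder\n"
    " -----END PGP SIGNATURE-----\n"
    "\n"
    "commit message\n"
)
print(strip_signature_headers(sample))
```

Keeping one header per algorithm means each signature can state unambiguously which representation of the commit it covers, while a new commit can still start from a payload with every signature header removed.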
