How do I assess the hash collision probability?

Views: 469 Published: 2020/4/27 3:27:59 Tags: language-agnostic md5 probability estimation

Problem description

I'm developing a back-end application for a search system. The search system copies files to a temporary directory and gives them random names. Then it passes the temporary files' names to my application. My application must process each file within a limited period of time, otherwise it is shut down; this is a watchdog-like safety measure. Processing a file is likely to take a long time, so I need to design the application to handle this scenario. If my application gets shut down, then the next time the search system wants to index the same file it will likely give it a different temporary name.

The obvious solution is to provide an intermediate layer between the search system and the backend. It will queue the request to the backend and wait for the result to arrive. If the request times out in the intermediate layer, that is no problem: the backend keeps working, only the intermediate layer is restarted, and it can retrieve the result from the backend when the search system later repeats the request.

The problem is how to identify the files, since their names change randomly. I intend to use a hash function like MD5 to hash the file contents. I'm well aware of the birthday paradox and used an estimation from the linked article to compute the probability. If I assume I have no more than 100 000 files, the probability of two files having the same MD5 (128 bit) is about 1.47×10^-29.
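For reference, the figure can be reproduced with the usual birthday-bound approximation, p ≈ 1 - exp(-n(n-1)/(2N)), where n is the number of files and N = 2^128 is the number of possible MD5 values. The sketch below is only illustrative: the function name `birthday_collision_probability` is made up here, and it assumes MD5 outputs behave like independent, uniformly random 128-bit values (so it says nothing about deliberately crafted collisions).

```python
import math


def birthday_collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_items hashes coincide.

    Assumption: hash values are independent and uniform over 2**hash_bits.
    """
    space = 2.0 ** hash_bits                      # N, number of possible hash values
    pairs = num_items * (num_items - 1) / 2.0     # number of distinct pairs of items
    # p ~= 1 - exp(-pairs / N); expm1 keeps precision when the result is tiny.
    return -math.expm1(-pairs / space)


if __name__ == "__main__":
    p = birthday_collision_probability(100_000, 128)
    print(f"{p:.3g}")  # about 1.47e-29 for 100 000 files and a 128-bit hash
```

With 100 000 files and a 128-bit hash this gives roughly 1.47×10^-29, which matches the estimate quoted above.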

Should I care about such a collision probability, or just assume that equal hash values mean equal file contents?
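To make the idea concrete, identifying a file by the MD5 of its contents rather than by its temporary name might look like the sketch below. The helper name `file_md5` and the 1 MiB chunk size are illustrative choices, not part of the original setup; the code uses only Python's standard `hashlib` module and reads the file in chunks so that large files need not fit in memory.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file's contents (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two temporary copies of the same file then yield the same digest regardless of their names, so the digest could serve as the lookup key for a result the backend has already computed.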
