最好的算法散列数值? [英] Best algorithm for hashing number values?

查看:140
本文介绍了最好的算法散列数值?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在用一系列数字处理,并想要使用散列结果出于安全考虑,什么是从给定的一系列数字的哈希值的最佳方式是什么?输入的例子是信用卡号,或银行帐户号码。 preferred输出将是一个无符号整数,以协助匹配的目的。

When dealing with a series of numbers, and wanting to use hash results for security reasons, what would be the best way to generate a hash value from a given series of digits? Examples of input would be credit card numbers, or bank account numbers. Preferred output would be a single unsigned integer to assist in matching purposes.

我的感觉是,大部分的字符串的实现似乎有针对如此短的字符范围经营的,正因为如此,碰撞率可能会比对一个大样本中运行时高时低的熵。

My feeling is that most of the string implementations appear to have low entropy when run against such a short range of characters and because of that, the collision rate might be higher than when run against a larger sample.

目标语言是Delphi的,但是从其他语言的答案是欢迎如果能提供一种可以导致最佳的解决方案一个工科数学基础。

The target language is Delp however answers from other languages are welcome if they can provide a mathmatical basis which can lead to an optimal solution.

这个程序的目的是要确定是否previously接收卡/账户为previously处理与否。输入文件可以具有针对多个记录的数据库的多个记录所以性能的一个因素。

The purpose of this routine will be to determine if a previously received card/account was previously processed or not. The input file could have multiple records against a database of multiple records so performance is a factor.

推荐答案

随着安全问题的所有答案躺在一个的连续的距离最安全的最方便。我给你两个答案,一个是非常安全的,而且一个是非常方便的。鉴于各的解释,你可以选择适合您的系统的最佳解决方案。

With security questions all the answers lay on a continuum from most secure to most convenient. I'll give you two answers, one that is very secure, and one that is very convenient. Given that and the explanation of each you can choose the best solution for your system.

您说,您的目标是存储此值代替实际的信用卡,所以你以后可以知道,如果相同的信用卡号码再次使用。这意味着,它必须仅包含信用卡号,也许均匀盐。在CCV,到期日期,名称等列入会使它没有用的,因为它的价值可以用相同的信用卡号码不同。因此,我们将假定你垫所有您的信用卡号码与相同的盐值,将保持统一的所有条目。

You stated that your objective was to store this value in lieu of the actual credit card so you could later know if the same credit card number is used again. This means that it must contain only the credit card number and maybe a uniform salt. Inclusion of the CCV, expiration date, name, etc. would render it useless since it the value could be different with the same credit card number. So we will assume you pad all of your credit card numbers with the same salt value that will remain uniform for all entries.

方便的解决方案是使用 FNV (如Zebrabox和尼克建议)。这将产生一个32位的数字,将指数迅速进行搜索。当然,缺点是,它仅允许在最多4个十亿不同的号码,并在实践中会产生碰撞得更快,然后说。由于它具有如此高的碰撞率强力攻击可能会生成足够无效的结果,使没有多大用处了。

The convenient solution is to use a FNV (As Zebrabox and Nick suggested). This will produce a 32 bit number that will index quickly for searches. The downside of course is that it only allows for at max 4 billion different numbers, and in practice will produce collisions much quicker then that. Because it has such a high collision rate a brute force attack will probably generate enough invalid results as to make it of little use.

安全的解决方案是依靠SHA哈希函数(越大越好),但有多次反复。我建议某处10000左右。是的,我知道,万迭代是很多,它会需要一段时间,但是当涉及到​​的强度对蛮力攻击速度是我们的敌人。如果你想成为安全的,那么你希望它是缓慢的。 SHA被设计成不具有任何碰撞尺寸输入的。如果找到了碰撞则散列被认为不再可行。 AFAIK的SHA-2家族仍是可行的。

The secure solution is to rely on SHA hash function (the larger the better), but with multiple iterations. I would suggest somewhere on the order of 10,000. Yes I know, 10,000 iterations is a lot and it will take a while, but when it comes to strength against a brute force attack speed is the enemy. If you want to be secure then you want it to be SLOW. SHA is designed to not have collisions for any size of input. If a collision is found then the hash is considered no longer viable. AFAIK the SHA-2 family is still viable.

现在,如果你想一个解决方案,那就是安全,快捷在数据库中进行搜索,那么我会建议使用安全解决方案(SHA-2×10K),然后存储完整的哈希值在一列,然后在第一个32位,将其存储在一个不同的柱,用在第二列中的指标。首先进行的32位值的查询。如果不产生任何的比赛,那么你有没有匹配。如果它确实产生匹配,那么你可以比较完整的SHA值,看看它是否是相同的。这意味着你正在执行完整的二进制比较(哈希值实际上是二进制的,但只有重新psented为便于人阅读和转移基于文本的协议串$ P $)要小得多集。

Now if you want a solution that is secure and quick to search in the DB, then I would suggest using the secure solution (SHA-2 x 10K) and then storing the full hash in one column, and then take the first 32 bits and storing it in a different column, with the index on the second column. Perform your look-up on the 32 bit value first. If that produces no matches then you have no matches. If it does produce a match then you can compare the full SHA value and see if it is the same. That means you are performing the full binary comparison (hashes are actually binary, but only represented as strings for easy human reading and for transfer in text based protocols) on a much smaller set.

如果你是真的关心速度,那么你可以减少迭代次数。坦率地讲它仍然是快速的,即使1000次迭代。你会想使你有多大希望数据库以获得一些现实的主观判断以及其它因素(通讯速度,硬件响应,负载等),可能效果的持续时间。你可能会发现你优化的最快的点的的过程中,这将有很少或几乎没有实际影响。

If you are really concerned about speed then you can reduce the number of iterations. Frankly it will still be fast even with 1000 iterations. You will want to make some realistic judgment calls on how big you expect the database to get and other factors (communication speed, hardware response, load, etc.) that may effect the duration. You may find that your optimizing the fastest point in the process, which will have little to no actual impact.

另外,我建议您基准上的完整的哈希值与32位的子集的查询。大多数现代数据库系统是相当快的,并且包含一些优化和经常优化为我们做事情的轻松的方式。当我们尝试获取智能我们有时仅仅是慢下来。什么是报价约为premature优化。 。 。 ?

Also, I would recommend that you benchmark the look-up on the full hash vs. the 32 bit subset. Most modern database system are fairly fast and contain a number of optimizations and frequently optimize for us doing things the easy way. When we try to get smart we sometimes just slow it down. What is that quote about premature optimization . . . ?

这篇关于最好的算法散列数值?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆