Best algorithm for hashing number values?


Question

When dealing with a series of digits, and wanting to use hash results for security reasons, what is the best way to generate a hash value from a given series of digits? Examples of input would be credit card numbers or bank account numbers. The preferred output would be a single unsigned integer, to assist in matching.

My feeling is that most string hash implementations appear to have low entropy when run against such a short range of characters, and because of that the collision rate might be higher than when run against a larger sample.

The target language is Delphi; however, answers from other languages are welcome if they can provide a mathematical basis that leads to an optimal solution.

The purpose of this routine is to determine whether a received card/account has been processed before. The input file could have multiple records to check against a database of multiple records, so performance is a factor.

Recommended answer

With security questions, all the answers lie on a continuum from most secure to most convenient. I'll give you two answers: one that is very secure and one that is very convenient. Given those and the explanation of each, you can choose the best solution for your system.

You stated that your objective is to store this value in lieu of the actual credit card number so that you can later tell whether the same credit card number is used again. This means the hashed value must contain only the credit card number and perhaps a uniform salt. Including the CCV, expiration date, name, etc. would render it useless, since the value could differ for the same credit card number. So we will assume you pad all of your credit card numbers with the same salt value, which remains uniform for all entries.

The convenient solution is to use FNV (as Zebrabox and Nick suggested). This produces a 32-bit number that indexes quickly for searches. The downside, of course, is that it allows at most about 4 billion (2^32) different values, and in practice it will produce collisions much sooner than that: by the birthday bound, a collision becomes likely after only tens of thousands of entries. Because it has such a high collision rate, a brute-force attack will probably generate enough invalid results to be of little use.
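As a rough illustration of the convenient option, here is a minimal Python sketch of a 32-bit FNV hash together with the birthday-bound estimate of the collision rate. The answer does not say which FNV variant it means; FNV-1a is assumed here. The salt value, the sample card number (a commonly used test number), and the helper names `salted_fnv` and `collision_probability` are hypothetical, chosen only for illustration.

```python
import math

# Standard 32-bit FNV parameters.
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data as an unsigned integer."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte                             # mix in the next byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF   # multiply, keep the low 32 bits
    return h


def salted_fnv(number: str, salt: str = "example-salt") -> int:
    """Hash a digit string padded with a uniform salt (salt is a placeholder)."""
    return fnv1a_32((salt + number).encode("ascii"))


def collision_probability(n: int, bits: int = 32) -> float:
    """Birthday approximation: chance that n random hashes contain a collision."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * 2 ** bits))


if __name__ == "__main__":
    print(hex(salted_fnv("4111111111111111")))
    for n in (10_000, 77_163, 1_000_000):
        print(n, round(collision_probability(n), 4))
```

Under this approximation the chance of at least one collision reaches about one half at roughly 77,000 entries, far below the 4 billion possible values, which is why a 32-bit match should be treated as a candidate rather than as proof that the card was seen before.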

The secure solution is to rely on a SHA hash function (the larger the better), but with multiple iterations. I would suggest somewhere on the order of 10,000. Yes, I know, 10,000 iterations is a lot and it will take a while, but when it comes to strength against a brute-force attack, speed is the enemy. If you want to be secure, then you want it to be SLOW. SHA is designed so that collisions are computationally infeasible to find for input of any size. If a collision is found, the hash is considered no longer viable. AFAIK the SHA-2 family is still viable.
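The iterated approach described above might look like the following Python sketch. It assumes SHA-256 as the SHA-2 member and simply re-hashes the previous digest; the function name `iterated_sha256`, the salt, and the sample number are hypothetical. A standardized iterated construction such as PBKDF2 (available in Python as `hashlib.pbkdf2_hmac`) may be preferable to a hand-rolled loop in a real system.

```python
import hashlib


def iterated_sha256(card_number: str, salt: bytes, iterations: int = 10_000) -> bytes:
    """Hash salt + card_number with SHA-256, then re-hash the digest repeatedly.

    The salt is uniform across all entries so that equal card numbers
    always produce equal digests, which is what makes matching possible.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    digest = hashlib.sha256(salt + card_number.encode("ascii")).digest()
    for _ in range(iterations - 1):
        digest = hashlib.sha256(digest).digest()  # each round slows an attacker down
    return digest


if __name__ == "__main__":
    salt = b"example-uniform-salt"  # placeholder value, not a recommendation
    print(iterated_sha256("4111111111111111", salt).hex())
```

The result is a 32-byte binary value; the hex form is only for display.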

Now, if you want a solution that is secure and quick to search in the DB, I would suggest using the secure solution (SHA-2 x 10K), storing the full hash in one column, and then taking the first 32 bits and storing them in a different column, with the index on the second column. Perform your look-up on the 32-bit value first. If that produces no matches, then you have no matches. If it does produce a match, then you can compare the full SHA value and see whether it is the same. That means you are performing the full binary comparison (hashes are actually binary, and are only represented as strings for easy human reading and for transfer in text-based protocols) on a much smaller set.
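The two-column scheme can be sketched with Python's built-in `sqlite3` module, using an in-memory database as a stand-in for the real one. The table and column names (`seen_cards`, `hash_prefix`, `full_hash`), the salt, and the helper functions are all assumptions made for illustration; the iterated SHA-256 is the same simplified loop the answer describes.

```python
import hashlib
import sqlite3

SALT = b"example-uniform-salt"  # placeholder; the same salt is used for every entry
ITERATIONS = 10_000


def card_hash(card_number: str) -> bytes:
    """Iterated SHA-256 of the salted card number (32 raw bytes)."""
    digest = hashlib.sha256(SALT + card_number.encode("ascii")).digest()
    for _ in range(ITERATIONS - 1):
        digest = hashlib.sha256(digest).digest()
    return digest


def prefix32(full_hash: bytes) -> int:
    """First 32 bits of the hash as an unsigned integer."""
    return int.from_bytes(full_hash[:4], "big")


def open_db() -> sqlite3.Connection:
    """Create the table with the index on the 32-bit column only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seen_cards (hash_prefix INTEGER NOT NULL, full_hash BLOB NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_seen_prefix ON seen_cards (hash_prefix)")
    return conn


def record(conn: sqlite3.Connection, card_number: str) -> None:
    """Store both the 32-bit prefix and the full hash for a processed card."""
    full = card_hash(card_number)
    conn.execute(
        "INSERT INTO seen_cards (hash_prefix, full_hash) VALUES (?, ?)",
        (prefix32(full), full),
    )


def already_processed(conn: sqlite3.Connection, card_number: str) -> bool:
    """Look up by prefix first, then compare full hashes on the small candidate set."""
    full = card_hash(card_number)
    rows = conn.execute(
        "SELECT full_hash FROM seen_cards WHERE hash_prefix = ?", (prefix32(full),)
    )
    return any(bytes(row[0]) == full for row in rows)
```

A typical flow would call `already_processed` for each incoming record and `record` only when it returns `False`.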

If you are really concerned about speed, you can reduce the number of iterations. Frankly, it will still be fast even with 1000 iterations. You will want to make some realistic judgment calls on how big you expect the database to get and on other factors (communication speed, hardware response, load, etc.) that may affect the duration. You may find that you're optimizing the fastest point in the process, which will have little to no actual impact.

Also, I would recommend that you benchmark the look-up on the full hash vs. the 32-bit subset. Most modern database systems are fairly fast, contain a number of optimizations, and frequently optimize for us doing things the easy way. When we try to get smart, we sometimes just slow things down. What is that quote about premature optimization . . . ?
