Java - Hash algorithms - Fastest implementations


Question

I want to know what the best and fastest implementations of hash algorithms are for Java, especially MD5 and SHA-2 (SHA-512 or SHA-256). I want a function that takes a string as an argument and returns the hash as the result. Thank you.

Edit: This is for mapping each URL to a unique hash. Since MD5 is not that reliable in this area, I'm more interested in finding the best and fastest implementation of the SHA-2 algorithms. Note that I know even SHA-2 might produce the same hash for some URLs, but I can live with that.

Answer

First things first: speed is overrated. You should measure before declaring that a given algorithm is "too slow". Most of the time, hash function speed makes no noticeable difference anyway. If you have qualms about security, first select a hash function which is secure enough, and only then worry about performance.

Moreover, you want to hash "strings". A Java String is, internally, a chunk from an array of char values which represent Unicode code points (actually, Unicode 16-bit code units which encode the code points using UTF-16). A hash function takes as input a sequence of bits or bytes. So you will have to make a conversion step, e.g. str.getBytes("UTF-8"), to obtain your string as a bunch of bytes. It is likely that the conversion step will have a non-negligible cost when compared to the hashing itself.
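As a minimal sketch of the string-in, hash-out function the question asks for, the conversion step and the hashing step can be put together as below. The class name StringHasher and the method hashHex are hypothetical names chosen for illustration. The sketch assumes UTF-8 as the encoding and uses StandardCharsets.UTF_8 (Java 7 and later) in place of the "UTF-8" string, which avoids a checked exception. The digest comes from the standard MessageDigest API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of the original answer.
final class StringHasher {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Hashes the UTF-8 encoding of input and returns the digest as lowercase hex. */
    static String hashHex(String algorithm, String input) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm); // e.g. "MD5" or "SHA-256"
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
        // Conversion step: the hash function consumes bytes, not chars.
        byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
        byte[] digest = md.digest(bytes);

        // Render each byte as two hex digits.
        char[] out = new char[digest.length * 2];
        for (int i = 0; i < digest.length; i++) {
            int v = digest[i] & 0xFF;
            out[2 * i] = HEX[v >>> 4];
            out[2 * i + 1] = HEX[v & 0x0F];
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(hashHex("SHA-256", "http://www.example.com/"));
    }
}
```

If the URLs are known to be pure ASCII, any ASCII-compatible encoding yields the same bytes, so the choice of UTF-8 only matters for non-ASCII input.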

Note: beware of URL-encoding! In a URL, some bytes can be replaced with sequences beginning with a '%' sign; this is meant to support non-printable characters, but it can be used on "standard" characters as well (e.g., replacing 'a' with '%61'). This means that two strings which are distinct (in the String.equals() sense) may actually represent the same URL (as far as URL processing is concerned). Depending on your situation, this may or may not be an issue.
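If this matters in your case, one option is to bring each URL to a canonical form before hashing it. The sketch below is only an illustration of the idea: UrlCanonicalizer and canonicalize are hypothetical names, and the approach (decode the components with java.net.URI, then rebuild the URI so that only characters that need quoting are escaped again) is a simplification. In particular, it also decodes escapes such as '%2F' that may be significant to the server, so it is not a complete URL normalizer.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; a simplification, not a full URL normalizer.
final class UrlCanonicalizer {
    /** Decodes percent-escapes and rebuilds the URL so equivalent spellings compare equal. */
    static String canonicalize(String url) {
        try {
            URI parsed = new URI(url);
            // The non-raw getters return decoded components ("%61" becomes "a").
            URI rebuilt = new URI(
                    parsed.getScheme(),
                    parsed.getAuthority(),
                    parsed.getPath(),
                    parsed.getQuery(),
                    parsed.getFragment());
            return rebuilt.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URL: " + url, e);
        }
    }

    public static void main(String[] args) {
        String plain = "http://example.com/a";
        String escaped = "http://example.com/%61";
        System.out.println(plain.equals(escaped));                             // distinct as strings
        System.out.println(canonicalize(plain).equals(canonicalize(escaped))); // same after canonicalization
    }
}
```

Hashing the canonical form instead of the raw string would then give both spellings the same hash.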

You should first try to use Java's MessageDigest API with the standard (already installed) JCE provider (i.e. you call MessageDigest.getInstance("SHA-256")), and benchmark the result. Theoretically, the JCE may map the call to an implementation with "native" code (written in C or assembly), which will be faster than what you can get with Java.
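A rough way to take that first measurement is sketched below. DigestBench and nanosPerHash are hypothetical names, and the sample URL and iteration counts are arbitrary assumptions. It is a crude timing loop with a warm-up pass, not a rigorous benchmark, so treat the numbers as a first indication only. MD5 and SHA-256 are required on every Java platform; SHA-512 is assumed to be available from the installed provider, as it normally is.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical timing harness for illustration; a crude loop, not a rigorous benchmark.
final class DigestBench {
    /** Returns the average time in nanoseconds to hash input once with the given algorithm. */
    static double nanosPerHash(String algorithm, byte[] input, int iterations) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
        int sink = 0; // consume results so the JIT cannot discard the work

        // Warm-up pass so that the timed loop runs compiled code.
        for (int i = 0; i < iterations; i++) {
            sink ^= md.digest(input)[0];
        }

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink ^= md.digest(input)[0]; // digest(byte[]) updates, finishes and resets
        }
        long elapsed = System.nanoTime() - start;

        if (sink == Integer.MIN_VALUE) {
            System.out.print(""); // never true for a single byte value; keeps sink observable
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] url = "http://www.example.com/some/path?query=1".getBytes(StandardCharsets.UTF_8);
        for (String algorithm : new String[] {"MD5", "SHA-256", "SHA-512"}) {
            String provider = MessageDigest.getInstance(algorithm).getProvider().getName();
            System.out.printf("%-8s provider=%-6s %.0f ns/hash%n",
                    algorithm, provider, nanosPerHash(algorithm, url, 200_000));
        }
    }
}
```

Printing the provider name shows which implementation the JCE actually selected, which is useful when comparing against an alternative library.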

That being said...

sphlib is an open-source implementation of many cryptographic hash functions, in C and in Java. The code has been optimized for speed and, in practice, the Java version turns out to be faster than what the standard JRE from Sun/Oracle offers. Use this link in case the previous link fails (the main host server is sometimes down for maintenance, as seems to be the case right now; warning: 10 MB download). The archive also contains a report (presented at the second SHA-3 candidate conference in 2010) which gives measured performance figures on several platforms for SHA-2 and the 14 "second round" candidates for the upcoming SHA-3.

But you really should run benchmarks in your actual situation. For instance, L1 cache effects can change performance drastically, and they cannot be accurately predicted by taking the function code and running it in isolation.
