浮动的哈希函数。 [英] Hash function for float.

查看:66
本文介绍了浮动的哈希函数。的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想编写散列函数float或double。任何建议都会被赞赏。

I wanted to write hash function float or double. Any suggestion would
be appreciated.

推荐答案




shaanxxx写于08/23/06 11:32,:


shaanxxx wrote On 08/23/06 11:32,:

我想写散列函数float或double。
I wanted to write hash function float or double.



为什么?


你有没有听过这个建议?永远不要比较漂浮 -

点数值是否相等?我承认永远不会太强大了,但是你应该很少将b b b与浮点值相比较,这仍然是正确的。问题是

不是自己的值,而是来源:两个

计算应该产生相同的结果可能在
中产生的结果在几个低位上有所不同。


因此,如果你使用的是浮点值作为一个哈希表中的
键 - 一个依赖于完全相等的概念的数据结构 - 你遇到了麻烦。计算

两个的平方根作为某个项目的关键并将其添加到表格中。稍后,计算八个

的平方根除以2(它应该为sqrt(2),对吧?)和

搜索哈希值带有那个钥匙的物品的桌子 - 这是

你找不到物品的绝佳机会。


浮点值使得可怕的哈希表键。

Why?

Have you ever heard the advice "Never compare floating-
point values for equality?" I''ll admit that "never" is too
strong, but it''s still true that you should very seldom
compare floating-point values for equality. The problem is
not with the values themselves, but with their sources: two
computations that "ought" to produce the same result may in
fact produce results that differ in a few low-order bits.

Consequently, if you are using floating-point values as
keys in a hash table -- a data structure that relies on the
notion of exact equality -- you''re in trouble. Calculate
the square root of two as the key of some item and add it
to the table. Later on, calculate the square root of eight
and divide it by two (it "ought" to be sqrt(2), right?) and
search the hash table for an item with that key -- there''s
an excellent chance you won''t find the item.

Floating-point values make terrible hash table keys.


任何建议都会赞赏

Any suggestion would
be appreciated.



不完美但很容易做到:


double d = sqrt(2.0);

unsigned char * p =(unsigned char *)& d;

/ *现在计算

*值p [0]到p的数组的哈希码sizeof d - 1]。

* /


这是不完美的,因为可能存在浮点值

比较相等,但有不同的位模式。许多

系统提供加零系统。和减零,和减零。看起来不同但是相等的
。有些系统支持非标准化

数字(不要与非正常数字混淆),这可以使某些值以多种形式表示(只需

因为9的平方根可以写成3.0,0.3e1,

0.03e2,...)。最后,许多系统提供NaN。价值

从不比较等于任何东西,甚至不是自己,

所以你可能有两个比特相同的值仍然是

不是'=="。


-
Er ********* @ sun.com

Imperfect but easy to do:

double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d - 1].
*/

It''s imperfect because there may be floating-point values
that compare equal but have different bit patterns. Many
systems provide "plus zero" and "minus zero," which look
different but are equal. Some systems support "unnormalized"
numbers (not to be confused with "denormal" numbers), which
allows some values to be represented in several forms (just
as the square root of nine can be written as 3.0, 0.3e1,
0.03e2, ...). Finally, many systems provide "NaN" values
which never compare equal to anything, not even themselves,
so you may have two bit-for-bit identical values that still
aren''t "==".

--
Er*********@sun.com




Eric Sosman写道:

Eric Sosman wrote:

shaanxxx写于08/23/06 11:32,:
shaanxxx wrote On 08/23/06 11:32,:

我想写哈希函数float或double。
I wanted to write hash function float or double.



为什么?


你有没有听过这个建议?永远不要比较漂浮 -

点数值是否相等?我承认永远不会太强大了,但是你应该很少将b b b与浮点值相比较,这仍然是正确的。问题是

不是自己的值,而是来源:两个

计算应该产生相同的结果可能在
中产生的结果在几个低位上有所不同。


因此,如果你使用的是浮点值作为一个哈希表中的
键 - 一个依赖于完全相等的概念的数据结构 - 你遇到了麻烦。计算

两个的平方根作为某个项目的关键并将其添加到表格中。稍后,计算八个

的平方根除以2(它应该为sqrt(2),对吧?)和

搜索哈希值带有那个钥匙的物品的桌子 - 这是

你找不到物品的绝佳机会。


浮点值使得可怕的哈希表键。


Why?

Have you ever heard the advice "Never compare floating-
point values for equality?" I''ll admit that "never" is too
strong, but it''s still true that you should very seldom
compare floating-point values for equality. The problem is
not with the values themselves, but with their sources: two
computations that "ought" to produce the same result may in
fact produce results that differ in a few low-order bits.

Consequently, if you are using floating-point values as
keys in a hash table -- a data structure that relies on the
notion of exact equality -- you''re in trouble. Calculate
the square root of two as the key of some item and add it
to the table. Later on, calculate the square root of eight
and divide it by two (it "ought" to be sqrt(2), right?) and
search the hash table for an item with that key -- there''s
an excellent chance you won''t find the item.

Floating-point values make terrible hash table keys.


任何建议都会赞赏

Any suggestion would
be appreciated.



不完美但很容易做到:


double d = sqrt(2.0);

unsigned char * p =(unsigned char *)& d;

/ *现在计算

*值p [0]到p的数组的哈希码sizeof d - 1]。

* /


这是不完美的,因为可能存在浮点值

比较相等,但有不同的位模式。许多

系统提供加零系统。和减零,和减零。看起来不同但是相等的
。有些系统支持非标准化

数字(不要与非正常数字混淆),这可以使某些值以多种形式表示(只需

因为9的平方根可以写成3.0,0.3e1,

0.03e2,...)。最后,许多系统提供NaN。价值

从不比较等于任何东西,甚至不是自己,

所以你可能有两个比特相同的值仍然是

不是'=="。


-
Er ********* @ sun.com



我根据

为浮动比较开发了以下宏limit.h


#define FLOAT_EPSILON(a,b)(ab?fabs(a)* FLT_EPSILON:fabs(b)*

FLT_EPSILON)

#define FLT_EQUAL(a,b)(fabs((a) - (b))< = FLOAT_EPSILON(a,b))


工作正常浮动比较。我想知道它会花多长时间

工作正常。

有什么建议吗?

I have developed following macro for float comparisions based on
limit.h

#define FLOAT_EPSILON(a,b) ( a b ? fabs(a) * FLT_EPSILON : fabs(b) *
FLT_EPSILON)
#define FLT_EQUAL(a, b) (fabs((a)-(b)) <= FLOAT_EPSILON(a,b))

Its working fine float comparision. I am wondering how long it will
work fine.
any suggestion ?


shaanxxx schrieb :
shaanxxx schrieb:

Eric Sosman写道:
Eric Sosman wrote:

>> shaanxxx写于08/23/06 11 :32,:
>>shaanxxx wrote On 08/23/06 11:32,:

>>>我想写散列函数float或double。
>>>I wanted to write hash function float or double.


为什么?

你有没有听过这样的建议?永远不要比较浮动 -
点值是否相等?我承认永远不会太强大了,但你应该很少比较浮点值是否相等。问题不在于价值本身,而在于它们的来源:两个应该的计算。产生相同的结果可能会产生一些低阶位不同的结果。

因此,如果你使用浮点值作为键中的键。哈希表 - 一种依赖于完全相等概念的数据结构 - 你遇到了麻烦。计算两个平方根作为某个项目的关键并将其添加到表格中。稍后,计算8的平方根并将其除以2(它应该为sqrt(2),对吗?)并在哈希表中搜索具有该键的项目 - - 有一个很好的机会你找不到这个项目。

浮点值会产生糟糕的哈希表键。


Why?

Have you ever heard the advice "Never compare floating-
point values for equality?" I''ll admit that "never" is too
strong, but it''s still true that you should very seldom
compare floating-point values for equality. The problem is
not with the values themselves, but with their sources: two
computations that "ought" to produce the same result may in
fact produce results that differ in a few low-order bits.

Consequently, if you are using floating-point values as
keys in a hash table -- a data structure that relies on the
notion of exact equality -- you''re in trouble. Calculate
the square root of two as the key of some item and add it
to the table. Later on, calculate the square root of eight
and divide it by two (it "ought" to be sqrt(2), right?) and
search the hash table for an item with that key -- there''s
an excellent chance you won''t find the item.

Floating-point values make terrible hash table keys.


>>>任何建议都将受到赞赏。
>>>Any suggestion would
be appreciated.


不完美但很容易做到:

double d = sqrt(2.0);
unsigned char * p =(unsigned char *)& d;
/ *现在计算
*值p [0]到p [sizeof d - 1]数组的哈希码。
* /

这是不完美的,因为可能存在比较相等但具有不同位模式的浮点值。许多系统提供加零的系统。和减零,和减零。看起来不同但是相等。一些系统支持非标准化数字(不要与非正规数字混淆),这允许某些值以多种形式表示(只是作为平方根的平方根)九可以写成3.0,0.3e1,
0.03e2,...)。最后,许多系统提供NaN。价值
永远不会比较任何东西,甚至不是自己,所以你可能有两个相同的值,但仍然不是==。


Imperfect but easy to do:

double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d - 1].
*/

It''s imperfect because there may be floating-point values
that compare equal but have different bit patterns. Many
systems provide "plus zero" and "minus zero," which look
different but are equal. Some systems support "unnormalized"
numbers (not to be confused with "denormal" numbers), which
allows some values to be represented in several forms (just
as the square root of nine can be written as 3.0, 0.3e1,
0.03e2, ...). Finally, many systems provide "NaN" values
which never compare equal to anything, not even themselves,
so you may have two bit-for-bit identical values that still
aren''t "==".



我根据

limit.h为浮动比较开发了以下宏


#定义FLOAT_EPSILON(a,b)(ab?fabs(a)* FLT_EPSILON:fabs(b)*

FLT_EPSILON)


I have developed following macro for float comparisions based on
limit.h

#define FLOAT_EPSILON(a,b) ( a b ? fabs(a) * FLT_EPSILON : fabs(b) *
FLT_EPSILON)



忘记照顾好标志。尝试a = 0.0F,b = -1e10 ......

Forgets to take care of the sign. Try a=0.0F, b=-1e10...


#define FLT_EQUAL(a,b)(fabs((a) - (b))< = FLOAT_EPSILON(a,b))
#define FLT_EQUAL(a, b) (fabs((a)-(b)) <= FLOAT_EPSILON(a,b))



考虑

#define FLT_APPROX_EQUAL(a,b,eps)\

(2 *晶圆厂((a) - (b))< =(eps)*(fabs(a)+ fabs(b)))

而不是

#define MY_FLT_EQUAL(a,b)\

FLT_APPROX_EQUAL(a,b,MY_EPSILON)

你明确定义MY_EPSILON。

Consider
#define FLT_APPROX_EQUAL(a,b, eps) \
(2 * fabs((a) - (b)) <= (eps)*(fabs(a)+fabs(b)))
instead, with
#define MY_FLT_EQUAL(a, b) \
FLT_APPROX_EQUAL(a,b,MY_EPSILON)
where you explicitly define MY_EPSILON.


它的工作精细浮点数比较。我想知道它会花多长时间

工作正常。

有什么建议吗?
Its working fine float comparision. I am wondering how long it will
work fine.
any suggestion ?



是的:小心。 FLT_EQUAL(a,b)和FLT_EQUAL(b,c)做

并不意味着FLT_EQUAL(a,c);当你依赖订单时,这可能对你的b / b
的效率造成不利影响

,你可以在表格中输入a,b,c br />
a和c或者相同的单独条目。

如果你更了解你的浮点数是多少?b $ b试图哈希,那里可能是可靠的等价类。

干杯

Michael

-

电子邮件:我的是/ at / gmx / dot / de address。

Yes: Be careful. FLT_EQUAL(a, b) and FLT_EQUAL(b, c) do
not imply FLT_EQUAL(a, c); this can be detrimental for the
efficiency of hashing as you become dependent on the order
in which you enter a, b, c in the table as you can either have
separate entries for a and c or the same.
If you know more about how the floating point numbers you are
trying to hash, there may be reliable equivalence classes.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.


这篇关于浮动的哈希函数。的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆