Problem with hashing function - C

Problem description

I am using the following hashing function provided in the K&R book.

#define HASHSIZE 101
unsigned hash(char *s)
{
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}

In my project, I have more warnings turned on (warnings are treated as errors too) and the above code will fail to compile.

error: conversion to ‘unsigned int’ from ‘char’ may change the sign of the result

If I make the hashval signed, I am getting negative hash values. I am wondering how this can be fixed.

Any help?

Recommended answer

What your compiler is picking up on and warning you about is that you are implicitly changing your interpretation of the bytes stored in the area pointed to by s. The function prototype specifies s as a pointer to char, and by default on your setup chars seem to be signed. However, to get the hash arithmetic correct, you need to use only unsigned values. So the question is this: what should the compiler do with values pointed to through s which are actually negative?

Let's take a quick diversion to make sure we understand what values we might be considering. The possible values for a plain char are CHAR_MIN to CHAR_MAX inclusive; when char is signed, as it seems to be on your setup, CHAR_MIN is negative. (These values can be found in limits.h.) The possible values for an unsigned char are 0 to UCHAR_MAX inclusive. So the question becomes this: how do we represent the possible range of values from CHAR_MIN to CHAR_MAX within the range 0 to UCHAR_MAX?
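To see what these ranges are on a particular setup, a small sketch along the following lines can report them. The function names char_is_signed and print_char_ranges are made up for illustration and are not part of the original answer.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 when plain char is signed here. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: prints the two ranges discussed above. */
void print_char_ranges(void)
{
    printf("char:          %d to %d (%s)\n", CHAR_MIN, CHAR_MAX,
           char_is_signed() ? "signed" : "unsigned");
    printf("unsigned char: 0 to %u\n", (unsigned) UCHAR_MAX);
}
```

Calling print_char_ranges() from main shows whether the platform treats plain char as signed, which is the case that triggers the warning.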

One simple approach is to let the compiler carry out this conversion for you: it uses wrap-around arithmetic to bring the value within limits, adding UCHAR_MAX + 1 as many times as needed to get a value within the range 0 to UCHAR_MAX. However, whether a given byte reaches the arithmetic as a negative value in the first place depends on whether plain char is signed, which is up to the compiler you are using. It is this possibility of non-portability which lies behind your compiler warning.
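The wrap-around can be sketched in isolation. The helper name wrap_to_uchar below is hypothetical; it only makes the conversion to unsigned char explicit so its effect on a negative value can be inspected.

```c
#include <limits.h>

/* Hypothetical helper: convert a (possibly negative) character value
   to unsigned char. The conversion reduces the value modulo
   UCHAR_MAX + 1, so negative inputs wrap into 0..UCHAR_MAX. */
unsigned char wrap_to_uchar(int value)
{
    return (unsigned char) value;
}
```

Assuming 8-bit chars (UCHAR_MAX of 255), wrap_to_uchar(-23) gives 233, because -23 + 256 = 233, while a non-negative value such as 65 comes back unchanged.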

OK, so where does that get us? Well, if you are prepared to take responsibility for the hypothetical portability problems which this approach will produce, you can tell the compiler that you are happy for it to make the conversion using the standard rules. You do this by using a cast:

hashval = ((unsigned char) *s) + 31 * hashval;

This approach will suppress the warning and ensure that your arithmetic is all done as unsigned, which is what you want for this sort of hash function. However, you need to be aware that the same code on other systems may give different hash results.
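Put back into the function from the question, the cast might look like the sketch below. The only other change from the K&R version is the const qualifier on the parameter, which is an addition here and not something the answer asks for.

```c
#define HASHSIZE 101

/* K&R-style string hash in which each character is read as an
   unsigned char, so no negative value enters the arithmetic. */
unsigned hash(const char *s)
{
    unsigned hashval;

    for (hashval = 0; *s != '\0'; s++)
        hashval = (unsigned char) *s + 31 * hashval;
    return hashval % HASHSIZE;
}
```

Assuming 8-bit chars, a byte such as 0xE9 now contributes 233 to the sum whether plain char is signed or unsigned on the platform.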

An alternative approach is to use the fact that the ANSI C standard specifies that pointers can validly be cast to type unsigned char * to access the underlying byte structure of the data being pointed to. (I don't have my copy of the standard to hand at the moment, or I'd give you a reference.) This would allow you to generalise this approach to producing a function which gives you a hash of a value of any data type. (However, to do this you must think about how you know the size of the data being passed in.) This might look something like:

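#include <stddef.h>  /* size_t */

unsigned hash(const void *s, size_t n)
{
    const unsigned char *t = s;  /* view the object as raw bytes */
    unsigned hashval = 0;        /* running hash, kept below HASHSIZE */

    while (n--)
        hashval = (*t++ + 31 * hashval) % HASHSIZE;

    return hashval;
}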
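On the question of knowing the size of the data, here is a hedged sketch of two possible callers. The names hash_bytes, hash_string and hash_int are hypothetical; hash_bytes repeats the byte-wise function under a different name so that the example stands on its own.

```c
#include <stddef.h>
#include <string.h>

#define HASHSIZE 101

/* Byte-wise hash over any object (hypothetical name). */
unsigned hash_bytes(const void *s, size_t n)
{
    const unsigned char *t = s;
    unsigned hashval = 0;

    while (n--)
        hashval = (*t++ + 31 * hashval) % HASHSIZE;
    return hashval;
}

/* For a string, the size is its length up to the terminating '\0'. */
unsigned hash_string(const char *s)
{
    return hash_bytes(s, strlen(s));
}

/* For a fixed-size object, sizeof supplies the size. */
unsigned hash_int(int value)
{
    return hash_bytes(&value, sizeof value);
}
```

One caveat with hashing whole objects this way: a struct may contain padding bytes whose contents are unspecified, so two structs with equal members are not guaranteed to hash to the same value unless the padding is cleared first (for example with memset).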

I hope this gives you a bit of insight into what's going on.
