C - Making a Separate Chaining Hash Table - Issue

Problem description

I've spent some time on this and put effort into using understandable variable names, trying to keep the code clean and tidy so that I can debug it easily. But I can't seem to find my issue: the terminal doesn't output anything. Please help me identify my mistake!

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct list_node *node_ptr;

struct list_node
{
    node_ptr next;
    char *key;
    char *value;
    
};

typedef node_ptr LIST;
typedef node_ptr position;

struct hash_table
{
    LIST *list_ptr_arr;
    unsigned int table_size;
};

typedef struct hash_table *HASHTABLE;

unsigned long long int
hash(const char *key, unsigned int hash_size)
{

    unsigned long long int hash;

    for(int i = 0; key[i]; i++)
    {
        hash = (hash<<32)+key[i];
    }

    return (hash%hash_size);

}

unsigned int 
next_prime(int number)
{

    int j;

    for(int i = number; ; i++)
    {
        for(j = 2; j<i; j++)
        {
            if(i%j == 0){break;}
        }

        if(i==j){return j;}
    }
}

HASHTABLE
initialize(unsigned int table_size)
{
    HASHTABLE H;

    H = (HASHTABLE) malloc(sizeof(struct hash_table));
    if(H==NULL){printf("Out of Space!"); return 0;}

    H->table_size = next_prime(table_size);

    H->list_ptr_arr = (position*) malloc(sizeof(LIST)*table_size);
    if(H->list_ptr_arr==NULL){printf("Out of Space!"); return 0;}

    H->list_ptr_arr = (LIST*) malloc(sizeof(struct list_node)*table_size);

    for(unsigned int i = 0; i<table_size; i++)
    {
        if(H->list_ptr_arr[i]==NULL){printf("Out of Space!"); return 0;}

        H->list_ptr_arr[i]=NULL;
    }


    return H;
    
}



void
insert(const char *key, const char *value, HASHTABLE H)
{
    unsigned int slot = hash(key, H->table_size);
    node_ptr entry = H->list_ptr_arr[slot];

    node_ptr prev;

    while(entry!=NULL)
    {
        if(strcmp(entry->key, key)==0)
        {
            free(entry->value);
            entry->value = malloc(strlen(value)+1);
            strncpy(entry->value,value,strlen(value));
            return;
        }

        prev = entry;
        entry = prev->next;

    }

    entry = (position) malloc(sizeof(struct list_node));
    entry->value = malloc(strlen(value)+1);
    entry->key = malloc(strlen(key)+1);
    strncpy(entry->key,key,strlen(key));
    strncpy(entry->value,value,strlen(value));
    entry->next = NULL;
    prev->next = entry;

}

void
dump(HASHTABLE H)
{

    for(unsigned int i = 0; i<H->table_size; i++)
    {
        position entry = H->list_ptr_arr[i];

        if(H->list_ptr_arr[i]==NULL){continue;}

        printf("slot[%d]: ", i);

        for(;;)
        {
            printf("%s|%s -> ", entry->key, entry->value);

            if(entry->next == NULL)
            {
                printf("NULL");
                break;
            }

            entry = entry->next;
        }

        printf("\n");

    }

}


int main()
{
  
    HASHTABLE H = initialize(10);
    insert("name1", "David", H);
    insert("name2", "Lara", H);
    insert("name3", "Slavka", H);
    insert("name4", "Ivo", H);
    insert("name5", "Radka", H);
    insert("name6", "Kvetka", H);
    dump(H);
  
    return 0;   
    
}

   

I tried modifying it and changing a few things, but nothing helped.

Thanks in advance, everyone!

Recommended answer

There are a few cosmetic issues and at least two errors that break the code. I won't go into the minor things, which are mostly stylistic, but your initialize() and insert() functions don't work.

In initialize() you allocate memory for H->list_ptr_arr twice. That leaks the memory from the first allocation for no good reason, although a leak alone won't crash your code. In the second allocation you request the wrong size: you use sizeof(struct list_node) * table_size, but you want an array of pointers, not of structs (which, since the structs hold pointers, are larger). That, again, only wastes memory and doesn't cause a crash. Still, you would be better off allocating the right amount, which you can get using

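/* Count by H->table_size (the value next_prime() returned), because
   hash() reduces modulo H->table_size, not the raw table_size argument */
H->list_ptr_arr = malloc(H->table_size * sizeof *H->list_ptr_arr);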

You don't need to cast the result of malloc(): it returns a void *, which converts implicitly to any object pointer type in C, but that is a stylistic issue. The important part of that line is that we take the element size from the variable we assign to, which guarantees the right size even if the type changes at some point. I also use sizeof(type) from time to time, but sizeof *ptr is the better pattern, and it is worth getting used to.

Anyway, although you allocate the wrong amount of memory, you allocate enough, so your program doesn't crash because of it. But when you then run through the allocated bins in the table, you return with an error if any of them is NULL. They are not initialised at all, so if one is NULL (and it might be), that is pure luck, or bad luck if you consider it a sign of error. And if you do treat NULL as a signal of an allocation error here, why do you then set each bin to NULL right after concluding that it isn't?

As it is, your initialisation will abort if you happen to get a NULL pointer in the array, and since you don't check for allocation errors in main() (which is fine for a test), that might be the reason your program is crashing. It is not the main issue, and it only happens if, by chance, one of your bins holds a NULL, but it can happen. Don't check for NULL when you run through the bins. The bins are not initialised; just set each one to NULL.
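The points about initialize() can be put together as a minimal sketch. It assumes the struct definitions from the question; the next_prime() shown here is only a compact stand-in so that the block compiles on its own. One detail goes beyond what is discussed above: because H->table_size may be rounded up to a prime, this sketch both allocates and clears H->table_size bins rather than table_size bins, so that every slot hash() can return exists.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct list_node *node_ptr;
struct list_node { node_ptr next; char *key; char *value; };
struct hash_table { node_ptr *list_ptr_arr; unsigned int table_size; };
typedef struct hash_table *HASHTABLE;

/* Stand-in for the question's next_prime(): smallest prime >= number. */
static unsigned int next_prime(unsigned int number)
{
    if (number < 2) return 2;
    for (unsigned int i = number; ; i++) {
        unsigned int j = 2;
        while (j * j <= i && i % j != 0) j++;
        if (j * j > i) return i;   /* no divisor found up to sqrt(i) */
    }
}

HASHTABLE initialize(unsigned int table_size)
{
    HASHTABLE H = malloc(sizeof *H);
    if (H == NULL) { fprintf(stderr, "Out of space!\n"); return NULL; }

    H->table_size = next_prime(table_size);

    /* A single allocation, sized from the pointer being assigned to. */
    H->list_ptr_arr = malloc(H->table_size * sizeof *H->list_ptr_arr);
    if (H->list_ptr_arr == NULL) {
        fprintf(stderr, "Out of space!\n");
        free(H);
        return NULL;
    }

    /* The bins start out uninitialised: set them, don't test them. */
    for (unsigned int i = 0; i < H->table_size; i++)
        H->list_ptr_arr[i] = NULL;

    return H;
}
```

With this version, initialize(10) yields a table of 11 empty bins, and the only NULL return is a genuine allocation failure.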

The main problem lies in insert(). Your prev variable is not initialised before the while loop, and if you never enter the loop, it isn't initialised afterwards either. Setting prev->next = entry when prev is uninitialised spells trouble and is a likely candidate for the crash, especially since entry is NULL the first time you insert something into a bin, so you trigger the error on the very first insertion. Dereferencing an uninitialised pointer is undefined behaviour, and it rarely means anything good. A crash is the best-case scenario.

I understand the logic here. You want to move prev along the list so you can insert the new entry at the end, and there is no last element before you have looped through the entries in the bin. But that doesn't mean you can't have an initialised pointer to the place where the new entry should go. If you use a pointer to a pointer, you can start with the slot in the table's array. That slot is not a list_node, so a list_node * won't do for prev, but a list_node ** works just fine. You can do something like this:

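node_ptr new_entry(const char *key, const char *value)
{
  node_ptr entry = malloc(sizeof *entry);
  if (!entry) abort(); // Add error checking
  entry->value = malloc(strlen(value) + 1);
  entry->key = malloc(strlen(key) + 1);
  if (!entry->value || !entry->key) abort();
  // Copy strlen + 1 bytes so the terminating '\0' comes along;
  // strncpy with a count of strlen() leaves the copy unterminated
  memcpy(entry->key, key, strlen(key) + 1);
  memcpy(entry->value, value, strlen(value) + 1);
  entry->next = NULL;
  return entry;
}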

void
insert(const char *key, const char *value, HASHTABLE H)
{
    unsigned int slot = hash(key, H->table_size);
    node_ptr entry = H->list_ptr_arr[slot];

    // Make sure that we always have a prev, by pointing it
    // to the location where we want to insert a new entry,
    // which we want at the bin if nothing else
    node_ptr *loc = &H->list_ptr_arr[slot];

    while(entry != NULL)
    {
        if(strcmp(entry->key, key)==0)
        {
            free(entry->value);
            entry->value = malloc(strlen(value)+1);
            if(!entry->value) abort();
            // +1 so the terminating '\0' is copied as well
            memcpy(entry->value,value,strlen(value)+1);
            return;
        }

        // make loc the entry's next
        loc = &entry->next;
        // and move entry forward (we don't need prev->next now)
        entry = entry->next;
    }

    // now loc will hold the address we should put
    // the entry in
    *loc = new_entry(key, value);
}

Of course, since the lists in the bins aren't sorted or kept in any particular order (unless there are constraints you haven't mentioned), you don't need to append new entries. You can just as well prepend them. Then you don't need to drag loc along during the linear search. You could do something like:

node_ptr find_in_bin(const char *key, node_ptr bin)
{
  for (node_ptr entry = bin; entry; entry = entry->next) {
    if(strcmp(entry->key, key)==0)
      return entry;
  }
  return 0;
}

void
insert(const char *key, const char *value, HASHTABLE H)
{
    unsigned int slot = hash(key, H->table_size);
    node_ptr *bin = &H->list_ptr_arr[slot];
    node_ptr entry = find_in_bin(key, *bin);
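    if (entry) {
      free(entry->value);
      entry->value = malloc(strlen(value)+1);
      if (!entry->value) abort();
      // +1 so the terminating '\0' is copied as well
      memcpy(entry->value, value, strlen(value)+1);
    } else {
      // new_entry() takes (key, value); link the new node in front of the bin
      node_ptr head = new_entry(key, value);
      head->next = *bin;
      *bin = head;
    }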
}

If you fix the initialization and insertion this way, I think the code should work. It does for the few tests I put it through, but I may have missed something.
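For reference, here is one way the pieces might fit together, written as a sketch rather than a definitive implementation. Several parts are assumptions that do not come from the question: copy_string() and lookup() are hypothetical helpers, prime rounding is left out to keep the block short, and the hash is a swapped-in multiply-by-31 hash. That last change matters because the question's hash() reads its accumulator before ever assigning it, which is undefined behaviour in its own right.

```c
#include <stdlib.h>
#include <string.h>

typedef struct list_node *node_ptr;
struct list_node { node_ptr next; char *key; char *value; };
struct hash_table { node_ptr *list_ptr_arr; unsigned int table_size; };
typedef struct hash_table *HASHTABLE;

/* Hypothetical helper: duplicate a string, terminator included. */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (!copy) abort();
    memcpy(copy, s, n);
    return copy;
}

/* Swapped-in hash (multiply by 31 and add); the accumulator starts at 0. */
static unsigned int hash(const char *key, unsigned int hash_size)
{
    unsigned long long h = 0;
    for (size_t i = 0; key[i]; i++)
        h = h * 31 + (unsigned char)key[i];
    return (unsigned int)(h % hash_size);
}

/* Assumes table_size > 0; prime rounding is left out of this sketch. */
HASHTABLE initialize(unsigned int table_size)
{
    HASHTABLE H = malloc(sizeof *H);
    if (!H) return NULL;
    H->table_size = table_size;
    H->list_ptr_arr = malloc(table_size * sizeof *H->list_ptr_arr);
    if (!H->list_ptr_arr) { free(H); return NULL; }
    for (unsigned int i = 0; i < table_size; i++)
        H->list_ptr_arr[i] = NULL;
    return H;
}

static node_ptr find_in_bin(const char *key, node_ptr bin)
{
    for (node_ptr entry = bin; entry; entry = entry->next)
        if (strcmp(entry->key, key) == 0)
            return entry;
    return NULL;
}

void insert(const char *key, const char *value, HASHTABLE H)
{
    node_ptr *bin = &H->list_ptr_arr[hash(key, H->table_size)];
    node_ptr entry = find_in_bin(key, *bin);
    if (entry) {                      /* existing key: replace the value */
        char *copy = copy_string(value);
        free(entry->value);
        entry->value = copy;
    } else {                          /* new key: prepend to the bin */
        entry = malloc(sizeof *entry);
        if (!entry) abort();
        entry->key = copy_string(key);
        entry->value = copy_string(value);
        entry->next = *bin;
        *bin = entry;
    }
}

/* Hypothetical lookup: returns the stored value, or NULL if absent. */
const char *lookup(const char *key, HASHTABLE H)
{
    node_ptr entry = find_in_bin(key, H->list_ptr_arr[hash(key, H->table_size)]);
    return entry ? entry->value : NULL;
}
```

Calling initialize(8), inserting the six names from main(), and then running the question's dump() over the table should print one line per non-empty bin; which key lands in which bin depends on the hash function chosen.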

This is not an error as such, but I will still comment on it briefly. The next_prime() function looks like a slow cousin of the sieve of Eratosthenes (it is really trial division). That is fine, it computes a prime (unless I have missed something), but it is not something you need. If you search for it, you will find tables of the first K primes for pretty large K, and you can easily embed them in your code. That is, if you absolutely want your tables to have prime sizes. You don't need to, though. There is nothing wrong with having tables of other sizes.
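Embedding a table of primes could look roughly like the sketch below. The function name next_prime_from_table() and the particular primes (each roughly double the previous one, which suits resizing) are choices made for this illustration, not something taken from the question.

```c
#include <stddef.h>

/* A short table of primes, each roughly twice the previous one. */
static const unsigned int primes[] = {
    53, 97, 193, 389, 769, 1543, 3079, 6151, 12289
};

/* Returns the first tabulated prime >= number,
   or 0 if the request is larger than the table covers. */
unsigned int next_prime_from_table(unsigned int number)
{
    for (size_t i = 0; i < sizeof primes / sizeof primes[0]; i++)
        if (primes[i] >= number)
            return primes[i];
    return 0;
}
```

The lookup is a handful of comparisons, so resizing never has to search for a prime at run time; the trade-off is that the caller has to handle a return of 0 once the table is exhausted.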

Taking hashes modulo a prime has some benefits, but the hash table doesn't need to have the prime as its size for this to work. If you have a large prime P and a hash table of size M, you can compute ((i % P) % M) and get the benefits of reducing modulo P along with the convenience of table size M. When you resize tables and such, it is easier if M is a power of two, and then the last modulo operation can be a very fast bit mask:

#define mask_k(n,k) ((n) & ((1u << (k)) - 1))

and then...

   int index = mask_k(i % P, k); // where table size is 1 << k

The i % P might not be necessary either; it depends on how good your hash function is. If your hash function gives you close to random numbers, then the bits in i are random, so the k least-significant bits are as well, and % P does nothing to improve them. But if you want to reduce modulo a prime, you can do so with a large prime and mask the result down to a smaller table size, so you don't have to use a table size that is prime. And if you want a prime table size anyway, use a table of primes. It is slow to compute new primes every time you resize the table.
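As a small sketch of the prime-then-mask idea: the block below reduces a hash value modulo a large prime and then masks it down to a table of 1 << k bins. LARGE_PRIME (2^31 - 1, a Mersenne prime) and bin_index() are names picked for this illustration, and mask_k() is written as a function here rather than a macro. The sketch assumes k is at most 31.

```c
#include <stdint.h>

#define LARGE_PRIME 2147483647u   /* 2^31 - 1, a Mersenne prime */

/* Keep the k least-significant bits of n (assumes k <= 31). */
static uint32_t mask_k(uint64_t n, unsigned k)
{
    return (uint32_t)(n & ((UINT64_C(1) << k) - 1));
}

/* Bin index for a table of 1 << k bins: reduce modulo the prime first,
   then mask. The mask gives the same result as "% (1 << k)". */
uint32_t bin_index(uint64_t h, unsigned k)
{
    return mask_k(h % LARGE_PRIME, k);
}
```

For example, bin_index(12345, 4) gives 9, the same as (12345 % 2147483647) % 16, but the final step is a single AND instead of a division.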
