什么是哈希表,你如何使用C? [英] What is a hash table and how do you make it in C?

查看:123
本文介绍了什么是哈希表,你如何使用C?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个关于数据结构的问题,称为散列表(也称为关联数组),以及如何在C中实现。



在C中创建哈希表?
什么是散列表,您如何实现?
为什么要使用哈希表而不是数组?



注意:
我知道这是一个非常广泛的问题,这将需要一个很大的答案,但我做到这一点,因为我有一些人问我是什么是的所以我把它放在这里,以完全解释它,并帮助其他人。

解决方案

先决条件 / p>

对于这个答案,我将假设你知道如何使用指针,结构体,并对C语言有一个基本的了解。



另外如果你不知道。在谈论算法和数据结构的速度时,您应该知道这些术语:



O()=(发音为Big-oh)Big-oh或O )是指最坏情况运行时。类似地,在数学中,它是大O符号,并描述了一个功能的限制行为。如果有时候O(1)是不变的时候真的很好。如果有事情O(n),这意味着如果列表是一百万长。最糟糕的是跑上百万次。 O()通常是用于确定事情运行速度的方法,因为这是最糟糕的情况下运行速度。



Ω=(希腊字母欧米茄)指这是最好的情况。它不像O()那样使用,所以我不会太多的细节。但是,只要知道,如果在某些情况下,Ω(1),在最好的情况下,只需要一次。



Θ=(希腊字母theta)是唯一的它只在O()和Ω()运行时相同时使用。就像在递归排序算法合并排序的情况一样。运行时间是Θ(n(log(n)))。这意味着它是O(n(log(n))),它是Ω(n(log(n)))。



什么是哈希表?



哈希表或关联数组是编程中使用的流行数据结构。哈希表只是一个链接列表(我将得到一个链接列表在后面)与哈希函数。散列函数基本上只是把东西放在不同的篮子里。每个篮子只是另一个链表,或者依靠你如何实现它。当我向您展示如何实现一个哈希表时,我将会详细介绍哈希表。



为什么要使用散列表而不是数组?



一个数组是非常容易使用和简单的,但它也有它的不利。对于这个例子,假设我们有一个程序,并且在这个程序中,我们希望把它的所有用户都保存在数组中。



这很简单。我们只是说我们计划在这个程序上有不超过100个用户,并用我们的用户填写该数组

  char * users [100 ]; 

//遍历每个用户,并且存储他们的名字
for(int i = 0; i< userCount; i ++)
{
users [i ] =新用户名
}

所以这些都很好,很好,真的很快。那就是O(1)。我们可以随时访问任何用户。



但是现在我们假设我们的程序非常受欢迎。现在有超过80个用户。呃哦!我们最好增加该数组的大小,否则我们会得到缓冲区溢出。



那么我们该怎么做呢?那么我们要做一个更大的新数组,并将旧数组的内容复制到新数组中。



这是非常昂贵的,我们不想这样做。我们想巧妙地思考,不要使用一个固定大小的东西。我们已经知道如何使用指向我们优势的指针,如果我们想要的话,我们可以将信息捆绑到一个结构体中。



所以我们可以创建一个结构来存储用户名,然后通过一个指针指向一个新的结构体。中提琴!我们现在有一个可扩展的数据结构。它是通过指针链接在一起的捆绑信息的列表。因此,名称链接列表。



链接列表



所以让我们创建链接列表。首先我们需要一个struct

  typedef struct node 
{
char * name;
struct node * next;
}
节点;

好吧,所以我们有一个字符串 name a ...等一下...我从来没听说过一个名为 struct node 的数据类型。为了我们的方便我 typedef 一个新的数据类型称为节点,也恰好是我们的结构称为节点。



现在我们现在有了我们的列表,现在我们需要什么?那么我们需要创建一个根到我们的列表。所以我们可以遍历它(我将在后面解释我的意思)。所以让我们分配根。 (记住早期的​​typdef节点数据类型)

  node * first = NULL; 

所以现在我们有了根,我们需要做的是做一个功能来插入新的用户名我们的列表。

  / * 
*将一个名为缓冲区的名称插入
*我们的链表
* /
void insert(char * buffer)
{
//尝试实例化节点
node * newptr = malloc(sizeof(node));
if(newptr == NULL)
{
return;
}

// make a new ponter
newptr-> name = buffer;
newptr-> next = NULL;

//检查空列表
if(first == NULL)
{
first = newptr;
}
//检查在尾部插入
else
{
//跟踪列表中以前的位置
node * predptr = first;

//因为我们不知道这个列表是多长时间
//我们必须引导一个永远的循环,直到找到结束
while(true)
{
//检查是否是列表的结尾
if(predptr-> next == NULL)
{
//将新节点添加到列表结尾
predptr-> next = newptr;

//突破永远循环
break;
}

//更新指针
predptr = predptr-> next;
}
}
}

所以你去了。我们有一个基本的链接列表,现在我们可以继续添加所有我们想要的用户,我们不必担心失去空间。但这确实伴随着双方。这个最大的问题是我们列表中的每个节点或用户都是匿名的。我们不知道他们是甚么,甚至有多少用户。 (当然有这样做可以做得更好,但我只想显示一个非常基本的链表)所以我们必须遍历整个列表来添加一个用户,因为我们无法访问结束。



这就像我们在一个巨大的沙尘暴,你看不到任何东西,我们需要到我们的谷仓。我们看不到我们的谷仓,但是我们有一个解决方案。有人站在我们那里(我们的节点),他们都拿着两根绳子(我们的指针)。每个人只拥有一根绳子,但是那根绳索在另一端被别人抓住。就像我们的结构一样,绳子作为指向它们的位置的指针。那么我们如何到达我们的谷仓? (对于这个例子,谷仓是列表中的最后一个人)。那么我们不知道我们的线路有多大或者他们去哪里。事实上,我们所看到的只是一条围栏,一根绳子绑在一起。 (我们的根!)围栏帖子永远不会改变,所以我们可以抓住这个职位,开始移动,直到我们看到我们的第一个人。那个人拿着两根绳子(这个柱子的指针和指针)。



所以我们一直沿着绳子行进,直到我们到达一个新的人,抓住他们的绳子。最终,我们到最后找到我们的谷仓!



简而言之,这是一个链表。它的好处是它可以扩展尽可能多的,但它的运行时间取决于列表的大小。运行时间是O(n)。如果这个榜单是100万大的话,那么就要跑100万次才能输入一个新的名字!哇,这似乎真的浪费只是插入1个名字。幸运的是,我们很聪明,可以创造出一个更好的解决方案。



为什么我们不是只有一个链表,而是有一些链表。一系列的链表,如果你愿意的话。为什么我们不要制作一个大小为26的数组。所以我们可以为字母表中的每个字母都有一个唯一的链表。现在,而不是n的运行时间。我们可以合理地说,我们的新的运行时间将是n / 26。现在如果你有一个100万大的名单,那根本就不会有什么区别。但是我们只是为了简化这个例子。



所以我们有一个链表的数组,但是我们如何将用户排列到数组中。那么为什么我们不做一个决定哪个用户应该去哪里的功能呢?如果您将进入数组或表,此功能将散列用户。所以我们来创建这个哈希链表。因此,名称哈希表



哈希表



正如我刚才所说,哈希表将是一个链表的数组,并将以其用户名的第一个字母进行散列。 A将进入位置0,B到1等等。



此哈希表的结构将与我们以前的链表的结构相同

  typedef struct node 
{
char * name;
struct node * next;
}
节点;

现在就像我们的链表,我们需要一个根我们的哈希表

  node * first [26] = {NULL}; 

根将是一个数组,字母的大小,其中的所有位置将被初始化为 NULL 。 (记住:链表中的最后一个元素总是指向 NULL ,否则我们不知道是结束)



让我们做一个主要的功能。这需要一个用户名,我们要哈希然后插入。

  int main(char * name)
{
//将名称散列成一个点
int hashedValue = hash(name);

//将表中的名称插入散列值
insert(hashedValue,name);
}

所以这里是我们的哈希函数。这很简单我们要做的就是看这个单词中的第一个字母,并根据什么字母0 - 25给出一个值。

  / * 
*将一个字符串并将其哈希值放入正确的桶中
* /
int hash(const char * buffer)
{
// assign a数字到缓冲区的第一个char从0-25
返回tolower(缓冲区[0]) - 'a';
}

所以现在我们需要的是创建我们的插入函数。它将会像我们的插入函数一样,除非我们引用我们的根,否则我们将引用它作为一个数组。

  / * 
*将一个字符串并将其插入散列表的一部分的链表中
* /
void insert(int key,const char * buffer)
{
//尝试实例化节点插入单词
node * newptr = malloc(sizeof(node));
if(newptr == NULL)
{
return;
}

//创建一个新的指针
strcpy(newptr-> word,buffer);
newptr-> next = NULL;

//检查空列表
if(first [key] == NULL)
{
first [key] = newptr;
}
//检查在尾部插入
else
{
node * predptr = first [key];
while(true)
{
//在尾部插入
if(predptr-> next == NULL)
{
predptr-> next = newptr;
break;
}

//更新指针
predptr = predptr-> next;
}
}
}

所以这是散列的基础表。如果你知道如何使用指针和结构体,这很简单。我知道这是一个非常简单的哈希表示例,只有一个插入函数,但是您可以使它更好,并通过散列函数获得更多的创意。您还可以使数组尽可能大,甚至使用多维数组。


I have a few questions on a data-structure called a hash table (also know as associative array) and how it is implemented in C.

How do you make a hash table in C? What is a hashtable and how do you implement it? Why would I want to use a hash table rather than an array?

NOTE: I know this is a really broad question which will require a large answer but, I made this because I had some people asking me what it all was. so I put it on here to fully explain it and help anyone else out.

解决方案

Prerequisites

For this answer I'm going to assume you know how to use pointers, structs, and have a basic understanding of the C language.

Also if you don't know. When talking about the speed of algorithms and data structures you should know the terms:

O() = (it's pronounced "Big-oh") Big-oh or O() refers to the "worst-case-scenario" runtime. Similarly, in math, it's big O notation and describes the limiting behavior of a function. If somethings O(1) that's constant time "really good". If somethings O(n) that means if the list is a million long. It is at worst going to run a million time. O() is generally the one used to determine how fast something runs because that's how fast it'll run in it's worst case.

Ω = (greek letter Omega) refers to it's best case scenario. It's not used that as much as O() so I won't go into too much detail about it. But just know that if somethings Ω(1), in it's best case scenario it'll take just one time.

Θ = (greek letter theta) is unique in that it is only used when the O() and Ω() runtime are the same. So like in the case of the recursive sorting algorithm merge sort. It's run time is Θ(n(log(n))). Which means that it's O(n(log(n))) and it's Ω(n(log(n))).

What is a Hash table?

A hash table or associative array is a popular data structure used in programming. A hash table is just a linked list (I'll get to what a linked list is later on) with a hash function. A hash function basically just takes things and puts them in different "baskets". Each "basket" is just another linked list or something else depending on how you implement it. I'll explain more details on hash tables when I show you how to implement one.

Why would I want to use a hash table rather than an array?

An array is very easy to use and simple to make, but it also has it's down sides. For this example, let's say we have a program and in this program we want to keep all it's user in an array.

That's pretty simple. Let's just say we plan on this program having no more than 100 users and fill that array with our users

char* users[100];

// iterate over every user and "store" their name
for (int i = 0; i < userCount; i++)
{
    users[i] = "New username here";
}

So that works all well and fine and really fast to. That's a O(1) right there. We can access any user in constant time.

But let's now assume that our program get's really popular. It now has over 80 users. Uh-Oh! We better increase the size of that array or else we'll get a buffer overflow.

So how do we do that? Well we're gonna have to make a new array that's bigger and copy over the contents of the old array into the new array.

That's very costly and we don't want to do that. We want to think cleverly and not use a something that has a fixed size. Well we already know how to use pointers to our advantage and we can bundle information into a struct if we wanted to.

So we could create a struct to store the username and then have it point (via a pointer) to a new struct. Viola! We now have a data structure that is expandable. It's a list of bundled information that's linked together by pointers. Thus the name linked list.

Linked Lists

So lets create that linked list. First we're gonna need a struct

typedef struct node
{
    char* name;
    struct node* next;
}
node;

Alright so we have a string name and a... Wait a sec... I've never heard of a data type called a struct node. Well for our convenience I typedef a new "data type" called a node that also happens to be our struct called node.

So now that we have our node for our list made now what do we need? Well we need a to create a "root" to our list. So we can traverse it (I'll explain what I mean by traverse later). So lets assign a root. (remember that node data type I typdef earlier)

node* first = NULL;

So now that we have our root all we need to do is make a function to insert new usernames into our list.

/*
 * inserts a name called buffer into
 * our linked list
 */
void insert(char* buffer)
{     
    // try to instantiate node for number
    node* newptr = malloc(sizeof(node));
    if (newptr == NULL)
    {
        return;
    }

    // make a new ponter
    newptr->name = buffer;
    newptr->next = NULL;

    // check for empty list
    if (first == NULL)
    {
        first = newptr;
    }
    // check for insertion at tail
    else
    {
        // keep track of the previous spot in list
        node* predptr = first;

        // because we don't know how long this list is
        // we must induce a forever loop until we find the end
        while (true)
        {
            // check if it is the end of the list
            if (predptr->next == NULL)
            {
                // add new node to end of list
                predptr->next = newptr;

                // break out of forever loop
                break;
            }

            // update pointer
            predptr = predptr->next;
        }
    }         
}

So there you go. We have a basic linked list and now we can keep adding users all we want and we don't have to worry about running out of room. But this does come with down sides. The big problem with this is that every node or "user" in our list is "anonymous". We don't know were they are at or even how many users we have with this. (of course there are ways of making this much better but I just want to show a very basic linked list) So we have to traverse the entire list to add a user because we cannot access the end.

It's like we are in a huge dust storm and you can't see anything and we need to get to our barn. We can't see where our barn is but we have a solution. There are people standing our there (our nodes) and they are all holding two ropes (our pointers). Each person only owns one rope but that rope is being held at the other end by someone else. Just like our struct, the rope acts as a pointer to where they are. So how do we get to our barn? (for this example the barn is the last "person" in the list). Well we have no idea how big our line of people are or where they go. In fact, all we see is a fence post with a rope tied to it. (Our root!) that fence post will never change so we can grab the post and start moving along until we see our first person. That person is holding two ropes (the post's pointer and their pointer).

So we keep traveling along the rope until we reach a new person and grab onto their rope. Eventually, we get to the end and find our barn!

So that is a linked list in a nutshell. It's benefits are that it can expand as much as you want but it's run time depends on how big the list is. It's run time is O(n). Where if the list was 1 million big, it would have to run 1 million times to run to insert a new name! Wow that seems like really wasteful just to insert 1 name.

Luckily, we are clever and can create a better solution. Why don't we, instead of having just one linked list, have a few linked lists. An array of linked lists if you will. Why don't we make an array of size 26. So we can have a unique linked list for every letter of the alphabet. Now instead of a run time of n. We can reasonably say that our new run time will be n/26. Now that won't make much of a difference at all if you have a list 1 million big. But we're just going to keep it simple for this example.

So we have an array of linked lists but how are we going to sort our users into the array. Well... why don't we make a function that decides which user should go where. This function will "hash" the users if you will into an array or "table". So let's create this "hashed" linked list. Thus the name hash table

Hash Table

As I just said, our hash table will be an array of linked lists and will be hashed by the first letter of their username. A will go to position 0, B to 1, and so on.

The struct for the this hash table will be the same as the struct for our previous linked list

typedef struct node
{
    char* name;
    struct node* next;
}
node;

Now just like our linked list, we need a root for our hash table

node* first[26] = {NULL};

The root will be an array the size of the alphabet and all positions in it will be initialized to NULL. (Remember: the last element in a linked list always has to point to NULL or else we wouldn't know it was the end)

Lets make a main function. That takes a username we are going to hash then insert.

int main(char* name)
{
    // hash the name into a spot
    int hashedValue = hash(name);

    // insert the name in table with hashed value
    insert(hashedValue, name);
}

So here's our hash function. It's pretty simple. All we want to do is look at the first letter in the word and give a value from 0 - 25 based on what letter it is

/*
 * takes a string and hashes it into the correct bucket
 */
int hash(const char* buffer)
{
    // assign a number to the first char of buffer from 0-25
    return tolower(buffer[0]) - 'a';
}

So now all we need is to create our insert function. It's going to look just like our insert function before except every time we reference our root, we're going to reference it as an array.

/*
 * takes a string and inserts it into a linked list at a part of the hash table
 */
void insert(int key, const char* buffer)
{
    // try to instantiate node to insert word
    node* newptr = malloc(sizeof(node));
    if (newptr == NULL)
    {
        return;
    }

    // make a new pointer
    strcpy(newptr->word, buffer);
    newptr->next = NULL;

    // check for empty list
    if (first[key] == NULL)
    {
       first[key] = newptr;
    }
    // check for insertion at tail
    else
    {
        node* predptr = first[key];
        while (true)
        {
            // insert at tail
            if (predptr->next == NULL)
            {
                predptr->next = newptr;
                break;
            }

            // update pointer
            predptr = predptr->next;
        }
    }
}

So that's basics of a hash table. It's pretty simple if you know how to use pointers and structs. I know that was a pretty simple example of a hash table with only an insert function, but you can make it a lot better and get more creative with your hashing function. You can also make the array as big as you want or even a use multi-dimensional array.

这篇关于什么是哈希表,你如何使用C?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆