什么是哈希表以及如何在 C 中创建它? [英] What is a hash table and how do you make it in C?

查看:20
本文介绍了什么是哈希表以及如何在 C 中创建它?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一些关于称为哈希表(也称为关联数组)的数据结构以及它是如何在 C 中实现的问题.

I have a few questions on a data-structure called a hash table (also know as associative array) and how it is implemented in C.

你如何在 C 中制作哈希表?什么是哈希表以及如何实现它?为什么我要使用哈希表而不是数组?

How do you make a hash table in C? What is a hashtable and how do you implement it? Why would I want to use a hash table rather than an array?

注意:我知道这是一个非常广泛的问题,需要一个很大的答案,但是,我之所以这样做是因为有人问我这一切是什么.所以我把它放在这里是为了充分解释它并帮助其他人.

NOTE: I know this is a really broad question which will require a large answer but, I made this because I had some people asking me what it all was. so I put it on here to fully explain it and help anyone else out.

推荐答案

先决条件

对于这个答案,我假设您知道如何使用指针、结构,并且对 C 语言有基本的了解.

For this answer I'm going to assume you know how to use pointers, structs, and have a basic understanding of the C language.

如果你不知道的话.在谈论算法和数据结构的速度时,您应该知道以下术语:

Also if you don't know. When talking about the speed of algorithms and data structures you should know the terms:

O() =(发音为Big-oh")Big-oh 或 O() 指的是最坏情况"运行时.类似地,在数学中,它是大 O 符号并描述了函数的极限行为.如果某事 O(1) 那是恒定时间真的很好".如果某事 O(n) 那意味着如果列表是一百万长.最坏的情况是运行一百万次.O() 通常用于确定某事物运行的速度,因为这是在最坏的情况下它的运行速度.

O() = (it's pronounced "Big-oh") Big-oh or O() refers to the "worst-case-scenario" runtime. Similarly, in math, it's big O notation and describes the limiting behavior of a function. If somethings O(1) that's constant time "really good". If somethings O(n) that means if the list is a million long. It is at worst going to run a million time. O() is generally the one used to determine how fast something runs because that's how fast it'll run in it's worst case.

Ω =(希腊字母 Omega)指的是最好的情况.它不像 O() 那样被使用,所以我不会详细介绍它.但要知道,如果是 Ω(1),在最好的情况下,它只需要一次.

Ω = (greek letter Omega) refers to it's best case scenario. It's not used that as much as O() so I won't go into too much detail about it. But just know that if somethings Ω(1), in it's best case scenario it'll take just one time.

Θ = (希腊字母 theta) 的独特之处在于它仅在 O() 和 Ω() 运行时相同时使用.就像递归排序算法合并排序的情况一样.它的运行时间是 Θ(n(log(n))).这意味着它是 O(n(log(n))) 和 Ω(n(log(n))).

Θ = (greek letter theta) is unique in that it is only used when the O() and Ω() runtime are the same. So like in the case of the recursive sorting algorithm merge sort. It's run time is Θ(n(log(n))). Which means that it's O(n(log(n))) and it's Ω(n(log(n))).

什么是哈希表?

哈希表或关联数组是编程中常用的数据结构.哈希表只是一个带有哈希函数的链表(稍后我将介绍链表是什么).散列函数基本上只是将事物放入不同的篮子"中.每个篮子"只是另一个链表或其他东西,具体取决于您如何实现它.当我向您展示如何实现哈希表时,我将解释有关哈希表的更多细节.

A hash table or associative array is a popular data structure used in programming. A hash table is just a linked list (I'll get to what a linked list is later on) with a hash function. A hash function basically just takes things and puts them in different "baskets". Each "basket" is just another linked list or something else depending on how you implement it. I'll explain more details on hash tables when I show you how to implement one.

为什么我要使用哈希表而不是数组?

数组非常易于使用且易于制作,但它也有其缺点.在这个例子中,假设我们有一个程序,我们希望将所有用户保存在一个数组中.

An array is very easy to use and simple to make, but it also has its downsides. For this example, let's say we have a program and in which we want to keep all its users in an array.

这很简单.假设我们计划让这个程序的用户不超过 100 个,并用我们的用户填充该数组

That's pretty simple. Let's just say we plan on this program having no more than 100 users and fill that array with our users

char* users[100];

// iterate over every user and "store" their name
for (int i = 0; i < userCount; i++)
{
    users[i] = "New username here";
}

所以这一切都很好,而且非常快.这就是 O(1).我们可以在恒定时间内访问任何用户.

So that works all well and fine and really fast too. That's O(1) right there. We can access any user in constant time.

但是现在让我们假设我们的程序变得非常流行.它现在有80多个用户.哦哦!我们最好增加该数组的大小,否则我们会出现缓冲区溢出.

But let's now assume that our program gets really popular. It now has over 80 users. Uh-Oh! We better increase the size of that array or else we'll get a buffer overflow.

那我们怎么做呢?好吧,我们将不得不创建一个更大的新数组,并将旧数组的内容复制到新数组中.

So how do we do that? Well we're gonna have to make a new array that's bigger and copy over the contents of the old array into the new array.

那太昂贵了,我们不想这样做.我们想聪明地思考,而不是使用具有固定大小的东西.好吧,我们已经知道如何使用指针来发挥我们的优势,如果我们愿意,我们可以将信息捆绑到 struct 中.

That's very costly and we don't want to do that. We want to think cleverly and not use a something that has a fixed size. Well we already know how to use pointers to our advantage and we can bundle information into a struct if we wanted to.

所以我们可以创建一个 struct 来存储用户名,然后让它(通过指针)指向一个新的 struct.!我们现在有一个可扩展的数据结构.它是通过指针链接在一起的捆绑信息列表.因此名称链表.

So we could create a struct to store the username and then have it point (via a pointer) to a new struct. Voila! We now have a data structure that is expandable. It's a list of bundled information that's linked together by pointers. Thus the name linked list.

链接列表

那么让我们创建那个链表.首先我们需要一个 struct

So let's create that linked list. First we're gonna need a struct

typedef struct node
{
    char* name;
    struct node* next;
}
node;

好的,所以我们有一个字符串 name 和一个...等等...我从来没有听说过一种叫做 struct node 的数据类型.好吧,为了我们的方便,我 typedef 一个名为 node 的新数据类型",它也恰好是我们的 struct,称为 node>.

Alright so we have a string name and a... Wait a sec... I've never heard of a data type called a struct node. Well for our convenience I typedef a new "data type" called node that also happens to be our struct called node.

既然我们已经有了列表的节点,接下来我们需要什么?好吧,我们需要为我们的列表创建一个根",以便我们可以遍历它(稍后我将解释traverse的含义).所以让我们分配一个根.(请记住我之前typdefed 的node 数据类型)

So now that we have our node for our list, what do we need next? Well we need to create a "root" to our list so we can traverse it (I'll explain what I mean by traverse later). So let's assign a root. (remember that node data type I typdefed earlier)

node* first = NULL;

现在我们有了 root,我们需要做的就是创建一个函数,将新用户名插入到我们的列表中.

So now that we have our root all we need to do is make a function to insert new usernames into our list.

/*
 * inserts a name called buffer into
 * our linked list
 */
void insert(char* buffer)
{     
    // try to instantiate node for number
    node* newptr = malloc(sizeof(node));
    if (newptr == NULL)
    {
        return;
    }

    // make a new ponter
    newptr->name = buffer;
    newptr->next = NULL;

    // check for empty list
    if (first == NULL)
    {
        first = newptr;
    }
    // check for insertion at tail
    else
    {
        // keep track of the previous spot in list
        node* predptr = first;

        // because we don't know how long this list is
        // we must induce a forever loop until we find the end
        while (true)
        {
            // check if it is the end of the list
            if (predptr->next == NULL)
            {
                // add new node to end of list
                predptr->next = newptr;

                // break out of forever loop
                break;
            }

            // update pointer
            predptr = predptr->next;
        }
    }         
}

就这样吧.我们有一个基本的链表,现在我们可以随心所欲地添加用户,而不必担心空间不足.但这确实有不利的一面.最大的问题是我们列表中的每个节点或用户"都是匿名的".我们不知道他们是否在,甚至我们有多少用户.(当然有办法让这个更好——我只想展示一个非常基本的链表)我们必须遍历整个链表来添加一个用户,因为我们不能直接访问最后.

So there you go. We have a basic linked list and now we can keep adding users all we want and we don't have to worry about running out of room. But this does come with down sides. The big problem with this is that every node or "user" in our list is "anonymous". We don't know were they are at or even how many users we have with this. (of course there are ways of making this much better -- I just want to show a very basic linked list) We have to traverse the entire list to add a user because we cannot access the end directly.

这就像我们在一场巨大的沙尘暴中,你什么也看不见,我们需要去我们的谷仓.我们看不到我们的谷仓在哪里,但我们有一个解决方案.有人站在我们的那里(我们的 nodes),他们都拿着两条绳子(我们的指针).每个人只拥有一根绳子,但绳子的另一端被其他人握着.就像我们的 struct 一样,绳索充当指向它们所在位置的指针.那么我们怎么去我们的谷仓呢?(对于这个例子,谷仓是列表中的最后一个人").好吧,我们不知道我们的人员有多大,也不知道他们去了哪里.事实上,我们所看到的只是一根系着绳子的栅栏柱.(我们的根!)那个栅栏柱永远不会改变,所以我们可以抓住柱子并开始前进,直到我们看到我们的第一个人.那个人拿着两条绳子(帖子的指针和他们的指针).

It's like we are in a huge dust storm and you can't see anything and we need to get to our barn. We can't see where our barn is but we have a solution. There are people standing our there (our nodes) and they are all holding two ropes (our pointers). Each person only owns one rope but that rope is being held at the other end by someone else. Just like our struct, the rope acts as a pointer to where they are. So how do we get to our barn? (for this example the barn is the last "person" in the list). Well we have no idea how big our line of people are or where they go. In fact, all we see is a fence post with a rope tied to it. (Our root!) that fence post will never change so we can grab the post and start moving along until we see our first person. That person is holding two ropes (the post's pointer and their pointer).

所以我们继续沿着绳索前进,直到我们遇到一个新人并抓住他们的绳索.最终,我们走到尽头,找到了我们的谷仓!

So we keep traveling along the rope until we reach a new person and grab onto their rope. Eventually, we get to the end and find our barn!

简而言之,这就是一个链表.它的好处是它可以随心所欲地扩展,但它的运行时间取决于列表的大小,即 O(n).因此,如果有 100 万用户,则必须运行 100 万次才能插入新名称!哇,仅仅插入 1 个名字似乎真的很浪费.

So that is a linked list in a nutshell. Its benefits are that it can expand as much as you want but its runtime depends on how big the list is, namely O(n). So if there are 1 million users, it would have to run 1 million times to insert a new name! Wow that seems really wasteful just to insert 1 name.

幸运的是,我们很聪明,可以创建更好的解决方案.我们为什么不,而不是只有一个链表,而是有几个链表.一个链表数组,如果你愿意的话.为什么我们不创建一个大小为 26 的数组.这样我们就可以为字母表中的每个字母创建一个唯一的链表.现在而不是 n 的运行时间.我们可以合理地说,我们的新运行时间将是 n/26.现在,如果您有一个 100 万大的列表,那根本就没有太大区别.但是对于这个例子,我们只是要保持简单.

Luckily, we are clever and can create a better solution. Why don't we, instead of having just one linked list, have a few linked lists. An array of linked lists if you will. Why don't we make an array of size 26. So we can have a unique linked list for every letter of the alphabet. Now instead of a run time of n. We can reasonably say that our new run time will be n/26. Now that won't make much of a difference at all if you have a list 1 million big. But we're just going to keep it simple for this example.

所以我们有一个链表数组,但是我们如何将我们的用户分类到数组中.嗯...为什么我们不做一个功能来决定哪个用户应该去哪里.如果您将用户放入数组或表",则此函数将散列"用户.所以让我们创建这个散列"链表.因此名称哈希表

So we have an array of linked lists but how are we going to sort our users into the array. Well... why don't we make a function that decides which user should go where. This function will "hash" the users if you will into an array or "table". So let's create this "hashed" linked list. Thus the name hash table

哈希表

正如我刚才所说的,我们的哈希表将是一个链表数组,并会按照用户名的第一个字母进行哈希处理.A 将转到位置 0,B 将转到 1,依此类推.

As I just said, our hash table will be an array of linked lists and will be hashed by the first letter of their username. A will go to position 0, B to 1, and so on.

这个哈希表的struct将与我们之前链表的结构相同

The struct for the this hash table will be the same as the struct for our previous linked list

typedef struct node
{
    char* name;
    struct node* next;
}
node;

现在就像我们的链表一样,我们的哈希表需要一个根

Now just like our linked list, we need a root for our hash table

node* first[26] = {NULL};

根将是一个字母大小的数组,其中的所有位置都将初始化为NULL.(请记住:链表中的最后一个元素必须始终指向 NULL 否则我们不会知道它是结尾)

The root will be an array the size of the alphabet and all positions in it will be initialized to NULL. (Remember: the last element in a linked list always has to point to NULL or else we wouldn't know it was the end)

让我们创建一个主函数.它需要一个我们要散列然后插入的用户名.

Let's make a main function. It takes a username we are going to hash then insert.

int main(char* name)
{
    // hash the name into a spot
    int hashedValue = hash(name);

    // insert the name in table with hashed value
    insert(hashedValue, name);
}

这是我们的哈希函数.这很简单.我们要做的就是查看单词的第一个字母,并根据它是什么字母给出一个 0 - 25 的值

So here's our hash function. It's pretty simple. All we want to do is look at the first letter in the word and give a value from 0 - 25 based on what letter it is

/*
 * takes a string and hashes it into the correct bucket
 */
int hash(const char* buffer)
{
    // assign a number to the first char of buffer from 0-25
    return tolower(buffer[0]) - 'a';
}

所以现在我们需要的是创建我们的插入函数.它看起来就像我们之前的插入函数一样,除了每次我们引用根时,我们都将它作为数组引用.

So now all we need is to create our insert function. It's going to look just like our insert function before except every time we reference our root, we're going to reference it as an array.

/*
 * takes a string and inserts it into a linked list at a part of the hash table
 */
void insert(int key, const char* buffer)
{
    // try to instantiate node to insert word
    node* newptr = malloc(sizeof(node));
    if (newptr == NULL)
    {
        return;
    }

    // make a new pointer
    strcpy(newptr->name, buffer);
    newptr->next = NULL;

    // check for empty list
    if (first[key] == NULL)
    {
       first[key] = newptr;
    }
    // check for insertion at tail
    else
    {
        node* predptr = first[key];
        while (true)
        {
            // insert at tail
            if (predptr->next == NULL)
            {
                predptr->next = newptr;
                break;
            }

            // update pointer
            predptr = predptr->next;
        }
    }
}

这是哈希表的基础知识.如果您知道如何使用指针和结构,这将非常简单.我知道这是一个非常简单的哈希表示例,只有一个插入函数,但是您可以将它做得更好,并通过哈希函数获得更多创意.您还可以根据需要使数组尽可能大,甚至可以使用多维数组.

So that's basics of a hash table. It's pretty simple if you know how to use pointers and structs. I know that was a pretty simple example of a hash table with only an insert function, but you can make it a lot better and get more creative with your hashing function. You can also make the array as big as you want or even a use multi-dimensional array.

这篇关于什么是哈希表以及如何在 C 中创建它?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆