哈希表是如何工作的? [英] How does a hash table work?

查看:18
本文介绍了哈希表是如何工作的?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找有关哈希表如何工作的解释 - 为像我这样的傻瓜用简单的英语!

I'm looking for an explanation of how a hash table works - in plain English for a simpleton like me!

例如,我知道它需要密钥,计算散列(我正在寻找如何解释),然后执行某种模以计算出它在存储值的数组中的位置,但这就是我的知识停止了.

For example, I know it takes the key, calculates the hash (I am looking for an explanation how) and then performs some kind of modulo to work out where it lies in the array where the value is stored, but that's where my knowledge stops.

谁能解释一下这个过程?

Could anyone clarify the process?

我不是专门询问哈希码的计算方式,而是对哈希表如何工作的一般概述.

I'm not asking specifically about how hash codes are calculated, but a general overview of how a hash table works.

推荐答案

以下是通俗易懂的解释.

Here's an explanation in layman's terms.

假设您想用书填满图书馆,而不仅仅是将它们塞进图书馆,而且您希望能够在需要时轻松地再次找到它们.

Let's assume you want to fill up a library with books and not just stuff them in there, but you want to be able to easily find them again when you need them.

因此,您决定,如果想要阅读一本书的人知道书名和要引导的确切标题,那么这就是全部.有了书名,这个人在图书管理员的帮助下应该能够轻松快速地找到这本书.

So, you decide that if the person that wants to read a book knows the title of the book and the exact title to boot, then that's all it should take. With the title, the person, with the aid of the librarian, should be able to find the book easily and quickly.

那么,你怎么能做到这一点?嗯,显然你可以保留某种列表,列出每本书的放置位置,但是你会遇到与搜索图书馆相同的问题,你需要搜索列表.诚然,该列表会更小且更易于搜索,但您仍然不想从库(或列表)的一端到另一端按顺序搜索.

So, how can you do that? Well, obviously you can keep some kind of list of where you put each book, but then you have the same problem as searching the library, you need to search the list. Granted, the list would be smaller and easier to search, but still you don't want to search sequentially from one end of the library (or list) to the other.

您想要的东西,加上书名,可以立即为您提供正确的位置,所以您只需走到正确的书架上,然后拿起书.

You want something that, with the title of the book, can give you the right spot at once, so all you have to do is just stroll over to the right shelf, and pick up the book.

但是怎么做呢?好吧,当你填满图书馆时要深思熟虑,当你填满图书馆时需要做很多工作.

But how can that be done? Well, with a bit of forethought when you fill up the library and a lot of work when you fill up the library.

您不是刚开始从一端到另一端填满图书馆,而是设计了一种巧妙的小方法.您获取书名,通过一个小型计算机程序运行它,该程序会在该书架上输出一个书架编号和一个插槽编号.这是您放置书的地方.

Instead of just starting to fill up the library from one end to the other, you devise a clever little method. You take the title of the book, run it through a small computer program, which spits out a shelf number and a slot number on that shelf. This is where you place the book.

这个程序的美妙之处在于,当一个人回来阅读这本书时,您再次通过程序输入标题,并返回与您最初获得的相同的书架号和插槽号,这就是这本书所在的地方.

The beauty of this program is that later on, when a person comes back in to read the book, you feed the title through the program once more, and get back the same shelf number and slot number that you were originally given, and this is where the book is located.

正如其他人已经提到的,该程序被称为哈希算法或哈希计算,通常通过将数据输入其中(本例中为书名)并从中计算出一个数字来工作.

The program, as others have already mentioned, is called a hash algorithm or hash computation and usually works by taking the data fed into it (the title of the book in this case) and calculates a number from it.

为简单起见,我们假设它只是将每个字母和符号转换为一个数字并将它们相加.实际上,它比那要复杂得多,但我们暂时搁置它.

For simplicity, let's say that it just converts each letter and symbol into a number and sums them all up. In reality, it's a lot more complicated than that, but let's leave it at that for now.

这种算法的美妙之处在于,如果你一次又一次地向它输入相同的输入,它每次都会吐出相同的数字.

The beauty of such an algorithm is that if you feed the same input into it again and again, it will keep spitting out the same number each time.

好的,这就是哈希表的基本工作原理.

Ok, so that's basically how a hash table works.

技术内容如下.

首先,是数字的大小.通常,这种散列算法的输出在某个大数的范围内,通常比表中的空间大得多.例如,假设我们的图书馆正好有 100 万本书的空间.哈希计算的输出可能在 0 到 10 亿之间,这要高得多.

First, there's the size of the number. Usually, the output of such a hash algorithm is inside a range of some large number, typically much larger than the space you have in your table. For instance, let's say that we have room for exactly one million books in the library. The output of the hash calculation could be in the range of 0 to one billion which is a lot higher.

那么,我们该怎么办?我们使用一种叫做模数计算的东西,它基本上是说,如果你数到你想要的数字(即十亿个数字)但想要保持在一个更小的范围内,每次你达到那个更小的范围的限制时,你就会重新开始0,但你必须跟踪你在大序列中走了多远.

So, what do we do? We use something called modulus calculation, which basically says that if you counted to the number you wanted (i.e. the one billion number) but wanted to stay inside a much smaller range, each time you hit the limit of that smaller range you started back at 0, but you have to keep track of how far in the big sequence you've come.

假设哈希算法的输出在 0 到 20 的范围内,并且您从特定标题中获得值 17.如果图书馆的大小只有 7 本书,你数 1、2、3、4、5、6,当你数到 7 时,你从 0 开始.因为我们需要数 17 次,所以我们有 1,2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3,最后的数是3.

Say that the output of the hash algorithm is in the range of 0 to 20 and you get the value 17 from a particular title. If the size of the library is only 7 books, you count 1, 2, 3, 4, 5, 6, and when you get to 7, you start back at 0. Since we need to count 17 times, we have 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, and the final number is 3.

当然模数计算不是那样做的,它是用除法和余数来完成的.17 除以 7 的余数是 3(7 在 14 处 2 次变成 17,17 和 14 的差是 3).

Of course modulus calculation isn't done like that, it's done with division and a remainder. The remainder of dividing 17 by 7 is 3 (7 goes 2 times into 17 at 14 and the difference between 17 and 14 is 3).

因此,您将这本书放在第 3 个插槽中.

Thus, you put the book in slot number 3.

这会导致下一个问题.碰撞.由于该算法无法将书籍隔开以便它们完全填满图书馆(或散列表,如果你愿意的话),它最终总是会计算一个以前使用过的数字.在图书馆的意义上,当您到达书架和要放入书的位置编号时,那里已经有一本书了.

This leads to the next problem. Collisions. Since the algorithm has no way to space out the books so that they fill the library exactly (or the hash table if you will), it will invariably end up calculating a number that has been used before. In the library sense, when you get to the shelf and the slot number you wish to put a book in, there's already a book there.

存在各种冲突处理方法,包括将数据运行到另一个计算中以获得表中的另一个位置(双重散列),或者只是找到一个与您获得的空间接近的空间(即,假设该插槽可用,则紧挨着上一本书,也称为线性探测).这意味着当您稍后尝试查找这本书时,您需要进行一些挖掘工作,但这仍然比简单地从图书馆的一端开始要好.

Various collision handling methods exist, including running the data into yet another calculation to get another spot in the table (double hashing), or simply to find a space close to the one you were given (i.e. right next to the previous book assuming the slot was available also known as linear probing). This would mean that you have some digging to do when you try to find the book later, but it's still better than simply starting at one end of the library.

最后,在某些时候,您可能希望将比图书馆允许的更多的书籍放入图书馆.换句话说,您需要构建一个更大的库.由于图书馆中的确切位置是使用图书馆的确切和当前大小计算的,因此如果您调整图书馆的大小,您可能最终不得不为所有书籍找到新位置,因为计算已完成以找到它们的位置改变了.

Finally, at some point, you might want to put more books into the library than the library allows. In other words, you need to build a bigger library. Since the exact spot in the library was calculated using the exact and current size of the library, it goes to follow that if you resize the library you might end up having to find new spots for all the books since the calculation done to find their spots has changed.

我希望这个解释比桶和函数更接地气:)

I hope this explanation was a bit more down to earth than buckets and functions :)

这篇关于哈希表是如何工作的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆