哈希表如何工作? [英] How does a hash table work?

查看:160
本文介绍了哈希表如何工作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一个哈希表如何工作的解释 - 简单的英语为像我这样的simpleton!



例如,我知道它需要键,计算哈希(我正在寻找一个解释如何),然后执行某种模数来解决它在哪里



有没有人可以澄清这个过程?



编辑:我没有具体说明如何计算哈希码,而是概括了散列表的工作原理。

解决方案

这是以外行人士的说法解释。



我们假设你想填写一本图书馆,而不只是把它们填在那里,你想要能够在需要时再次轻松找到它们。



所以,你决定如果想读一本书的人知道这本书的标题和准确的引导标题,那就是它应该采取的一切。在标题中,该人员在图书管理员的帮助下,应该可以轻松,快速地找到这本书。



那么,你怎么能这样做?那么很明显,你可以保留一些你把每本书放在哪里的列表,但是你和搜索图书馆一样有问题,你需要搜索列表。确定列表会更小,更容易搜索,但是您仍然不想从图书馆(或列表)的一端依次搜索到另一个。



你想要的东西,与书的标题,可以给你一个正确的地方,所以你只需要走到正确的架子,拿起这本书。



但是怎么做呢?那么,当你填写这个图书馆的时候,有一点想法,当你填写图书馆的时候,很多工作。



而不是刚开始填写图书馆一方面,另一方面,你设计一个聪明的小方法。你拿这本书的标题,通过一个小型的计算机程序运行它,它在这个架子上吐出一个货架号码和一个插槽号码。这是你放置这本书的地方。



这个程序的美丽在于,当一个人回来阅读这本书的时候,你通过再次收到与您最初给出的货架编号和插槽号相同的货架编号,这是本书所在的位置。



程序与其他程序一样已经提到的,被称为哈希算法或哈希计算,通常可以通过采用其中输入的数据(在本例中为本书的标题)并计算出一个数字。



为了简单起见,我们假设它只是将每个字母和符号转换成一个数字,并将它们全部归结。在现实中,它比这更复杂,但现在让我们离开它。



这种算法的优点是,如果你输入相同的输入一次又一次地,每次都会继续吐出相同的数字。



好的,这样基本上哈希表如何工作。



技术资料如下。



首先,数字的大小。通常,这种哈希算法的输出在一些大的范围内,通常比您在表中的空间大得多。例如,假设我们在图书馆里有100万本书的空间。哈希计算的输出可能在0到10亿的范围内,这是更高的。



那么我们该怎么办?我们使用一些叫做模数计算的东西,它基本上说如果你算到你想要的数字(即十亿个数字),但是要保持在一个更小的范围内,每次你达到那个较小范围的极限,你就开始回到0,但是你必须跟踪你在大序列中有多远。



说哈希算法的输出范围是0到20,你得到一个特定的标题值17。如果图书馆的大小只有7本书,你算1,2,3,4,5,6,当你达到7,你从0开始。因为我们需要计数17次,我们有1, 2,3,4,5,6,0,1,2,3,4,5,6,0,1,2,3,最后的数字是3。


$ b $当然,模数计算不是这样做的,它是用除法和余数来完成的。除以17的剩余部分是3(7次,在14次,17次17次,2次,17次和14次之间的差值为3次)。



在插槽号3中。



这导致下一个问题。碰撞。由于算法无法空出书籍,以便他们完全填写库(或者哈希表,如果你愿意),它将总是计算一个以前使用过的数字。在图书馆的意义上,当你到达书架和你想放入一本书的插槽号码时,已经有一本书了。



存在各种冲突处理方法,包括将数据运行到另一个计算中,以获得表中的另一个位置(双重散列),或只是找到一个靠近你的空间的空间(即在前一本书的旁边,假设插槽可用,也称为线性探测)。这意味着你尝试在稍后找到这本书的时候会有一些挖掘作用,但是比起图书馆一端更好一些。



最后,在某些时候,您可能希望将更多的书籍放入图书馆,而不是图书馆允许。换句话说,您需要构建一个更大的库。由于图书馆的确切位置是使用图书馆的确切尺寸和当前尺寸进行计算的,因此,如果您调整图书馆大小,则可能需要为所有图书找到新的地点,因为计算完成后才能找到图书馆已经改变了。



我希望这个解释比水桶和功能有点多:)


I'm looking for an explanation of how a hashtable works - in plain English for a simpleton like me!

For example, I know it takes the key, calculates the hash (I am looking for an explanation how) and then performs some kind of modulo to work out where it lies in the array where the value is stored, but that's where my knowledge stops.

Could anyone clarify the process?

Edit: I'm not looking specifically about how hashcodes are calculated, but a general overview of how a hashtable works.

解决方案

Here's an explanation in layman's terms.

Let's assume you want to fill up a library of books and not just stuff them in there, but you want to be able to easily find them again when you need them.

So, you decide that if the person that wants to read a book knows the title of the book and the exact title to boot, then that's all it should take. With the title, the person, with the aid of the librarian, should be able to find the book easily and quickly.

So, how can you do that? Well, obviously you can keep some kind of list of where you put each book, but then you have the same problem as searching the library, you need to search the list. Granted, the list would be smaller and easier to search, but still you don't want to search sequentially from one end of the library (or list) to the other.

You want something that, with the title of the book, can give you the right spot at once, so all you have to do is just stroll over to the right shelf, and pick up the book.

But how can that be done? Well, with a bit of forethought when you fill up the library and a lot of work when you fill up the library.

Instead of just starting to fill up the library from one end to the other, you devise a clever little method. You take the title of the book, run it through a small computer program, which spits out a shelf number and a slot number on that shelf. This is where you place the book.

The beauty of this program is that later on, when a person comes back in to read the book, you feed the title through the program once more, and get back the same shelf number and slot number that you were originally given, and this is where the book is located.

The program, as others have already mentioned, is called a hash algorithm or hash computation and usually works by taking the data fed into it (the title of the book in this case) and calculates a number from it.

For simplicity, let's say that it just converts each letter and symbol into a number and sums them all up. In reality, it's a lot more complicated than that, but let's leave it at that for now.

The beauty of such an algorithm is that if you feed the same input into it again and again, it will keep spitting out the same number each time.

Ok, so that's basically how a hash table works.

Technical stuff follows.

First, there's the size of the number. Usually, the output of such a hash algorithm is inside a range of some large number, typically much larger than the space you have in your table. For instance, let's say that we have room for exactly one million books in the library. The output of the hash calculation could be in the range of 0 to one billion which is a lot higher.

So, what do we do? We use something called modulus calculation, which basically says that if you counted to the number you wanted (i.e. the one billion number) but wanted to stay inside a much smaller range, each time you hit the limit of that smaller range you started back at 0, but you have to keep track of how far in the big sequence you've come.

Say that the output of the hash algorithm is in the range of 0 to 20 and you get the value 17 from a particular title. If the size of the library is only 7 books, you count 1, 2, 3, 4, 5, 6, and when you get to 7, you start back at 0. Since we need to count 17 times, we have 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, and the final number is 3.

Of course modulus calculation isn't done like that, it's done with division and a remainder. The remainder of dividing 17 by 7 is 3 (7 goes 2 times into 17 at 14 and the difference between 17 and 14 is 3).

Thus, you put the book in slot number 3.

This leads to the next problem. Collisions. Since the algorithm has no way to space out the books so that they fill the library exactly (or the hash table if you will), it will invariably end up calculating a number that has been used before. In the library sense, when you get to the shelf and the slot number you wish to put a book in, there's already a book there.

Various collision handling methods exist, including running the data into yet another calculation to get another spot in the table (double hashing), or simply to find a space close to the one you were given (i.e. right next to the previous book assuming the slot was available also known as linear probing). This would mean that you have some digging to do when you try to find the book later, but it's still better than simply starting at one end of the library.

Finally, at some point, you might want to put more books into the library than the library allows. In other words, you need to build a bigger library. Since the exact spot in the library was calculated using the exact and current size of the library, it goes to follow that if you resize the library you might end up having to find new spots for all the books since the calculation done to find their spots has changed.

I hope this explanation was a bit more down to earth than buckets and functions :)

这篇关于哈希表如何工作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆