什么是散列表和hashmaps及其典型用例? [英] What are hashtables and hashmaps and their typical use cases?

查看:189
本文介绍了什么是散列表和hashmaps及其典型用例?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我最近碰到过这些术语很少,但我很困惑他们是如何工作的,以及他们通常在什么时候实现的? 解决方案

如果你使用一个数组,一个简单的基于索引的数据结构,并用随机的东西来填充它,找到一个特定的因为你基本上必须从一端开始搜索到另一端,直到找到你想要的那个为止。进入成为一个越来越昂贵的操作,因为你用数据填充它。



如果您希望更快地访问数据,您通常会对数组进行排序并使用二进制搜索。然而,在增加查找现有值的速度的同时,这会使插入新值变慢,因为当需要在中间插入元素时,需要移动现有元素。



另一方面,散列表具有一个相关函数,它接受一个条目并将其减少为一个散列键。然后这个数字被用作数组中的索引,这就是存储条目的地方。

散列表围绕一个数组开始,它最初是空的。空并不意味着长度为零,数组始于一个大小,但数组中的所有元素均不包含任何内容。



每个元素都有两个属性,数据和一个标识数据的关键字。例如,美国的邮政编码列表将是邮政编码 - >名称类型的关联。该函数减少了键,但不考虑数据。



因此,当您向哈希表中插入某些内容时,该函数会将该键减少为一个数字,该数字用作一个指向这个(空)数组的索引,这就是你存储数据的地方,包括关键字和相关的数据。



然后,你想找到一个你知道键的特定条目,所以你通过相同的函数运行键,获得它的散列键,然后到散列表中的特定位置并在那里检索数据。



理论认为,将密钥减少为散列密钥的函数在计算上比线性搜索便宜得多。

A典型的哈希表没有无限数量的可用于存储的元素,因此该数字通常会进一步降低到适合数组大小的索引。一种方法是简单地将索引的模数与数组大小进行比较。对于大小为10的数组,索引0-9将直接映射到索引,索引10-19将再次映射到0-9,依此类推。



一些键将被缩减为与散列表中现有条目相同的索引。在这一点上,直接比较实际的键和与比较键的数据类型(例如,正常字符串比较)相关的所有规则。如果完全匹配,则可以忽略新数据(它已存在)或覆盖(替换该键的旧数据),或者添加它(多值散列表)。如果没有匹配,这意味着虽然哈希键是相同的,但实际的键不是,您通常会找到一个新位置来存储该键+数据。



碰撞解析有许多实现,最简单的就是去数组中的下一个空元素。这个简单的解决方案还有其他问题,所以找到正确的解析算法也是散列表的一个很好的练习。



如果完全填满(或关闭to),这通常是通过创建新大小的新数组,并再次计算所有索引,并将这些项目放入新数组中的新数组中来完成的。



将键减少为数字的函数不会生成线性值,即。 AAA变为1,那么AAB变成2,所以哈希表不会被任何典型值排序。



有一个关于这个主题的好的维基百科文章,好吧,此处


I have recently run across these terms few times but I am quite confused how they work and when they are usualy implemented?

解决方案

Well, think of it this way.

If you use an array, a simple index-based data structure, and fill it up with random stuff, finding a particular entry gets to be a more and more expensive operation as you fill it with data, since you basically have to start searching from one end toward the other, until you find the one you want.

If you want to get faster access to data, you typicall resort to sorting the array and using a binary search. This, however, while increasing the speed of looking up an existing value, makes inserting new values slow, as you need to move existing elements around when you need to insert an element in the middle.

A hashtable, on the other hand, has an associated function that takes an entry, and reduces it to a number, a hash-key. This number is then used as an index into the array, and this is where you store the entry.

A hashtable revolves around an array, which initially starts out empty. Empty does not mean zero length, the array starts out with a size, but all the elements in the array contains nothing.

Each element has two properties, data, and a key that identifies the data. For instance, a list of zip-codes of the US would be a zip-code -> name type of association. The function reduces the key, but does not consider the data.

So when you insert something into the hashtable, the function reduces the key to a number, which is used as an index into this (empty) array, and this is where you store the data, both the key, and the associated data.

Then, later, you want to find a particular entry that you know the key for, so you run the key through the same function, get its hash-key, and goes to that particular place in the hashtable and retrieves the data there.

The theory goes that the function that reduces your key to a hash-key, that number, is computationally much cheaper than the linear search.

A typical hashtable does not have an infinite number of elements available for storage, so the number is typically reduced further down to an index which fits into the size of the array. One way to do this is to simply take the modulus of the index compared to the size of the array. For an array with a size of 10, index 0-9 will map directly to an index, and index 10-19 will map down to 0-9 again, and so on.

Some keys will be reduced to the same index as an existing entry in the hashtable. At this point the actual keys are compared directly, with all the rules associated with comparing the data types of the key (ie. normal string comparison for instance). If there is a complete match, you either disregard the new data (it already exists) or you overwrite (you replace the old data for that key), or you add it (multi-valued hashtable). If there is no match, which means that though the hash keys was identical, the actual keys were not, you typically find a new location to store that key+data in.

Collision resolution has many implementations, and the simplest one is to just go to the next empty element in the array. This simple solution has other problems though, so finding the right resolution algorithm is also a good excercise for hashtables.

Hashtables can also grow, if they fill up completely (or close to), and this is usually done by creating a new array of the new size, and calculating all the indexes once more, and placing the items into the new array in their new locations.

The function that reduces the key to a number does not produce a linear value, ie. "AAA" becomes 1, then "AAB" becomes 2, so the hashtable is not sorted by any typical value.

There is a good wikipedia article available on the subject as well, here.

这篇关于什么是散列表和hashmaps及其典型用例?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆