Can hash tables really be O(1)?

Question

It seems to be common knowledge that hash tables can achieve O(1), but that has never made sense to me. Can someone please explain it? Here are two situations that come to mind:

A. The value is an int smaller than the size of the hash table. Therefore, the value is its own hash, so there is no real hash table, just an array lookup. But if there were one, it would be O(1) and still be inefficient.
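Situation A is essentially a direct-address table: when every key is a small non-negative integer, the key can index an array directly. A minimal sketch, assuming keys always fall in `range(capacity)`; the class name `DirectAddressTable` is hypothetical and used only for illustration:

```python
class DirectAddressTable:
    """Illustrative only: keys must be ints in range(capacity)."""

    _EMPTY = object()  # sentinel so that None can still be stored as a value

    def __init__(self, capacity):
        self._slots = [self._EMPTY] * capacity

    def put(self, key, value):
        self._slots[key] = value  # the key is its own "hash": one array write

    def get(self, key, default=None):
        value = self._slots[key]  # one array read, independent of item count
        return default if value is self._EMPTY else value


table = DirectAddressTable(capacity=1000)
table.put(42, "answer")
print(table.get(42))
print(table.get(7, "missing"))
```

Lookup here is a single index operation, but memory is proportional to the key range rather than to the number of stored items.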

B. You have to calculate a hash of the value. In this situation, the cost is O(n) in the size of the data being looked up. The lookup might be O(1) after you do O(n) work, but that still comes out to O(n) in my eyes.

And unless you have a perfect hash or a large hash table, there are probably several items per bucket. So it devolves into a small linear search at some point anyway.

I think hash tables are awesome, but I do not get the O(1) designation unless it is just supposed to be theoretical.

Wikipedia's article on hash tables consistently references constant lookup time and totally ignores the cost of the hash function. Is that really a fair measure?

To summarize what I have learned:

  • It is technically true because the hash function is not required to use all the information in the key and so could be constant time, and because a large enough table can bring collisions down to near constant time.

  • It is true in practice because, over time, it just works out as long as the hash function and table size are chosen to minimize collisions, even though that often means not using a constant-time hash function.

Recommended answer

You have two variables here, m and n, where m is the length of the input and n is the number of items in the hash table.

The O(1) lookup performance claim makes at least two assumptions:

  • Your objects can be compared for equality in O(1) time.
  • There will be few hash collisions.

If your objects are variable size and an equality check requires looking at all bits, then performance will become O(m). The hash function, however, does not have to be O(m); it can be O(1). Unlike a cryptographic hash, a hash function for use in a dictionary does not have to look at every bit in the input in order to calculate the hash. Implementations are free to look at only a fixed number of bits.
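To make the "fixed number of bits" idea concrete, here is a minimal sketch of a hash that does bounded work regardless of input length. The function name `prefix_hash`, the `sample` parameter, and the multiplier 31 are all assumptions chosen for illustration, not taken from any particular library:

```python
def prefix_hash(data: bytes, sample: int = 8) -> int:
    """Hypothetical hash that reads at most `sample` bytes plus the length."""
    h = len(data)  # len() on bytes is O(1)
    for byte in data[:sample]:  # at most `sample` iterations, whatever len(data) is
        h = (h * 31 + byte) & 0xFFFFFFFF  # keep the result in 32 bits
    return h


long_a = b"abcdefgh" + b"x" * 1_000_000
long_b = b"abcdefgh" + b"y" * 1_000_000

# Same length and same first 8 bytes, so the hashes are identical.
print(prefix_hash(long_a) == prefix_hash(long_b))
# The keys themselves differ; only the equality check can tell them apart.
print(long_a == long_b)
```

The trade-off is visible in the usage: the hash costs O(1), but keys that share a prefix and a length land in the same bucket, so the work shifts to the equality comparison and to collision handling.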

For sufficiently many items, the number of items will become greater than the number of possible hashes, and then you will get collisions, causing the performance to rise above O(1), for example O(n) for a simple linked list traversal (or O(n*m) if both assumptions are false).
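The effect of collisions on lookup cost can be seen by counting key comparisons in a small separate-chaining table. This is a sketch under stated assumptions: `ChainedTable`, `identity_hash`, and `constant_hash` are hypothetical names, the bucket count is fixed, and a Python list stands in for the linked list:

```python
class ChainedTable:
    """Hypothetical separate-chaining table with a fixed number of buckets."""

    def __init__(self, num_buckets, hash_fn):
        self._buckets = [[] for _ in range(num_buckets)]
        self._hash_fn = hash_fn

    def _bucket_for(self, key):
        return self._buckets[self._hash_fn(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def lookup(self, key):
        """Return (value, comparisons); raise KeyError if the key is absent."""
        for comparisons, (existing_key, value) in enumerate(self._bucket_for(key), start=1):
            if existing_key == key:
                return value, comparisons
        raise KeyError(key)


def identity_hash(key):
    return key  # spreads small ints perfectly across the buckets


def constant_hash(key):
    return 0  # worst case: every key collides into bucket 0


n = 1000
good = ChainedTable(num_buckets=2048, hash_fn=identity_hash)
bad = ChainedTable(num_buckets=2048, hash_fn=constant_hash)
for k in range(n):
    good.put(k, k * k)
    bad.put(k, k * k)

print(good.lookup(n - 1))  # → (998001, 1)
print(bad.lookup(n - 1))   # → (998001, 1000)
```

With no collisions, the lookup needs one comparison; with every key in one bucket, finding the last inserted key takes n comparisons, which is the linear traversal described above.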

In practice, though, the O(1) claim, while technically false, is approximately true for many real-world situations, in particular those where the above assumptions hold.
