When to use hash tables?

Question

In what cases can using a hash table improve performance, and in what cases can't it? And in what cases are hash tables not applicable?

Recommended answer

In what cases can using a hash table improve performance, and in what cases can't it?

If you have reason to care, implement it using a hash table and whatever else you're considering, run your actual data through each, and measure which performs better.
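
The following is a minimal Python sketch of that advice, built on the standard bisect, random and timeit modules. It first checks that a dict (a hash table) and binary search over a sorted list give the same answers. It then times membership lookups in each. The helper name compare_lookups, the key ranges and the sizes are illustrative assumptions. In practice, substitute your real data and the real alternatives you're weighing.

```python
import bisect
import random
import timeit


def compare_lookups(keys, probes, repeat=5):
    """Time membership tests in a dict (hash table) against bisect on a sorted list.

    Returns (dict_seconds, sorted_seconds, hits): the best-of-`repeat` time for
    each structure and the number of probes that were found.
    """
    table = dict.fromkeys(keys)   # hash table: O(1) average-case lookup
    ordered = sorted(keys)        # sorted array: O(log n) lookup via binary search

    def in_sorted(x):
        i = bisect.bisect_left(ordered, x)
        return i < len(ordered) and ordered[i] == x

    # Both structures must agree before their timings mean anything.
    hits = sum(1 for p in probes if p in table)
    if hits != sum(1 for p in probes if in_sorted(p)):
        raise RuntimeError("structures disagree")

    dict_seconds = min(timeit.repeat(lambda: [p in table for p in probes],
                                     number=1, repeat=repeat))
    sorted_seconds = min(timeit.repeat(lambda: [in_sorted(p) for p in probes],
                                       number=1, repeat=repeat))
    return dict_seconds, sorted_seconds, hits


if __name__ == "__main__":
    rng = random.Random(0)
    keys = rng.sample(range(10_000_000), 100_000)
    probes = [rng.randrange(10_000_000) for _ in range(50_000)]
    d, s, hits = compare_lookups(keys, probes)
    print(f"dict: {d:.4f}s, sorted list + bisect: {s:.4f}s ({hits} hits)")
```

The printed timings depend on your machine, Python version and data. That variability is exactly why the answer recommends measuring rather than guessing.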

That said, if a hash table has the operations you need (i.e. you're not expecting to iterate it in sorted order, or compare it quickly to another hash table), and it will hold millions or more (billions, trillions...) of elements, then it'll probably be your best choice. But a lot depends on the hash table implementation (especially the choice of closed vs. open hashing, i.e. open addressing vs. separate chaining), object size, hash function quality and calculation cost/runtime, comparison cost, and the oddities of your computer's memory performance at different cache levels... in short: there are too many factors for even an educated guess to beat measuring, when it matters.
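
Here is a small Python sketch of the "operations you need" caveat, using a range query (all keys in [lo, hi)) as the example; the helper names are hypothetical. Against a hash table, the query has to examine every key, since hashing scatters neighbouring keys, and then sort the matches. Against a sorted array, it takes two binary searches and a slice that is already in order. A balanced tree such as C++'s std::map supports the same pattern through lower_bound.

```python
import bisect


def range_from_hash_table(table, lo, hi):
    """Keys k with lo <= k < hi from a dict: every key must be scanned (O(n)),
    and the matches still need sorting to come out in key order."""
    return sorted(k for k in table if lo <= k < hi)


def range_from_sorted(ordered, lo, hi):
    """Same query on a sorted list: two binary searches, O(log n + k),
    and the slice is already in key order."""
    start = bisect.bisect_left(ordered, lo)
    end = bisect.bisect_left(ordered, hi)
    return ordered[start:end]


if __name__ == "__main__":
    keys = [42, 7, 19, 3, 88, 23, 61]
    print(range_from_hash_table(dict.fromkeys(keys), 10, 60))  # [19, 23, 42]
    print(range_from_sorted(sorted(keys), 10, 60))             # [19, 23, 42]
```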

And in what cases are hash tables not applicable?

Mainly when:

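  • the input can't be hashed (e.g. you're given binary blobs and don't know which bits in them are significant, but you do have an int cmp(const T&, const T&) function you could force a std::map to use), or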

  • the available/possible hash functions are very collision-prone, or

  • you want to avoid worst-case performance hits for:

    • handling lots of hash-colliding elements (perhaps "engineered" by someone trying to crash or slow down your software)

    • resizing the hash table: unless it's presized to be large enough (which can be wasteful and slow when excessive memory is used), most implementations will outgrow the arrays they're using for the hash table every now and then, then allocate a bigger array and copy the content across. This can make the specific insertions that trigger the rehashing much slower than the normal O(1) behaviour, even though the average is still O(1). If you need more consistent behaviour in all cases, something like a balanced binary tree may serve.

  • your access patterns are quite specialised (e.g. frequently operating on elements with keys that are "nearby" in some specific sort order), such that cache efficiency is better for other storage models that keep them nearby in memory (e.g. bucket-sorted elements), even if you're not strictly relying on the sort order for, e.g., iteration.
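
For the case where the input can't be hashed, here is a minimal Python sketch of the ordered alternative. opaque_cmp is a hypothetical stand-in for a comparator you are handed as a black box. Here it happens to treat trailing zero bytes as insignificant, so two byte strings can compare equal while their raw bytes hash differently. No hash is known to be consistent with that comparator, so Blob is made unhashable. Lookups use binary search over a sorted list instead, which is analogous to wrapping the comparator for a std::map.

```python
import bisect
from functools import total_ordering


def opaque_cmp(a: bytes, b: bytes) -> int:
    """Hypothetical comparator standing in for one handed to you as a black box.
    For this sketch it ignores trailing zero bytes (say, padding)."""
    x, y = a.rstrip(b"\x00"), b.rstrip(b"\x00")
    return (x > y) - (x < y)


@total_ordering
class Blob:
    """Wraps raw bytes; equality and ordering come only from opaque_cmp."""
    __slots__ = ("data",)

    def __init__(self, data: bytes):
        self.data = data

    def __eq__(self, other):
        if not isinstance(other, Blob):
            return NotImplemented
        return opaque_cmp(self.data, other.data) == 0

    def __lt__(self, other):
        if not isinstance(other, Blob):
            return NotImplemented
        return opaque_cmp(self.data, other.data) < 0

    # hash(self.data) would disagree with __eq__ (Blob(b"ab") == Blob(b"ab\x00")),
    # and no consistent hash is known, so Blob is explicitly unhashable.
    __hash__ = None


class SortedBlobSet:
    """Ordered set that needs only comparisons: binary search over a sorted list."""

    def __init__(self):
        self._items = []

    def add(self, blob: Blob) -> None:
        i = bisect.bisect_left(self._items, blob)
        if i == len(self._items) or self._items[i] != blob:
            self._items.insert(i, blob)  # O(n) shift; a balanced tree avoids this

    def __contains__(self, blob: Blob) -> bool:
        i = bisect.bisect_left(self._items, blob)
        return i < len(self._items) and self._items[i] == blob

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    s = SortedBlobSet()
    for raw in (b"ab", b"ab\x00", b"b"):
        s.add(Blob(raw))
    print(len(s))                    # 2: b"ab" and b"ab\x00" compare equal
    print(Blob(b"ab\x00\x00") in s)  # True
```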
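
For the collision cases, a toy separate-chaining table that counts key comparisons shows the degradation. The name ChainedHashTable, the fixed 1024 buckets and the lack of resizing are all simplifying assumptions for this sketch. With a well-spread hash, the total number of comparisons stays proportional to n. If every key lands in the same bucket, whether through a poor hash function or keys deliberately chosen to collide, each insert scans the whole chain, and n inserts cost n(n-1)/2 comparisons.

```python
class ChainedHashTable:
    """Toy separate-chaining hash table that counts key comparisons.
    Fixed bucket count and no resizing, to keep the sketch short."""

    def __init__(self, hash_fn, buckets=1024):
        self._hash = hash_fn
        self._buckets = [[] for _ in range(buckets)]
        self.comparisons = 0

    def _chain(self, key):
        return self._buckets[self._hash(key) % len(self._buckets)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            self.comparisons += 1
            if k == key:
                chain[i] = (key, value)  # existing key: overwrite the value
                return
        chain.append((key, value))

    def get(self, key, default=None):
        for k, v in self._chain(key):
            self.comparisons += 1
            if k == key:
                return v
        return default


def count_comparisons(hash_fn, n):
    """Insert keys 0..n-1 and report how many key comparisons that took."""
    table = ChainedHashTable(hash_fn)
    for k in range(n):
        table.insert(k, k)
    return table.comparisons


if __name__ == "__main__":
    print(count_comparisons(hash, 2000))          # 976
    print(count_comparisons(lambda k: 0, 2000))   # 1999000
```

A balanced binary tree would need on the order of n log n comparisons for the same inserts, whatever the key values.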
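
The resizing point can be made concrete with a toy table that records how many entries each insertion has to place. The names GrowingTable and work_per_insert are hypothetical. Doubling the array at a 0.75 load factor is one common policy, assumed here rather than something every implementation does.

```python
class GrowingTable:
    """Toy separate-chaining hash table that doubles its bucket array whenever an
    insert would push the load factor above max_load. Assumes distinct keys."""

    def __init__(self, buckets=8, max_load=0.75):
        self._buckets = [[] for _ in range(buckets)]
        self._size = 0
        self._max_load = max_load
        self.work_per_insert = []  # entries placed per insert: 1, or size + 1 on a resize

    @staticmethod
    def _place(buckets, key, value):
        buckets[hash(key) % len(buckets)].append((key, value))

    def insert(self, key, value):
        work = 1
        if (self._size + 1) / len(self._buckets) > self._max_load:
            # Rehash: every existing entry moves into a twice-as-large array.
            bigger = [[] for _ in range(2 * len(self._buckets))]
            for chain in self._buckets:
                for k, v in chain:
                    self._place(bigger, k, v)
                    work += 1
            self._buckets = bigger
        self._place(self._buckets, key, value)
        self._size += 1
        self.work_per_insert.append(work)

    def get(self, key, default=None):
        for k, v in self._buckets[hash(key) % len(self._buckets)]:
            if k == key:
                return v
        return default


if __name__ == "__main__":
    t = GrowingTable()
    for k in range(100):
        t.insert(k, k)
    spikes = [(i + 1, w) for i, w in enumerate(t.work_per_insert) if w > 1]
    print(spikes)  # [(7, 7), (13, 13), (25, 25), (49, 49), (97, 97)]
    print(sum(t.work_per_insert) / len(t.work_per_insert))  # 2.86
```

Most inserts place a single entry, but the 97th places 97 while the average stays under 3. A balanced tree instead pays O(log n) on every insert, with no such spikes.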
