带有逻辑删除的哈希表的加载因子 [英] Load factor of hash tables with tombstones
问题描述
因此出现了一个问题,即在计算哈希表的负载因子时是否应包括墓碑.
So the question came up about whether tombstones should be included when calculating the load factor of a hash table.
我认为,鉴于使用负载系数来确定何时扩展容量,因此不应该包括墓碑.一个明显的例子是,您几乎要填充然后删除哈希表中的每个值.在这里插入非常容易(没有冲突),因此我认为负载因子不应该包括它们.
I thought that, given that the load factor is used to determine when to expand capacity, tombstones should not be included. An obvious example is if you almost fill and then remove every value in a hash table. Here insertions are super easy (no collisions) so I believe the load factor shouldn't include them.
但是您可以查看一下,并认为对于所有墓碑,查找将很慢(可能会搜索几乎整个空间).
But you could look at this and think that with all the tombstones lookups will be slow (potentially searching almost the entire space).
所以我想问这个问题.哈希表的负载因子应在计算中包括墓碑吗?
So I thought I'd ask the question. Should the load factor of a hashtable include tombstones in the calculation?
推荐答案
负载因子不是哈希表数据结构的必要组成部分-它是为动态系统定义行为规则的方法(增长/收缩哈希表是一个动态系统).
Load factor is not an essential part of hash table data structure -- it is the way to define rules of behaviour for the dymamic system (growing/shrinking hash table is a dynamic system).
此外,在我看来,在95%的现代哈希表情况下,这种方法过于简化,动态系统的运行性能却欠佳.优点:
Moreover, in my opinion, in 95% of modern hash table cases this way is over simplified, dynamic systems behave suboptimally. It's advantages:
- 那么,理解和实施的简单性.
- 哈希表数据结构不应存储带有阈值的许多数字-可能只有一个数字.当哈希表非常小且标头的大小影响整个数据结构的内存效率(以字节为单位存储条目)时,这是有意义的.
- 在某些(常见)情况下:仅追加/更新哈希表,更复杂的行为模型退化为"just load factor"模型,换句话说,load factor模型定义了相对最佳的行为.
另请参阅我对负载系数模型的回答.我更喜欢 [最小负载,目标负载,最大负载] +生长因子框架模型.
如果您使用墓碑开发通用哈希表,我认为您可以拿起我的结果(如下).我可能花了数周的时间专门开发此模型.也许您可以进行一些改进或进一步研究,我会很高兴.
See also my answer on load factor model. I prefer [min load, target load, max load] + growth factor frame model.
If you develop general-purpose hash table with tombstones, I think you can just pick up my results (below). I spend maybe several weeks solely developing this model. Maybe you can make some improvements or further research, I would be glad.
有两个主要的哈希表动态行为模式作为目标:
Two main hash table dynamic behaviour patterns are targeted:
- 不断增长的哈希表(可能处于增长阶段),几乎没有删除或没有删除
- 未指定适当容量(或未知)时哈希表的初始填充
- growing hash table (maybe in growing phase), with little or no removals
- initial fill of hash table, when proper capacity was not specified (or unknown)
- 具有上限的高速缓存,LRU,带有条目的表已过期
定义了两个阈值:
-
最大大小(即有效条目数),
table size * max load
max size (i. e. number of alive entries),
table size * max load
最小空闲(即空的,没有活动入口或墓碑)插槽的数量,如果哈希表的大小超过最大大小,我们假设我们处于增长模式",请重新哈希表大小以能够存储
current size * growth factor
条目,即. e.选择最接近current size * growth factor / target load
的表格大小.If hash table size exceeds max size, we assume we are in the "growing pattern", rehash to the table size to be able to store
current size * growth factor
entries, i. e. choose table size closest possible tocurrent size * growth factor / target load
.如果可用插槽的数量小于最小可用插槽的数量,则我们处于缓存模式",将"hash"恢复为当前大小,即. e.到最接近
current size / target load
的表大小.If the number of free slots becomes below than min number of free slots, we are in "cache pattern", rehash "to the current size", i. e. to the table size closest possible to
current size / target load
.阅读此外,文章从哈希表:理论与实践阐明了一些观点.
Also, article Tombstones purge from hashtable: theory and practice sheds some light.
如果您开发专用哈希表,已知其动态特性是已知的(或可以研究的),那么我建议您开发适合自己情况的模型.不要依赖纯粹的数学和CS理论,而要在基准测试中评估模型.
If you develop specially purposed hash table, which dymanic properties are known (or could be studied), I recommend you to develop your own model, fitting your case. Don't rely on pure math and CS theory, evaluate your model in benchmarks.
这篇关于带有逻辑删除的哈希表的加载因子的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!