Hashing in Java -- structure & access time


Problem description



I am looking for verification on two different but related arguments-- those above (A) and below (B) the first line-comment here in the Q.

(A) The way HashMap is structured is:

A HashMap is a plain table. That's direct memory access (DMA).

The whole idea behind HashMap (or hashing in general) in the first place is to put this constant-time memory access to use for

a.) accessing records by their own data content (< K,V >), not by their locations in DMA (the table index)

b.) managing a variable number of records-- a number of records not given in advance, which may or may not remain constant throughout the use of this structure.

So, the overall structure in a Java Hash is:

a table: table // I'm using the identifier used in HashMap

each cell of this table is a bucket.

Each bucket is a linked list of type Entry-- i.e., each node of this linked list (not Java's LinkedList from the API, but the linked-list data structure) is of type Entry, which in turn is a < K,V > pair.
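For concreteness, this is roughly how the pre-Java-8 OpenJDK HashMap source declares that structure (simplified here; constructors and accessor methods omitted):

    // Simplified from the pre-Java-8 OpenJDK HashMap source.
    transient Entry<K,V>[] table;        // the table; each cell heads one bucket

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;                 // next node in this bucket's list
        final int hash;                  // cached hash of the key
        // constructor and Map.Entry methods omitted
    }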

When a new pair comes in to be added to the hash, a hashCode (not necessarily unique) is calculated for this < K,V > pair. This hashCode is the key to the index of this < K,V > in table-- it tells which bucket this < K,V > will go into in the hash. Note: the hashCode is "normalized" through the function hash() (in HashMap, for one) to better fit the current length of the table. indexFor() is also used to determine which bucket, i.e., which cell of table, the < K,V > will go into.
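For reference, those two steps look roughly like this in the Java 6/7 source:

    // Roughly as in the Java 6/7 HashMap source.
    static int hash(int h) {
        // spread the bits of hashCode() so the power-of-two mask below works well
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);         // table.length is always a power of two
    }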

When the bucket is determined, the < K,V > is added at the beginning of the linked list in this bucket-- as a result, it is the first < K,V > entry in this bucket, and the first entry of the linked list that already existed is now the "next" entry pointed to by this newly added one.
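That head insertion is visible in the pre-Java-8 addEntry() (slightly simplified below); note that the resize check at the end looks only at the overall size, which is exactly the point of Part (B) below:

    // Sketch of addEntry() from the pre-Java-8 HashMap source.
    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];                        // old head of the bucket
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e); // new entry points to it
        if (size++ >= threshold)          // threshold = capacity * loadFactor
            resize(2 * table.length);     // grows the whole table, never a single bucket
    }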

//===============================================================

(B) From what I see in HashMap, resizing of the table-- the hash-- is done only upon a decision based on the hash's size and capacity: the current and the maximum number of entries in the entire hash (as the addEntry() sketch above shows).

There is no restructuring or resizing based on individual bucket sizes-- nothing like "resize() when the max number of entries in a bucket exceeds such-and-such".

It is not probable, but it is possible, that a significant number of entries bulk up in one bucket while the rest of the hash is pretty much empty.

If this is the case, i.e., there is no upper limit on the size of each bucket, then the hash gives not constant but linear access-- in theory, at least. It takes O(n) time to get hold of an entry in the hash, where n is the total number of entries. But then it shouldn't be so.
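A hypothetical demo of that worst case-- BadKey is made up here for illustration; its constant hashCode() sends every entry to the same bucket:

    import java.util.HashMap;
    import java.util.Map;

    // Every BadKey collides, so one bucket holds all entries and a get()
    // degenerates to a linear scan of that bucket's list (pre-Java-8).
    final class BadKey {
        private final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }  // constant: all keys collide
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public class CollisionDemo {
        public static void main(String[] args) {
            Map<BadKey, Integer> map = new HashMap<>();
            for (int i = 0; i < 50_000; i++) map.put(new BadKey(i), i);
            System.out.println(map.get(new BadKey(0))); // O(n) walk down one bucket
        }
    }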

//===============================================================

I don't think I'm missing anything in Part (A) above.

I'm not entirely sure of Part (B). It is a significant issue and I'm looking to find out how accurate this argument is.

I'm looking for verification on both parts.

Thanks in advance.

//===============================================================

EDIT:

Fixing a maximum bucket size, i.e., restructuring the hash whenever the number of entries in a bucket hits a maximum, would resolve it-- the access time would be plainly constant, in theory and in use.

This wouldn't be a well-structured fix, but a quick one, and it would work just fine for the sake of constant access.

The hashCodes are likely to be evenly distributed throughout the buckets, and it isn't very likely that any one of the buckets would hit the bucket max before the threshold on the overall size of the hash is hit. This is the assumption the current setup of HashMap relies on as well.

Also based on Peter Lawrey's discussion below.
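For what it's worth, the defaults behind the overall-size threshold mentioned above, as declared in the pre-Java-8 source:

    // Defaults from the HashMap source; resize triggers when size >= capacity * loadFactor.
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;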

Solution

Collisions in HashMap are only a problem in pathological cases such as denial of service attacks.

In Java 7, you can change the hashing strategy so that an external party cannot predict your hashing algorithm.
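This presumably refers to the alternative hashing added in Java 7u6, which hashes String keys with a per-map random seed once a map grows past a threshold; as far as I recall it is enabled via a system property, along these lines:

    java -Djdk.map.althashing.threshold=512 MyApp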

AFAIK, in Java 8, HashMap for a String key will use a tree map instead of a linked list for collisions. This means O(ln N) worst-case access times instead of O(n).
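For reference, the bucket-to-tree conversion in the released OpenJDK 8 source is governed by these constants (and, in the final version, it applies to any key type, not just String, using Comparable ordering where available):

    // Constants from the OpenJDK 8 HashMap source governing tree bins.
    static final int TREEIFY_THRESHOLD = 8;     // list -> tree above this many entries in a bucket
    static final int UNTREEIFY_THRESHOLD = 6;   // tree -> list below this many (during resize)
    static final int MIN_TREEIFY_CAPACITY = 64; // below this table size, resize instead of treeify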

