Scalable way to access every element of ConcurrentHashMap<Element, Boolean> exactly once


Problem Description

I have 32 machine threads and one ConcurrentHashMap<Key,Value> map, which contains a lot of keys. Key defines a public method visit(). I want to visit() every element of the map exactly once, using the processing power I have available and possibly some sort of thread pooling.

Things I could try:

  • I could use the method map.keys(). The resulting Enumeration<Key> could be iterated over using nextElement(), but since a call to key.visit() is very brief I won't manage to keep threads busy. The Enumeration is inherently single-threaded.
  • I could use a synchronised HashSet<Key> instead, invoke its toArray() method and split the work on the array across all 32 threads. I have serious doubts about this solution, since toArray() will likely be a single-threaded bottleneck (see the sketch after this list).
  • I could try to inherit from ConcurrentHashMap, get my hands on the instances of its inner Segment<K,V>, try to group them into 32 groups and work on each group separately. This sounds like a hardcore approach though.
  • or similar magic with Enumeration<Key>.
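
For concreteness, here is a minimal sketch of the second option, assuming a stub Key class with a visit() method (both stand in for the real ones from my code; the class name ToArraySplit is made up). The keySet().toArray() call is the suspected single-threaded step:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of option 2: snapshot the keys into an array (the suspected
// single-threaded bottleneck), then split the array across 32 threads.
public class ToArraySplit {
    static class Key {
        void visit() { /* brief per-key work */ }
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<Key, Boolean> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 1_000_000; i++) map.put(new Key(), Boolean.TRUE);

        Key[] keys = map.keySet().toArray(new Key[0]); // single-threaded snapshot
        int threads = 32;
        int chunk = (keys.length + threads - 1) / threads;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            int from = t * chunk;
            int to = Math.min(keys.length, from + chunk);
            pool.execute(() -> {
                for (int i = from; i < to; i++) keys[i].visit(); // each thread owns one slice
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}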

Ideally:

  • Ideally, a ConcurrentHashMap<Key, Value> would define a method keysEnumerator(int approximatePosition), which could give me an enumerator that skips approximately the first 1/32 of the elements, i.e. map.keysEnumerator(map.size()/32)

Am I missing anything obvious? Has anybody run into a similar problem before?

EDIT

I've had a go at profiling to see whether this problem is actually going to affect the performance in practice. As I don't have access to the cluster at the moment, I used my laptop and tried to extrapolate the results to a bigger dataset. On my machine I can create a ConcurrentHashMap with 2 million keys, and it takes about 1 second to iterate over it invoking the visit() method on every key. The program is supposed to scale to 85 million keys (and over). The cluster's processor is slightly faster, but it should still take about 40 seconds to iterate over the entire map (85 million is roughly 42 times the 2-million baseline). Now a few words about the logic flow of the program. The logic presented is sequential, i.e. no thread is allowed to proceed to the next step until all the threads in the previous step are finished:

  1. Create the hash map, create the keys and populate the hash map
  2. Iterate over entire hash map visiting all the keys.
  3. Do some data shuffling, i.e. parallel insertions and deletions.
  4. Repeat step 2 and 3 a few hundred times.

That logic flow means that a 40-second iteration is going to be repeated a few hundred times, say 100, which gives us a bit over an hour (100 × 40 s ≈ 67 minutes) spent just visiting the nodes. With a set of 32 parallel iterators it could go down to just a few minutes, which is a significant performance improvement.

Now a few words on how ConcurrentHashMap works (or how I believe it works). Every ConcurrentHashMap consists of segments (by default 16). Every write to the hash map is synchronised on the relevant segment. So say we're trying to write two new keys k1 and k2 to the hash map, and they resolve to the same segment, say s1. If they are written simultaneously, one of them acquires the lock first and is added earlier than the other. What is the chance that two elements resolve to the same segment? With a good hash function and 16 segments, it is 1/16.
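
A simplified sketch of that segment-resolution step (illustrative only, not the exact JDK source, which additionally spreads the key's hashCode before selecting a segment):

// Illustrative only: with the default concurrencyLevel of 16, the high
// bits of a key's (rehashed) hashCode select one of 16 segments, and each
// write locks only its segment. Under a good hash function, two concurrent
// writes land in the same segment with probability about 1/16.
final class SegmentDemo {
    static final int SEGMENTS = 16; // default concurrencyLevel

    static int segmentFor(Object key) {
        int h = key.hashCode();             // the real map spreads this hash further
        return (h >>> 28) & (SEGMENTS - 1); // high 4 bits pick the segment
    }

    public static void main(String[] args) {
        System.out.println("k1 -> segment " + segmentFor("k1"));
        System.out.println("k2 -> segment " + segmentFor("k2"));
    }
}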

I believe that ConcurrentHashMap should have a method concurrentKeys(), which would return an array of Enumerations, one per segment. I have got a few ideas how to add it to ConcurrentHashMap through inheritance and I'll let you know if I succeed. For now the solution seems to be to create an array of ConcurrentHashMaps and pre-hash every key so it resolves to one member of such an array. I'll share that code as well, once it's ready.
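
(A note for later readers: Java 8 eventually added bulk parallel operations to ConcurrentHashMap itself, such as forEachKey(long parallelismThreshold, ...), which split the traversal over the common fork/join pool and cover exactly this use case. A minimal example, on JDK 8+:)

import java.util.concurrent.ConcurrentHashMap;

// Java 8+: the map splits the traversal across the common ForkJoinPool
// once it is larger than the parallelism threshold (1 = maximal parallelism).
public class ParallelVisit {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Boolean> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 2_000_000; i++) map.put("key" + i, Boolean.TRUE);

        map.forEachKey(1, key -> {
            // visit(key) would go here; every key is traversed exactly once
        });
    }
}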

EDIT

This is the same problem in a different language:

Parallel Iterators

Solution

The solution I will eventually go for is an array of ConcurrentHashMaps instead of one ConcurrentHashMap. This is ad hoc, but seems to be relevant for my use case. I don't care about the second step (population) being slow, as it doesn't affect my code's performance. The solution is:

Object Creation:

  1. Create an array of ConcurrentHashMaps of size t, where t is the number of threads.
  2. Create an array of Runnables, also of size t.

Array Population (single threaded, not an issue):

  1. Create the keys and apply a pre-hash function, which returns an int in the range 0 ... t-1. In my case, simply modulo t.
  2. Put the key in the hash map by accessing the appropriate entry in the array, e.g. if the pre-hash resulted in index 4, go for hashArray[4].put(key)

Array Iteration (nicely multithreaded, performance gain):

  1. Assign every thread from the Runnables array the job of iterating over the hashmap with the corresponding index. This should give a roughly t times shorter iteration than a single-threaded one (see the sketch below).
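
A minimal self-contained sketch of the whole scheme, assuming integer keys and modulo-t pre-hashing (the real code visits richer key objects; the names ShardedMaps and shards are made up for illustration):

import java.util.concurrent.ConcurrentHashMap;

// Sketch of the array-of-maps approach: shard i only ever holds keys with
// key % t == i, so each thread can iterate its own shard without coordination.
public class ShardedMaps {
    public static void main(String[] args) throws InterruptedException {
        final int t = 32; // number of threads (and shards)

        @SuppressWarnings("unchecked")
        ConcurrentHashMap<Integer, Boolean>[] shards = new ConcurrentHashMap[t];
        for (int i = 0; i < t; i++) shards[i] = new ConcurrentHashMap<>();

        // Population (single-threaded, as above): the pre-hash picks the shard.
        for (int key = 0; key < 2_000_000; key++) {
            shards[key % t].put(key, Boolean.TRUE);
        }

        // Iteration (multithreaded): one thread per shard.
        Thread[] workers = new Thread[t];
        for (int i = 0; i < t; i++) {
            ConcurrentHashMap<Integer, Boolean> shard = shards[i];
            workers[i] = new Thread(() -> {
                for (Integer key : shard.keySet()) {
                    // visit(key) would go here
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
    }
}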

To see the proof-of-concept code (it has some dependencies on the project, so I can't post it here), head towards my project on github: https://github.com/gutechsoc-hackathon/multithreadedSAS/blob/permanentLinkStackOverflowQuestion19860217/Coppersmith2005.java

EDIT

Actually, implementing the above proof of concept for my system has proven to be time-consuming, bug-prone and grossly disappointing. Additionally, I've discovered I would have missed many features of the standard library ConcurrentHashMap. The solution I have been exploring recently, which looks much less ad hoc and much more promising, is to use Scala, which produces bytecode that is fully interoperable with Java. The proof of concept relies on the stunning library described in this paper, and AFAIK it is currently IMPOSSIBLE to achieve a corresponding solution in vanilla Java without writing thousands of lines of code, given the current state of the standard library and corresponding third-party libraries.

import scala.collection.parallel.mutable.ParHashMap

class Node(value: Int, id: Int){
    var v = value
    var i = id
    override def toString(): String = v.toString
}

object testParHashMap{
    def visit(entry: Tuple2[Int, Node]){
        entry._2.v += 1
    }
    def main(args: Array[String]){
        val hm = new ParHashMap[Int, Node]()
        for (i <- 1 to 10){
            var node = new Node(0, i)
            hm.put(node.i, node)
        }

        println("========== BEFORE ==========")
        hm.foreach{println}

        hm.foreach{visit}

        println("========== AFTER ==========")
        hm.foreach{println}

    }
}

