Scalability issue with HashMap in a multithreaded, multicore system


Problem description

I am facing a scalability issue while reading data from a HashMap. My machine has 32 cores with 2 hyperthreads per core (64 logical CPUs in total) and 64 GB of RAM. When reading data from the HashMap and doing arithmetic, I see performance drop sharply from 16 threads onwards, whereas the arithmetic-only version scales as expected.

Please find the test results below:

Reading from HashMap and performing arithmetic operations:

Number of threads | Time taken (seconds) => 1 | 85, 2 | 93, 4 | 124, 8 | 147, 16 | 644

Performing only arithmetic operations:

Number of threads | Time taken (seconds) => 1 | 25, 2 | 32, 4 | 35, 8 | 41, 16 | 65, 32 | 108, 40 | 112, 64 | 117, 100 | 158
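One way to read these numbers: the benchmark submits one fixed-size task per thread, so with perfect scaling the elapsed time would stay flat as threads are added. The sketch below computes the ratio of the single-thread time to the n-thread time for both runs, using only the figures reported above. The class and method names (`ScalingEfficiency`, `efficiency`) are illustrative and not part of the original code.

```java
class ScalingEfficiency {
    // Ratio of single-thread time to n-thread time. With one fixed-size task
    // per thread, 1.0 would mean perfect scaling; lower values mean slowdown.
    static double efficiency(double singleThreadSeconds, double nThreadSeconds) {
        return singleThreadSeconds / nThreadSeconds;
    }

    static void print(String label, int[] threads, int[] seconds) {
        System.out.println(label);
        for (int i = 0; i < threads.length; i++) {
            System.out.printf("  %3d threads: %.2f%n",
                    threads[i], efficiency(seconds[0], seconds[i]));
        }
    }

    public static void main(String[] args) {
        // Figures copied from the two result lists in the question.
        int[] hashMapThreads = {1, 2, 4, 8, 16};
        int[] hashMapSeconds = {85, 93, 124, 147, 644};
        int[] arithThreads = {1, 2, 4, 8, 16, 32, 40, 64, 100};
        int[] arithSeconds = {25, 32, 35, 41, 65, 108, 112, 117, 158};
        print("HashMap + arithmetic", hashMapThreads, hashMapSeconds);
        print("Arithmetic only", arithThreads, arithSeconds);
    }
}
```

Comparing the two columns at the same thread count makes it easier to see how much of the slowdown is specific to the HashMap run.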

The code is included below for reference:

import java.util.*;
import java.util.concurrent.*;

public class StringCallable2
{
    // private static final long size = 500000L;
    private static final long size = 1000000L;

    // HashMap variant: enable this map, the population loop in main(),
    // and the map.get() call in call().
    // private static final HashMap<Long, Long> map = new HashMap<Long, Long>();

    // Array variant, for comparison:
    // private static long[] array = new long[(int) size];

    public static class StringGenCallable implements Callable<Long>
    {
        private final int count;

        public StringGenCallable(int count)
        {
            this.count = count;
        }

        @Override
        public Long call()
        {
            // Random rand = new Random();
            long sum = 20;
            // CPU-intensive integer arithmetic only: no I/O, no object
            // creation, no floating-point arithmetic.
            for (long i = 0; i < size; i++)
            {
                // int numNoRange = rand.nextInt((int) (size - 1));
                // Long long1 = map.get((long) i);
                // Long long1 = array[(int) i];
                sum = i + 19 * sum;
            }
            return sum;
        }
    }

    public static void main(String[] args)
    {
        try
        {
            System.out.println("Starting");
            // for (long i = 0; i < size; i++)
            // {
            //     array[(int) i] = System.currentTimeMillis();
            //     map.put(i, System.currentTimeMillis());
            // }
            int threadCount = Integer.parseInt(args[0]);
            long startTime = System.currentTimeMillis();
            ExecutorService pool = Executors.newFixedThreadPool(threadCount);
            Set<Future<Long>> futures = new HashSet<Future<Long>>();
            for (int i = 0; i < threadCount; i++)
            {
                Callable<Long> callable = new StringGenCallable(i);
                futures.add(pool.submit(callable));
            }

            // Wait for every task to finish before stopping the clock.
            for (Future<Long> future : futures)
            {
                future.get();
            }

            System.out.println("Number of threads : " + threadCount);
            long finishTime = System.currentTimeMillis();
            System.out.println("Total Time Taken : " + (finishTime - startTime) + " ms");
            pool.shutdown();
        }
        catch (Throwable e)
        {
            // Covers both Exception and Error, as the separate handlers did.
            e.printStackTrace();
        }
    }
}

Recommended answer

For an application with this level of multiprocessing, you should be using ConcurrentHashMap. I would redesign to incorporate that change and then revisit the performance.
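As a rough illustration of that change, here is a minimal sketch in which the shared map is a `ConcurrentHashMap` populated once and then read by several pool threads. The class name, method names, map contents, and sizes are all assumptions made for the example; they are not taken from the question's code, and the sketch makes no claim about how much this helps on the asker's hardware.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConcurrentMapLookup {
    // Populate the shared map once, before any reader thread starts.
    // The values (i * 2) are arbitrary illustrative data.
    static ConcurrentHashMap<Long, Long> buildMap(long size) {
        ConcurrentHashMap<Long, Long> map = new ConcurrentHashMap<>();
        for (long i = 0; i < size; i++) {
            map.put(i, i * 2);
        }
        return map;
    }

    // Submit one task per thread; each task looks up every key and sums the values.
    static long sumAcrossThreads(ConcurrentHashMap<Long, Long> map, long size, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (long i = 0; i < size; i++) {
                        sum += map.get(i);
                    }
                    return sum;
                };
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long size = 100_000L;
        ConcurrentHashMap<Long, Long> map = buildMap(size);
        System.out.println(sumAcrossThreads(map, size, 4));
    }
}
```

Timing this variant against the plain HashMap version at the same thread counts is the "revisit the performance" step.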

I would also think carefully about how many threads you can use effectively. It's easy to view "add more threads" as a performance panacea, but it isn't. You may get more improvement by limiting the thread count and making currently shared data structures ThreadLocal, which reduces data sharing and the resulting contention and context switching.
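One possible reading of the ThreadLocal suggestion is sketched below: each worker thread lazily takes its own private copy of the read-only data, so lookups never touch a structure shared with other threads. Everything here (the `ThreadLocalMapCopy` class, the `SIZE` constant, the map contents) is hypothetical, and the approach trades memory for isolation, since every thread holds a full copy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadLocalMapCopy {
    // Illustrative size only; the question uses 1,000,000 entries.
    static final int SIZE = 1_000;

    // Read-only source data, fully built before any worker thread exists.
    private static final Map<Long, Long> SOURCE = buildSource();

    // Each worker thread gets its own private copy on first access.
    private static final ThreadLocal<Map<Long, Long>> LOCAL =
            ThreadLocal.withInitial(() -> new HashMap<>(SOURCE));

    private static Map<Long, Long> buildSource() {
        Map<Long, Long> map = new HashMap<>();
        for (long i = 0; i < SIZE; i++) {
            map.put(i, i + 1);
        }
        return map;
    }

    // Run `tasks` lookup tasks on a pool limited to `threads` threads.
    static long sumWithThreadLocalCopies(int threads, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                Callable<Long> task = () -> {
                    Map<Long, Long> local = LOCAL.get(); // this thread's copy
                    long sum = 0;
                    for (long i = 0; i < SIZE; i++) {
                        sum += local.get(i);
                    }
                    return sum;
                };
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumWithThreadLocalCopies(2, 8));
    }
}
```

Because the pool is fixed-size, the number of copies is bounded by the thread count rather than the task count.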

In this example, even assuming this process has the entire box to itself, having more than 64 threads will make it run progressively slower, since the work items are purely CPU-bound.
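A small sketch of capping the pool size accordingly: it clamps a requested thread count to the number of logical CPUs the JVM reports through `Runtime.getRuntime().availableProcessors()`. The helper name `boundedThreadCount` and the default of 100 requested threads are assumptions for illustration; whether the reported CPU count is the best cap for a given workload is something to measure rather than assume.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizing {
    // Clamp a requested thread count to the logical CPUs visible to the JVM,
    // never going below one thread.
    static int boundedThreadCount(int requested) {
        int cpus = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(requested, cpus));
    }

    public static void main(String[] args) {
        int requested = args.length > 0 ? Integer.parseInt(args[0]) : 100;
        int threads = boundedThreadCount(requested);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        System.out.println("Requested " + requested + ", using " + threads);
        pool.shutdown();
    }
}
```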

In a real-world application, the unit of work would likely be far more complicated or longer-running than what you have here. Be cautious about drawing too many conclusions from what is, for your hardware, a fairly trivial per-thread unit of work. Relative to a more complex workload, the thread-management overhead here is amplified compared with the work actually executed. In a more complex workload, the visible effect of the HashMap lookup may tend to disappear, and performance may look more like what you expect.
