Performance Issues with newFixedThreadPool vs newSingleThreadExecutor


Question


I am trying to benchmark our client code, so I wrote a multithreaded program to do it. I want to measure how long (95th percentile) the following method call takes:

attributes = deClient.getDEAttributes(columnsList);

Below is the multithreaded code I wrote to benchmark that method. I am seeing a large difference between two scenarios:

1) First, I ran the multithreaded code with 20 threads for 15 minutes and got a 95th percentile of 37 ms, using:

ExecutorService service = Executors.newFixedThreadPool(20);

2) But if I run the same program for 15 minutes using:

ExecutorService service = Executors.newSingleThreadExecutor();

instead of

ExecutorService service = Executors.newFixedThreadPool(20);

I get a 95th percentile of 7 ms, which is far lower than the number I get when running with newFixedThreadPool(20).

Can anyone tell me what could cause such a large performance difference between:

newSingleThreadExecutor vs newFixedThreadPool(20)

In both cases I run the program for 15 minutes.

Below is my code:

public static void main(String[] args) {

    try {

        // create thread pool with given size
        //ExecutorService service = Executors.newFixedThreadPool(20);
        ExecutorService service = Executors.newSingleThreadExecutor();

        long startTime = System.currentTimeMillis();
        long endTime = startTime + (15 * 60 * 1000);//Running for 15 minutes

        for (int i = 0; i < threads; i++) {
            service.submit(new ServiceTask(endTime, serviceList));
        }

        // wait for termination        
        service.shutdown();
        service.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
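    } catch (InterruptedException e) {
        // Restore the interrupt flag instead of swallowing the interruption
        Thread.currentThread().interrupt();
    } catch (Exception e) {
        // Report failures; silently ignoring them would hide a broken benchmark run
        e.printStackTrace();
    }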
}

Below is the class that implements the Runnable interface:

class ServiceTask implements Runnable {

    private static final Logger LOG = Logger.getLogger(ServiceTask.class.getName());
    private static Random random = new SecureRandom();

    // The reference never changes, so it is final; AtomicInteger already handles visibility
    public static final AtomicInteger countSize = new AtomicInteger();

    private final long endTime;
    private final LinkedHashMap<String, ServiceInfo> tableLists;

    public static ConcurrentHashMap<Long, Long> selectHistogram = new ConcurrentHashMap<Long, Long>();


    public ServiceTask(long endTime, LinkedHashMap<String, ServiceInfo> tableList) {
        this.endTime = endTime;
        this.tableLists = tableList;
    }

    @Override
    public void run() {

        try {

            while (System.currentTimeMillis() <= endTime) {

                double randomNumber = random.nextDouble() * 100.0;

                ServiceInfo service = selectRandomService(randomNumber);

                final String id = generateRandomId(random);
                final List<String> columnsList = getColumns(service.getColumns());

                List<DEAttribute<?>> attributes = null;

                DEKey bk = new DEKey(service.getKeys(), id);
                List<DEKey> list = new ArrayList<DEKey>();
                list.add(bk);

                Client deClient = new Client(list);

                final long start = System.nanoTime();

                attributes = deClient.getDEAttributes(columnsList);

                final long end = System.nanoTime() - start;
                final long key = end / 1000000L;
                boolean done = false;
                while(!done) {
                    Long oldValue = selectHistogram.putIfAbsent(key, 1L);
                    if(oldValue != null) {
                        done = selectHistogram.replace(key, oldValue, oldValue + 1);
                    } else {
                        done = true;
                    }
                }
                countSize.getAndAdd(attributes.size());

                handleDEAttribute(attributes);

                if (BEServiceLnP.sleepTime > 0L) {
                    Thread.sleep(BEServiceLnP.sleepTime);
                }
            }
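        } catch (InterruptedException e) {
            // Thread.sleep was interrupted: restore the flag and stop this task
            Thread.currentThread().interrupt();
        } catch (Exception e) {
            // Report failures; a task that dies silently skews the measured latencies
            e.printStackTrace();
        }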
    }
}
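The question reports 95th-percentile figures but does not show how they are read out of selectHistogram. The following is a minimal sketch of one way to do it, assuming the histogram maps a latency in milliseconds to a sample count, as in the code above. The class name HistogramPercentile and both method names are hypothetical, and the nearest-rank definition of a percentile is an assumption, not necessarily what the original benchmark used. On Java 8 or later, merge() can also stand in for the putIfAbsent/replace retry loop.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

public class HistogramPercentile {

    // Record one latency sample (in ms). ConcurrentHashMap.merge() performs
    // the read-modify-write atomically (Java 8+).
    public static void record(ConcurrentHashMap<Long, Long> histogram, long latencyMs) {
        histogram.merge(latencyMs, 1L, Long::sum);
    }

    // Nearest-rank percentile: the smallest latency bucket such that at least
    // `percentile` percent of all samples fall at or below it.
    public static long percentile(Map<Long, Long> histogram, double percentile) {
        long total = 0;
        for (long count : histogram.values()) {
            total += count;
        }
        if (total == 0) {
            throw new IllegalArgumentException("histogram is empty");
        }
        long rank = (long) Math.ceil(percentile * total / 100.0);
        long seen = 0;
        long last = 0;
        // TreeMap iterates the buckets in ascending latency order.
        for (Map.Entry<Long, Long> e : new TreeMap<>(histogram).entrySet()) {
            seen += e.getValue();
            last = e.getKey();
            if (seen >= rank) {
                return last;
            }
        }
        return last;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Long, Long> histogram = new ConcurrentHashMap<>();
        // Synthetic data: one sample at each latency from 1 ms to 100 ms.
        for (long ms = 1; ms <= 100; ms++) {
            record(histogram, ms);
        }
        System.out.println("p95 = " + percentile(histogram, 95.0) + " ms");
    }
}
```

Because the buckets are whole milliseconds, the reported percentile is only accurate to the millisecond.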

Update:

My machine configuration (I am running the program on a Linux machine):

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping        : 7
cpu MHz         : 2599.999
cache size      : 20480 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm arat pln pts
bogomips        : 5199.99
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping        : 7
cpu MHz         : 2599.999
cache size      : 20480 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm arat pln pts
bogomips        : 5199.99
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Answer

If you run many more tasks in parallel (20 in this case) than you have processors (I doubt you have a box with 20 or more processors), each task is going to take longer to complete, because the threads compete for CPU time and pay for context switches. It is easier for the computer to execute one task at a time. Even if you limit the number of threads in the pool to the number of CPUs you have, each task will probably still run slightly slower.
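To check how many processors the JVM actually sees and size a pool to match, something like the following sketch could be used. The class name CpuSizedPool and the helper cpuBoundPoolSize are hypothetical. One thread per processor is only a starting point for CPU-bound work; calls that block on IO usually tolerate more threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CpuSizedPool {

    // Starting point for CPU-bound work: one worker thread per processor
    // visible to the JVM.
    public static int cpuBoundPoolSize() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) throws InterruptedException {
        int poolSize = cpuBoundPoolSize();
        System.out.println("Available processors: " + poolSize);

        ExecutorService service = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < poolSize; i++) {
            final int id = i;
            service.submit(() ->
                    System.out.println("task " + id + " on " + Thread.currentThread().getName()));
        }

        // Stop accepting work and wait for the submitted tasks to finish.
        service.shutdown();
        service.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```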

If, however, you look at the throughput you get with 20 threads versus 1, you should see that it is much higher with 20 threads. If you execute 1000 tasks with 20 threads, they should finish much sooner overall than with just 1 thread. Each task may take longer, but many of them execute in parallel.

By lowering the number of threads in your pool until per-task times get closer to the single-thread speed, you should be able to maximize this throughput. The best value depends heavily on the amount of IO, the CPU cycles used, locks, synchronized blocks, and other factors.
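The trade-off between per-call latency and overall throughput can be explored with a small sweep over pool sizes. This is a rough sketch under stated assumptions, not the original benchmark: simulatedCall is a hypothetical CPU-bound stand-in for deClient.getDEAttributes, the class and method names are invented for illustration, and the numbers it prints will vary by machine. As in the question, only the call itself is timed, not the time a task waits in the queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolSizeSweep {

    // Written by every task so the JIT cannot discard the busy work.
    static volatile long sink;

    // Hypothetical CPU-bound stand-in for the real client call.
    static long simulatedCall() {
        long x = 1;
        for (int i = 0; i < 2_000_000; i++) {
            x = x * 31 + i;
        }
        return x;
    }

    // Outcome of one run: wall-clock time plus sorted per-call latencies (ns).
    public static final class Result {
        public final long wallNanos;
        public final long[] sortedLatencies;

        Result(long wallNanos, long[] sortedLatencies) {
            this.wallNanos = wallNanos;
            this.sortedLatencies = sortedLatencies;
        }

        public double callsPerSecond() {
            return sortedLatencies.length / (wallNanos / 1e9);
        }

        // Nearest-rank 95th percentile of the per-call latencies.
        public long p95Nanos() {
            int rank = (int) Math.ceil(95.0 * sortedLatencies.length / 100.0);
            return sortedLatencies[Math.max(0, rank - 1)];
        }
    }

    // Runs `calls` timed calls (calls > 0) on a fixed pool of `poolSize` threads.
    public static Result run(int poolSize, int calls) {
        ExecutorService service = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                tasks.add(() -> {
                    long start = System.nanoTime();
                    sink = simulatedCall();
                    return System.nanoTime() - start;
                });
            }
            long wallStart = System.nanoTime();
            // invokeAll blocks until every task has completed.
            List<Future<Long>> futures = service.invokeAll(tasks);
            long wallNanos = System.nanoTime() - wallStart;

            long[] latencies = new long[calls];
            for (int i = 0; i < calls; i++) {
                latencies[i] = futures.get(i).get();
            }
            Arrays.sort(latencies);
            return new Result(wallNanos, latencies);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            service.shutdown();
        }
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        for (int poolSize : new int[] {1, cpus, 20}) {
            Result r = run(poolSize, 200);
            System.out.printf("threads=%d  calls/s=%.1f  p95=%.2f ms%n",
                    poolSize, r.callsPerSecond(), r.p95Nanos() / 1e6);
        }
    }
}
```

Comparing the calls/s and p95 columns across pool sizes shows where extra threads stop adding throughput and only add latency. For a real benchmark, a warm-up run before measuring would make the numbers more stable.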
