ExecutorService's surprising performance break-even point: rules of thumb?

Question

I'm trying to figure out how to correctly use Java's Executors. I realize submitting tasks to an ExecutorService has its own overhead. However, I'm surprised to see it is as high as it is.

My program needs to process a huge amount of data (stock market data) with as low latency as possible. Most of the calculations are fairly simple arithmetic operations.

I tried to test something very simple: "Math.random() * Math.random()"

The simplest test runs this computation in a plain loop. The second test does the same computation inside an anonymous Runnable (this is supposed to measure the cost of creating new objects). The third test passes the Runnable to an ExecutorService (this measures the cost of introducing executors).

I ran the tests on my dinky laptop (2 CPUs, 1.5 GB RAM):

(in milliseconds)
simpleCompuation:47
computationWithObjCreation:62
computationWithObjCreationAndExecutors:422

(about once out of four runs, the first two numbers end up being equal)

Notice that executors take far, far more time than executing on a single thread. The numbers were about the same for thread pool sizes between 1 and 8.
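
These numbers support a rough estimate. Treat the whole gap between the executor run and the object-creation run as per-task submission overhead. That overstates the overhead, because the 422 ms also covers creating and shutting down the pool, and every figure comes from a single System.currentTimeMillis() run. The symbols h (overhead per task), c (cost per computation) and k (computations per task) are introduced here for illustration only:

```latex
\begin{aligned}
h &\approx \frac{(422 - 62)\ \text{ms}}{100\,000\ \text{tasks}} = 3.6\ \mu\text{s per task} \\
c &\approx \frac{47\ \text{ms}}{100\,000\ \text{computations}} = 0.47\ \mu\text{s per computation} \\
\frac{h}{c} &\approx 7.7, \qquad \frac{h}{k\,c}\bigg|_{k = 1000} \approx \frac{3.6}{470} \approx 0.77\,\%
\end{aligned}
```

Under these assumptions, a task that holds a single Math.random() * Math.random() spends roughly eight times as long being scheduled as computing. A task that holds k = 1000 such computations spends under 1% of its time on scheduling.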

Question: Am I missing something obvious, or are these results expected? These results tell me that any task I pass to an executor must do some non-trivial computation. If I am processing millions of messages and need to perform very simple (and cheap) transformations on each one, I still may not be able to use executors: trying to spread the computations across multiple CPUs might end up costing more than just doing them in a single thread. The design decision becomes much more complex than I had originally thought. Any thoughts?

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecServicePerformance {

 private static int count = 100000;

 public static void main(String[] args) throws InterruptedException {

  //warmup
  simpleCompuation();
  computationWithObjCreation();
  computationWithObjCreationAndExecutors();

  long start = System.currentTimeMillis();
  simpleCompuation();
  long stop = System.currentTimeMillis();
  System.out.println("simpleCompuation:"+(stop-start));

  start = System.currentTimeMillis();
  computationWithObjCreation();
  stop = System.currentTimeMillis();
  System.out.println("computationWithObjCreation:"+(stop-start));

  start = System.currentTimeMillis();
  computationWithObjCreationAndExecutors();
  stop = System.currentTimeMillis();
  System.out.println("computationWithObjCreationAndExecutors:"+(stop-start));


 }

 private static void computationWithObjCreation() {
  for(int i=0;i<count;i++){
   new Runnable(){

    @Override
    public void run() {
     double x = Math.random()*Math.random();
    }

   }.run();
  }

 }

 private static void simpleCompuation() {
  for(int i=0;i<count;i++){
   double x = Math.random()*Math.random();
  }

 }

 private static void computationWithObjCreationAndExecutors()
   throws InterruptedException {

  ExecutorService es = Executors.newFixedThreadPool(1);
  for(int i=0;i<count;i++){
   es.submit(new Runnable() {
    @Override
    public void run() {
     double x = Math.random()*Math.random();     
    }
   });
  }
  es.shutdown();
  es.awaitTermination(10, TimeUnit.SECONDS);
 }
}


Answer


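  1. Using executors is about utilizing CPUs and/or CPU cores. If you want a thread pool that makes the best use of the available CPUs, it needs as many threads as there are CPUs/cores.

  2. You're right: creating and submitting a new task object for every computation is too expensive. One way to reduce the overhead is to use batches. If you know the kind and number of computations to perform, you can create batches, for example one executed task that performs a thousand computations. You create a batch for each thread. As soon as a batch has finished (java.util.concurrent.Future), you create the next one. New batches can even be created in parallel (4 CPUs -> 3 threads for computation, 1 thread for batch provisioning). In the end you may get higher throughput, at the cost of higher memory requirements (batches, provisioning).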
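
Point 2 can be sketched with ExecutorCompletionService. It hands back each Future as its task completes, so the next batch can be submitted as soon as a worker frees up. This is a minimal sketch that makes several assumptions the answer does not:

* The messages are a long[] of prices.
* transform() is a hypothetical stand-in for the cheap per-message arithmetic.
* Batches are provisioned on the calling thread rather than on a dedicated one.
* The class and method names are made up for illustration.

Following point 1, the caller would pass the CPU count as the pool size.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RollingBatches {

    // Hypothetical per-message transformation: cheap arithmetic, as in the question.
    static long transform(long price) {
        return price * 3 + 1;
    }

    // Processes messages[from, to) as ONE task, so the submission overhead
    // is paid once per batch instead of once per message.
    static long processBatch(long[] messages, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += transform(messages[i]);
        }
        return sum;
    }

    // Keeps at most 'threads' batches in flight; whenever one completes,
    // the next batch is submitted. Returns the sum of all transformed values.
    static long processAll(long[] messages, int batchSize, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletionService<Long> cs = new ExecutorCompletionService<>(pool);
            int next = 0;      // start index of the next batch to submit
            int inFlight = 0;  // batches submitted but not yet collected
            long total = 0;
            while (next < messages.length || inFlight > 0) {
                // Top up the pipeline until every worker has a batch.
                while (inFlight < threads && next < messages.length) {
                    final int from = next;
                    final int to = Math.min(next + batchSize, messages.length);
                    cs.submit(() -> processBatch(messages, from, to));
                    next = to;
                    inFlight++;
                }
                total += cs.take().get(); // blocks until some batch finishes
                inFlight--;
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] messages = new long[1_000_000];
        for (int i = 0; i < messages.length; i++) {
            messages[i] = i;
        }
        int cpus = Runtime.getRuntime().availableProcessors();
        long parallel = processAll(messages, 1000, cpus);
        long sequential = processBatch(messages, 0, messages.length);
        System.out.println(parallel == sequential);
    }
}
```

The batch size is the tuning knob. If it is too small, per-task overhead dominates again. If it is too large, the last few batches leave some cores idle while others finish.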

Edit: I changed your example and ran it on my little dual-core x200 laptop.

provisioned 2 batches to be executed
simpleCompuation:14
computationWithObjCreation:17
computationWithObjCreationAndExecutors:9

As you can see in the source code, I also took the batch provisioning and executor lifecycle out of the measurement. That makes the comparison with the other two methods fairer.

See the results for yourself...

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ExecServicePerformance {

    private static int count = 100000;

    public static void main( String[] args )
            throws InterruptedException, ExecutionException {

        final int cpus = Runtime.getRuntime().availableProcessors();

        // One worker thread per CPU/core.
        final ExecutorService es = Executors.newFixedThreadPool( cpus );

        // One batch per worker; provisioned before any timing starts.
        final List< Batch > batches = new ArrayList< Batch >( cpus );

        final int batchComputations = count / cpus;

        for ( int i = 0; i < cpus; i++ ) {
            batches.add( new Batch( batchComputations ) );
        }

        System.out.println( "provisioned " + cpus + " batches to be executed" );

        // warmup
        simpleCompuation();
        computationWithObjCreation();
        computationWithObjCreationAndExecutors( es, batches );

        long start = System.currentTimeMillis();
        simpleCompuation();
        long stop = System.currentTimeMillis();
        System.out.println( "simpleCompuation:" + ( stop - start ) );

        start = System.currentTimeMillis();
        computationWithObjCreation();
        stop = System.currentTimeMillis();
        System.out.println( "computationWithObjCreation:" + ( stop - start ) );

        // Executor: the timed call returns only after every batch has finished.
        start = System.currentTimeMillis();
        computationWithObjCreationAndExecutors( es, batches );
        stop = System.currentTimeMillis();
        System.out.println( "computationWithObjCreationAndExecutors:"
                + ( stop - start ) );

        // Shutting the pool down is part of the executor lifecycle,
        // so it stays outside the measurement.
        es.shutdown();
        es.awaitTermination( 10, TimeUnit.SECONDS );
    }

    private static void computationWithObjCreation() {

        for ( int i = 0; i < count; i++ ) {
            new Runnable() {

                @Override
                public void run() {

                    double x = Math.random() * Math.random();
                }

            }.run();
        }

    }

    private static void simpleCompuation() {

        for ( int i = 0; i < count; i++ ) {
            double x = Math.random() * Math.random();
        }

    }

    // Submits every batch and blocks until all of them have completed.
    // Waiting here also keeps the warm-up run from overlapping the timed run.
    private static void computationWithObjCreationAndExecutors(
            ExecutorService es, List< Batch > batches )
            throws InterruptedException, ExecutionException {

        final List< Future< ? > > futures = new ArrayList< Future< ? > >( batches.size() );
        for ( Batch batch : batches ) {
            futures.add( es.submit( batch ) );
        }
        for ( Future< ? > future : futures ) {
            future.get(); // rethrows a failure inside the batch as ExecutionException
        }

    }

    // One task that performs many computations, so the cost of submitting
    // it is amortized over the whole batch.
    private static class Batch implements Runnable {

        private final int computations;

        public Batch( final int computations ) {

            this.computations = computations;
        }

        @Override
        public void run() {

            int countdown = computations;
            while ( countdown-- > 0 ) { // exactly 'computations' iterations
                double x = Math.random() * Math.random();
            }
        }
    }
}
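
One further caveat applies to both benchmarks. Math.random() draws from a single java.util.Random instance shared by all threads. Its Javadoc notes that when many threads need pseudorandom numbers at a high rate, giving each thread its own generator may reduce contention.

The following is a hypothetical variant of the Batch class; the name LocalRandomBatch is made up for illustration. It uses ThreadLocalRandom and returns the accumulated value through a Future, so the work has an observable result. It has not been benchmarked here, so whether it changes the timings on a given machine is an assumption to verify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical variant of Batch: each worker thread uses its own generator
// (ThreadLocalRandom) instead of the shared one behind Math.random(),
// and the summed products are returned as the task's result.
public class LocalRandomBatch implements Callable<Double> {

    private final int computations;

    public LocalRandomBatch(int computations) {
        this.computations = computations;
    }

    @Override
    public Double call() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current(); // per-thread generator
        double sum = 0;
        for (int i = 0; i < computations; i++) {
            sum += rnd.nextDouble() * rnd.nextDouble();
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int cpus = Runtime.getRuntime().availableProcessors();
        ExecutorService es = Executors.newFixedThreadPool(cpus);
        try {
            // One batch per CPU, as in the answer's version.
            List<Future<Double>> futures = new ArrayList<>();
            for (int i = 0; i < cpus; i++) {
                futures.add(es.submit(new LocalRandomBatch(100_000 / cpus)));
            }
            double total = 0;
            for (Future<Double> f : futures) {
                total += f.get(); // waits for each batch to finish
            }
            System.out.println(total);
        } finally {
            es.shutdown();
        }
    }
}
```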
