Why are threads showing better performance than coroutines?

Problem Description

I have written 3 simple programs to test the performance advantage of coroutines over threads. Each program does a lot of common simple computations. All programs were run separately from each other. Besides execution time, I measured CPU usage via the VisualVM IDE plugin.

  1. The first program does all computations using a 1000-thread pool. This piece of code shows the worst results (64326 ms) compared to the others because of frequent context switches:

val executor = Executors.newFixedThreadPool(1000)
time = generateSequence {
  measureTimeMillis {
    val comps = mutableListOf<Future<Int>>()
    for (i in 1..1_000_000) {
      comps += executor.submit<Int> { computation2(); 15 }
    }
    comps.map { it.get() }.sum()
  }
}.take(100).sum()
println("Completed in $time ms")
executor.shutdownNow()

  2. The second program has the same logic, but instead of a 1000-thread pool it uses an n-thread pool (where n equals the number of the machine's cores). It shows much better results (43939 ms) and uses fewer threads, which is also good.

val executor2 = Executors.newFixedThreadPool(4)
time = generateSequence {
  measureTimeMillis {
    val comps = mutableListOf<Future<Int>>()
    for (i in 1..1_000_000) {
      comps += executor2.submit<Int> { computation2(); 15 }
    }
    comps.map { it.get() }.sum()
  }
}.take(100).sum()
println("Completed in $time ms")
executor2.shutdownNow()

  3. The third program is written with coroutines and shows a big variance in the results (from 41784 ms to 81101 ms). I am very confused and don't quite understand why they are so different and why coroutines are sometimes slower than threads (considering that small async computations are a forte of coroutines). Here is the code:

time = generateSequence {
  runBlocking {
    measureTimeMillis {
      val comps = mutableListOf<Deferred<Int>>()
      for (i in 1..1_000_000) {
        comps += async { computation2(); 15 }
      }
      comps.map { it.await() }.sum()
    }
  }
}.take(100).sum()
println("Completed in $time ms")

I actually read a lot about these coroutines and how they are implemented in Kotlin, but in practice I don't see them working as intended. Am I doing my benchmarking wrong? Or maybe I'm using coroutines wrong?

Recommended Answer

The way you've set up your problem, you shouldn't expect any benefit from coroutines. In all cases you submit a non-divisible block of computation to an executor. You are not leveraging the idea of coroutine suspension, where you can write sequential code that actually gets chopped up and executed piecewise, possibly on different threads.

Most use cases of coroutines revolve around blocking code: avoiding the scenario where you hog a thread to do nothing but wait for a response. They may also be used to interleave CPU-intensive tasks, but this is a more special-cased scenario.
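
For the interleaving case, a minimal sketch of what suspension buys you (assuming kotlinx-coroutines-core on the classpath): two "sequential" blocks of work share the single thread of runBlocking, and each yield() suspends the current coroutine so the other one can make progress.

import kotlinx.coroutines.*

fun main() = runBlocking {
    // Both coroutines run on the single thread of runBlocking.
    // yield() suspends the current one, so their steps interleave
    // without any extra threads or explicit hand-off.
    val a = launch { repeat(3) { i -> println("task A, step $i"); yield() } }
    val b = launch { repeat(3) { i -> println("task B, step $i"); yield() } }
    joinAll(a, b)
}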

I would suggest benchmarking 1,000,000 tasks that involve several sequential blocking steps, like in Roman Elizarov's KotlinConf 2017 talk:

suspend fun postItem(item: Item) {
    val token = requestToken()
    val post = createPost(token, item)
    processPost(post)
}

where all of requestToken(), createPost() and processPost() involve network calls.
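
As a rough, self-contained sketch of such a benchmark (the Item and Post types and the delay-based bodies below are made-up stand-ins for real network calls), one coroutine can be launched per item:

import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

// Hypothetical stand-ins: delay() models the latency of a network call.
data class Item(val id: Int)
data class Post(val token: String, val item: Item)

suspend fun requestToken(): String { delay(1000); return "token" }
suspend fun createPost(token: String, item: Item): Post { delay(1000); return Post(token, item) }
suspend fun processPost(post: Post) { delay(1000) }

suspend fun postItem(item: Item) {
    val token = requestToken()
    val post = createPost(token, item)
    processPost(post)
}

fun main() = runBlocking {
    val time = measureTimeMillis {
        // One coroutine per item; each suspends at every "network call"
        // instead of blocking a thread for roughly three seconds.
        val jobs = List(1_000_000) { i -> launch { postItem(Item(i)) } }
        jobs.joinAll()
    }
    println("Completed in $time ms")
}

All one million pipelines here share the single runBlocking thread, because a suspended coroutine does not occupy a thread at all.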

If you have two implementations of this, one with suspend funs and another with regular blocking functions, for example:

fun requestToken(): String {
    Thread.sleep(1000)
    return "token"
}

vs.

suspend fun requestToken(): String {
    delay(1000)
    return "token"
}

you'll find that you can't even set up 1,000,000 concurrent invocations of the first version, and if you lower the number to what you can actually achieve without OutOfMemoryError: unable to create new native thread, the performance advantage of coroutines should be evident.
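
For comparison, the blocking side of such a benchmark ends up looking roughly like the sketch below (the requestTokenBlocking name and the thread count are illustrative); it has to dedicate one thread per in-flight call, which is exactly what limits how far it scales:

import java.util.concurrent.Executors
import kotlin.system.measureTimeMillis

// Blocking counterpart of requestToken(), renamed here for clarity.
fun requestTokenBlocking(): String {
    Thread.sleep(1000)   // hogs its thread for the full second
    return "token"
}

fun main() {
    // A million threads is out of reach; pick a count your machine can
    // actually create, then compare time and memory with the coroutine version.
    val count = 10_000
    val executor = Executors.newFixedThreadPool(count)
    val time = measureTimeMillis {
        val futures = (1..count).map { executor.submit<String> { requestTokenBlocking() } }
        futures.forEach { it.get() }
    }
    println("Fetched $count tokens in $time ms")
    executor.shutdown()
}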

If you want to explore the possible advantages of coroutines for CPU-bound tasks, you need a use case where it actually matters whether you execute them sequentially or in parallel. In your examples above, this is treated as an irrelevant internal detail: in one version you run 1,000 concurrent tasks and in the other just four, so the execution is almost sequential.

Hazelcast Jet is an example of such a use case because its computation tasks are co-dependent: one's output is another one's input. In this case you can't just run a few of them to completion on a small thread pool; you actually have to interleave them so the buffered output doesn't explode. If you try to set up such a scenario with and without coroutines, you'll once again find that you're either allocating as many threads as there are tasks or using suspendable coroutines, and the latter approach wins. Hazelcast Jet implements the spirit of coroutines in a plain Java API: its approach would hugely benefit from the coroutine programming model, but currently it's pure Java.
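
In coroutine terms, the interleaving-with-backpressure idea can be sketched with a bounded Channel between two co-dependent tasks (an illustration only, not Hazelcast Jet code): the producer suspends whenever the buffer is full, so its output can never pile up without limit.

import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

fun main() = runBlocking {
    val buffer = Channel<Int>(capacity = 64)       // bounded buffer between the two tasks
    val producer = launch {
        for (i in 1..1_000_000) buffer.send(i)     // suspends while the buffer is full
        buffer.close()
    }
    val consumer = launch {
        var sum = 0L
        for (x in buffer) sum += x                 // suspends while the buffer is empty
        println("sum = $sum")
    }
    joinAll(producer, consumer)
}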

Disclosure: the author of this post belongs to the Jet engineering team.
