How does Scala use all my cores here?


Question

object PrefixScan {
  sealed abstract class Tree[A]
  case class Leaf[A](a: A) extends Tree[A]
  case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

  sealed abstract class TreeRes[A] { val res : A }
  case class LeafRes[A](override val res: A) extends TreeRes[A]
  case class NodeRes[A](l : TreeRes[A], override val res: A, r: TreeRes[A]) extends TreeRes[A]

  def reduceRes[A](t: Tree[A], f:(A,A)=>A): TreeRes[A] = t match {
    case Leaf(v) => LeafRes(v)
    case Node(l, r) => {
      val (tL, tR) = (reduceRes(l, f), reduceRes(r, f))
      NodeRes(tL, f(tL.res, tR.res), tR)
    }
  }
}

I am concerned about the reduceRes function.

It works ... the result of the computation is great!

However, I went and implemented another version, reduceResPar, that uses fork-join at the first few branches to parallelize the computation. But it gave no speedup.

Then I went back and realized that the version above, reduceRes, is already using all 12 cores on my machine! How can it do that? I thought it would just be 1 core!

This code is from the Parallel Programming course on Coursera. In the last lecture of week 2, we are learning about parallel prefix scan operations.

Recommended answer

How can it do that? I thought it would just be 1 core!

The fact that you see all your cores being used doesn't mean your code is executing in parallel. We can see from the implementation that it's sequential, but we don't know which CPU the OS will schedule our single thread on each time it gets to run.

When you execute a method inside a thread, the OS decides how many CPU time slices it will get and when, according to a priority queue it manages.
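One way to confirm that the traversal itself is single-threaded, independent of which core it lands on, is to record the name of the thread at every recursion step. The following is only a minimal sketch: `SingleThreadCheck`, `seenThreads` and `reduce` are names invented for this illustration, and `reduce` is a simplified stand-in for reduceRes that returns just the combined value.

```scala
import scala.collection.mutable

object SingleThreadCheck {
  sealed abstract class Tree[A]
  case class Leaf[A](a: A) extends Tree[A]
  case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

  // Names of every thread that executed a recursion step (hypothetical helper).
  val seenThreads = mutable.Set.empty[String]

  // Same recursion shape as reduceRes, but returning only the reduced value.
  def reduce[A](t: Tree[A], f: (A, A) => A): A = {
    seenThreads += Thread.currentThread().getName
    t match {
      case Leaf(v)    => v
      case Node(l, r) => f(reduce(l, f), reduce(r, f))
    }
  }

  def main(args: Array[String]): Unit = {
    val tree = Node(Leaf(1), Node(Leaf(2), Node(Leaf(3), Leaf(4))))
    val sum = reduce[Int](tree, _ + _)
    println(s"sum = $sum, threads used = $seenThreads")
  }
}
```

If the set contains a single thread name after the run, every step ran on that one thread, whichever cores the OS happened to place it on.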

To see that your algorithm may run on different cores, we can ask the OS which logical core is currently executing our thread. I've prepared a small implementation for Windows, which has a native WinAPI function called GetCurrentProcessorNumber() that returns the number of the processor we're executing on. We'll use JNA for the example:

build.sbt:

libraryDependencies += "net.java.dev.jna" % "jna" % "4.4.0"

Java implementation:

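import com.sun.jna.Library;
import com.sun.jna.Native;

public class ProcessorNumberNative {

    // JNA maps this interface onto the functions exported by Kernel32.dll.
    public interface CLibrary extends Library {
        CLibrary INSTANCE = (CLibrary)
                Native.loadLibrary("Kernel32.dll",
                        CLibrary.class);

        // WinAPI: returns the number of the logical processor running the calling thread (a DWORD).
        int GetCurrentProcessorNumber();
    }
}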

Now let's add a println on each of the steps in your recursion:

def reduceRes[A](t: Tree[A], f: (A, A) => A): TreeRes[A] = t match {
  case Leaf(v) =>
    println(s"Logical Processor Number: ${ProcessorNumberNative.CLibrary.INSTANCE.GetCurrentProcessorNumber()}")
    LeafRes(v)

  case Node(l, r) => 
    println(s"Logical Processor Number: ${ProcessorNumberNative.CLibrary.INSTANCE.GetCurrentProcessorNumber()}")
    val (tL, tR) = (reduceRes(l, f), reduceRes(r, f))
    NodeRes(tL, f(tL.res, tR.res), tR)
}

Now let's create a tree and execute:

def main(args: Array[String]): Unit = {

  val tree = Node(Leaf(1),
                Node(Leaf(2),
                     Node(Node(Leaf(24), Leaf(30)),
                          Node(Leaf(3), Node(Leaf(10), Leaf(52))))))

  reduceRes(tree, (a: Int, b: Int) => a + b)
}

Here are two different runs (I'm running a computer with 4 logical cores):

First run:

Logical Processor Number: 1
Logical Processor Number: 3
Logical Processor Number: 3
Logical Processor Number: 3
Logical Processor Number: 0
Logical Processor Number: 0
Logical Processor Number: 0
Logical Processor Number: 3
Logical Processor Number: 0
Logical Processor Number: 0
Logical Processor Number: 0
Logical Processor Number: 0
Logical Processor Number: 0

Second run:

Logical Processor Number: 1
Logical Processor Number: 3
Logical Processor Number: 1
Logical Processor Number: 1
Logical Processor Number: 1
Logical Processor Number: 1
Logical Processor Number: 1
Logical Processor Number: 1
Logical Processor Number: 3
Logical Processor Number: 3
Logical Processor Number: 3
Logical Processor Number: 3
Logical Processor Number: 3

Across these two runs, the executing thread got slices of execution on three different cores: 0, 1 and 3 in the first run, and 1 and 3 in the second. All the while we were running in a single-threaded environment. This goes to show that although the computation of your algorithm is definitely sequential, that doesn't mean you won't see all your cores in play.
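For contrast, a computation that really is parallel shows up as more than one thread doing the work, not merely as activity on more than one core. The sketch below is an assumption-laden illustration, not the asker's reduceResPar (which is not shown in the question): `ThreadCountContrast`, `seen`, `reduceSeq`, `reducePar` and the `depth` parameter are hypothetical, and it forks with `scala.concurrent.Future` on the global execution context rather than with the course's own fork-join helpers.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ThreadCountContrast {
  sealed abstract class Tree[A]
  case class Leaf[A](a: A) extends Tree[A]
  case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

  // Thread-safe set of the thread names that executed a sequential step.
  val seen = ConcurrentHashMap.newKeySet[String]()

  // Sequential reduction, recording the current thread at every step.
  def reduceSeq[A](t: Tree[A], f: (A, A) => A): A = {
    seen.add(Thread.currentThread().getName)
    t match {
      case Leaf(v)    => v
      case Node(l, r) => f(reduceSeq(l, f), reduceSeq(r, f))
    }
  }

  // Hypothetical parallel variant: for the first `depth` levels the left
  // subtree is forked as a Future while the right one runs on the caller.
  def reducePar[A](t: Tree[A], f: (A, A) => A, depth: Int): A = t match {
    case Node(l, r) if depth > 0 =>
      val left  = Future(reducePar(l, f, depth - 1))
      val right = reducePar(r, f, depth - 1)
      f(Await.result(left, 10.seconds), right)
    case _ => reduceSeq(t, f)
  }

  def main(args: Array[String]): Unit = {
    val tree = Node(Node(Leaf(1), Leaf(2)), Node(Leaf(3), Leaf(4)))
    val seqSum = reduceSeq[Int](tree, _ + _)
    println(s"sequential: sum = $seqSum, distinct threads = ${seen.size}")
    seen.clear()
    val parSum = reducePar[Int](tree, _ + _, depth = 2)
    println(s"parallel:   sum = $parSum, distinct threads = ${seen.size}")
  }
}
```

The sequential call reports one distinct thread, while the forked variant reports at least two (the caller plus a pool worker). Counting threads, rather than watching per-core load, is the more reliable signal of whether the code itself runs in parallel.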
