Does the JVM have the ability to detect opportunities for parallelization?


Question

The Java HotSpot VM optimizes sequential code very well. But I was wondering: with the advent of multi-core computers, could runtime information be used to detect opportunities to parallelize code at runtime, for example detecting that software pipelining is possible in a loop, and similar things?

Has any interesting work been done on this topic? Or is it a research dead end, or some halting-problem-like issue that is very hard to solve?

Answer

I think the current guarantees of the Java memory model (http://en.wikipedia.org/wiki/Java_Memory_Model) make it quite hard to do much, if any, automatic parallelization at the compiler or VM level. The Java language has no semantics to guarantee that any data structure is even effectively immutable, or that any particular statement is pure and free of side effects, so the compiler would have to figure these out automatically in order to parallelize. Some elementary opportunities would be possible to infer in the compiler, but the general case would be left to the runtime, since dynamic loading and binding could introduce new mutations that did not exist at compile time.

Consider the following code:

for (int i = 0; i < array.length; i++) {
    array[i] = expensiveComputation(array[i]);
}

This would be trivial to parallelize if expensiveComputation is a pure function whose output depends only on its argument, and if we could guarantee that array won't be changed during the loop. (Actually we are changing it, setting array[i] = ..., but in this particular case expensiveComputation(array[i]) is always called first, so it's okay here, assuming that array is local and not referenced from anywhere else.)
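For illustration, here is one way the programmer can state that guarantee explicitly: a sketch using parallel streams (Java 8+), with a hypothetical squaring function standing in for expensiveComputation. Because each iteration reads and writes only index i, the iterations are independent, and opting into a parallel stream is safe:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ParallelLoop {
    // Hypothetical stand-in for the pure expensiveComputation above.
    static int expensiveComputation(int x) {
        return x * x;
    }

    // Each iteration depends only on array[i], so the iterations are
    // independent. This is exactly the property an automatic parallelizer
    // would have to prove; the programmer can simply assert it by
    // choosing a parallel stream.
    static int[] parallelMap(int[] array) {
        return IntStream.range(0, array.length)
                        .parallel()
                        .map(i -> expensiveComputation(array[i]))
                        .toArray();
    }

    public static void main(String[] args) {
        // toArray preserves encounter order even on a parallel stream.
        System.out.println(Arrays.toString(parallelMap(new int[]{1, 2, 3, 4, 5})));
    }
}
```

The point is that the safety argument lives in the programmer's head, not in anything the VM can check.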

Furthermore, if we change the loop like this:

for (int i = 0; i < array.length; i++) {
    array[i] = expensiveComputation(array, i);
    // expensiveComputation has the whole array at its disposal!
    // It could read or write values anywhere in it!
}

then parallelization is no longer trivial, even if expensiveComputation is pure and doesn't alter its argument, because the parallel threads would be changing the contents of array while others are reading it! The parallelizer would have to figure out which parts of the array expensiveComputation refers to under various conditions, and synchronize accordingly.
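To make the dependence concrete, here is a small sketch with a hypothetical expensiveComputation that reads the neighbouring element a[i - 1]. Running the loop sequentially and running it as fully independent iterations (each reading a snapshot of the original array, as parallel tasks safely could) give different results, which is exactly the hazard a parallelizer would have to detect:

```java
import java.util.Arrays;

public class LoopDependence {
    // Hypothetical "expensive" function that reads a neighbouring element:
    // iteration i now depends on what iteration i - 1 wrote.
    static int expensiveComputation(int[] a, int i) {
        return a[i] + (i > 0 ? a[i - 1] : 0);
    }

    // Sequential semantics: each iteration sees the updates of earlier ones.
    static int[] runSequential(int[] array) {
        int[] a = array.clone();
        for (int i = 0; i < a.length; i++) {
            a[i] = expensiveComputation(a, i);
        }
        return a;
    }

    // What fully independent (parallelizable) iterations would compute:
    // every iteration reads from a snapshot of the original array.
    static int[] runIndependent(int[] array) {
        int[] snapshot = array.clone();
        int[] out = new int[array.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = expensiveComputation(snapshot, i);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] ones = {1, 1, 1, 1, 1};
        System.out.println(Arrays.toString(runSequential(ones)));   // [1, 2, 3, 4, 5]
        System.out.println(Arrays.toString(runIndependent(ones)));  // [1, 2, 2, 2, 2]
    }
}
```

Since the two results differ, naively splitting this loop across threads would silently change the program's meaning.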

Perhaps it wouldn't be outright impossible to detect all the mutations and side effects that may be going on and take them into account when parallelizing, but it would be very hard, for sure, and probably infeasible in practice. This is why parallelization, and verifying that everything still works correctly, is the programmer's headache in Java.

Functional languages (e.g. Clojure on the JVM) are a hot answer to this topic. Pure, side-effect-free functions together with persistent ("effectively immutable") data structures potentially allow implicit or almost implicit parallelization. Let's double each element of a vector:

(map #(* 2 %) [1 2 3 4 5])
(pmap #(* 2 %) [1 2 3 4 5])  ; The same thing, done in parallel.

This is transparent because of two things:


1. The function #(* 2 %) is pure: it takes a value in and gives a value out, and that's it. It doesn't change anything, and its output depends only on its argument.
2. The vector [1 2 3 4 5] is immutable: no matter who's looking at it, or when, it's the same.

It's possible to write pure functions in Java, but 2), immutability, is the Achilles' heel here. There are no immutable arrays in Java. To be pedantic, nothing is immutable in Java, because even final fields can be changed using reflection. Therefore no guarantee can be made that the output (or input!) of a computation won't be changed by parallelization, so automatic parallelization is generally infeasible.
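A small sketch of that gap: the closest built-in substitute for an immutable array is an unmodifiable List, and even that enforces immutability only by throwing at runtime, so nothing in the type system lets a parallelizer prove the data won't change. (mutationRejected is a hypothetical helper for this demonstration.)

```java
import java.util.ArrayList;
import java.util.List;

public class ImmutabilityCheck {
    // Returns true if the list rejects in-place mutation. The rejection
    // happens only at runtime, via an exception; the static type
    // List<Integer> says nothing about it, so a compiler or VM cannot
    // conclude immutability from the type alone.
    static boolean mutationRejected(List<Integer> v) {
        try {
            v.set(0, 42);   // attempt an in-place mutation
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(mutationRejected(List.of(1, 2, 3)));                  // true
        System.out.println(mutationRejected(new ArrayList<>(List.of(1, 2, 3)))); // false
    }
}
```

Both calls go through the same interface, which is precisely why an automatic parallelizer cannot rely on it.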

The dumb "doubling elements" example extends to arbitrarily complex processing, thanks to immutability:

(defn expensivefunction [v x]
  (/ (reduce * v) x))


(let [v [1 2 3 4 5]]
  (map (partial expensivefunction v) v)) ; pmap would work equally well here!
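For comparison, a rough Java analogue of that last Clojure example, using parallel streams. Note that every parallel task reads the shared list v; this is only safe because we, not the compiler, guarantee that nobody mutates it:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ExpensiveFunction {
    // Java analogue of the Clojure expensivefunction:
    // the product of all elements of v, divided by x.
    static int expensiveFunction(List<Integer> v, int x) {
        int product = v.stream().reduce(1, (a, b) -> a * b);
        return product / x;
    }

    // The (map (partial expensivefunction v) v) analogue. Every parallel
    // task reads the same list v; safe only on the programmer's promise
    // that v is never mutated.
    static List<Integer> mapOverSelf(List<Integer> v) {
        return v.parallelStream()
                .map(x -> expensiveFunction(v, x))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // product = 120, so the result is [120, 60, 40, 30, 24]
        System.out.println(mapOverSelf(List.of(1, 2, 3, 4, 5)));
    }
}
```

In Clojure the same promise is structural (the vector is persistent), which is what makes swapping map for pmap safe without further analysis.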

