What is the quantitative overhead of making a JNI call?

Question

Based on performance alone, approximately how many "simple" lines of Java is the equivalent performance hit of making a JNI call?

Or to try to express the question in a more concrete way, if a simple Java operation such as

someIntVar1 = someIntVar2 + someIntVar3;

was given a "CPU work" index of 1, what would be the typical (ballpark) "CPU work" index of the overhead of making the JNI call?

This question ignores the time taken waiting for the native code to execute. In telephonic parlance, it is strictly about the "flag fall" part of the call, not the "call rate".

The reason for asking is to have a rule of thumb for deciding whether a JNI call is worth coding at all, given that you know the native cost (from direct testing) and the Java cost of a given operation. It could save you the hassle of coding the JNI call only to find that the call overhead consumed any benefit of using native code.
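
The rule of thumb described here amounts to a break-even check: going native pays off only when the native cost plus the per-call JNI overhead is lower than the pure Java cost. A minimal sketch of that check follows. The class name, method name, and the figures in `main` are made up for illustration and do not come from any measurement in this thread.

```java
// Hypothetical helper for illustration only; not part of any library.
final class JniBreakEven {
    private JniBreakEven() {}

    /**
     * Returns true when the native version is expected to be cheaper overall.
     * All three arguments must use the same unit, for example
     * "simple Java operations" or nanoseconds.
     */
    static boolean worthGoingNative(double javaCost, double nativeCost, double jniOverhead) {
        return nativeCost + jniOverhead < javaCost;
    }

    public static void main(String[] args) {
        // Assumed figures, chosen only to show both outcomes.
        System.out.println(worthGoingNative(100, 40, 25)); // 40 + 25 < 100
        System.out.println(worthGoingNative(30, 10, 25));  // 10 + 25 >= 30
    }
}
```

The overhead argument is the unknown the question asks about; once a ballpark value is available it can be plugged in directly.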

Some folks are getting hung up on variations in CPU, RAM etc. These are all virtually irrelevant to the question: I'm asking for the cost relative to lines of Java code. If CPU and RAM are poor, they are poor for both Java and JNI, so environmental considerations should balance out. The JVM version falls into the "irrelevant" category too.

This question isn't asking for an absolute timing in nanoseconds, but rather a ballpark "work effort" in units of "lines of simple Java code".

Recommended answer

Quick profiler test yields:

Java class:

public class Main {
    private static native int zero();

    private static int testNative() {
        return Main.zero();
    }

    private static int test() {
        return 0;
    }

    public static void main(String[] args) {
        testNative();
        test();
    }

    static {
        System.loadLibrary("foo");
    }
}

C library:

#include <jni.h>
#include "Main.h"

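/* zero() is a static native method, so the second argument is the class (jclass), not an instance. */
JNIEXPORT jint JNICALL
Java_Main_zero(JNIEnv *env, jclass cls)
{
    return 0;
}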

Results:

(profiler output not preserved in this copy)

System details:

java version "1.7.0_09"
OpenJDK Runtime Environment (IcedTea7 2.3.3) (7u9-2.3.3-1)
OpenJDK Server VM (build 23.2-b09, mixed mode)
Linux visor 3.2.0-4-686-pae #1 SMP Debian 3.2.32-1 i686 GNU/Linux

Update: Caliper micro-benchmarks for x86 (32/64 bit) and ARMv6 are as follows:

Java class:

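import java.util.Random;

import com.google.caliper.Runner;
import com.google.caliper.SimpleBenchmark;

public class Main extends SimpleBenchmark {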
    private static native int zero();
    private Random random;
    private int[] primes;

    public int timeJniCall(int reps) {
        int r = 0;
        for (int i = 0; i < reps; i++) r += Main.zero();
        return r;
    }

    public int timeAddIntOperation(int reps) {
        int p = primes[random.nextInt(1) + 54];   // >= 257
        for (int i = 0; i < reps; i++) p += i;
        return p;
    }

    public long timeAddLongOperation(int reps) {
        long p = primes[random.nextInt(3) + 54];  // >= 257
        long inc = primes[random.nextInt(3) + 4]; // >= 11
        for (int i = 0; i < reps; i++) p += inc;
        return p;
    }

    @Override
    protected void setUp() throws Exception {
        random = new Random();
        primes = getPrimes(1000);
    }

    public static void main(String[] args) {
        Runner.main(Main.class, args);        
    }

    public static int[] getPrimes(int limit) {
        // returns array of primes under $limit, off-topic here
    }

    static {
        System.loadLibrary("foo");
    }
}

Results (x86 / i7500 / Hotspot / Linux):

Scenario{benchmark=JniCall} 11.34 ns; σ=0.02 ns @ 3 trials
Scenario{benchmark=AddIntOperation} 0.47 ns; σ=0.02 ns @ 10 trials
Scenario{benchmark=AddLongOperation} 0.92 ns; σ=0.02 ns @ 10 trials

       benchmark     ns linear runtime
         JniCall 11.335 ==============================
 AddIntOperation  0.466 =
AddLongOperation  0.921 ==

Results (amd64 / Phenom 960T / Hotspot / Linux):

Scenario{benchmark=JniCall} 6.66 ns; σ=0.22 ns @ 10 trials
Scenario{benchmark=AddIntOperation} 0.29 ns; σ=0.00 ns @ 3 trials
Scenario{benchmark=AddLongOperation} 0.26 ns; σ=0.00 ns @ 3 trials

       benchmark    ns linear runtime
         JniCall 6.657 ==============================
 AddIntOperation 0.291 =
AddLongOperation 0.259 =

Results (armv6 / BCM2708 / Zero / Linux):

Scenario{benchmark=JniCall} 678.59 ns; σ=1.44 ns @ 3 trials
Scenario{benchmark=AddIntOperation} 183.46 ns; σ=0.54 ns @ 3 trials
Scenario{benchmark=AddLongOperation} 199.36 ns; σ=0.65 ns @ 3 trials

       benchmark  ns linear runtime
         JniCall 679 ==============================
 AddIntOperation 183 ========
AddLongOperation 199 ========

To summarize, a JNI call appears to be roughly equivalent to 10-25 simple Java operations on typical (x86) hardware with the Hotspot VM. Not surprisingly, under the much less optimized Zero VM the results are quite different (3-4 operations).
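
The ballpark figures in the summary come from dividing the per-call JNI time by the per-operation add time on each platform. The sketch below redoes that arithmetic using only the Caliper numbers reported above. The class and method names are illustrative, not part of the original benchmark.

```java
import java.util.Locale;

// Illustrative recomputation of the ratios; names are hypothetical.
final class JniRatio {
    private JniRatio() {}

    /** How many simple operations fit into the time of one JNI call. */
    static double ratio(double jniNs, double opNs) {
        return jniNs / opNs;
    }

    private static void report(String platform, double jniNs, double intNs, double longNs) {
        System.out.println(String.format(Locale.ROOT,
                "%-22s int add: %.1f, long add: %.1f",
                platform, ratio(jniNs, intNs), ratio(jniNs, longNs)));
    }

    public static void main(String[] args) {
        // Timings in nanoseconds, copied from the results listed above.
        report("x86 / Hotspot", 11.34, 0.47, 0.92);
        report("amd64 / Hotspot", 6.66, 0.29, 0.26);
        report("armv6 / Zero", 678.59, 183.46, 199.36);
    }
}
```

On the two Hotspot machines the ratios fall between roughly 12 and 26, and on the Zero VM between 3 and 4, which is where the ranges quoted in the summary come from.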

Thanks go to @Giovanni Azua and @Marko Topolnik for participation and hints.
