如何加速Java代码? [英] How could this Java code be sped up?

查看:60
本文介绍了如何加速Java代码?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图确定Java能以多快的速度完成一个简单的任务:将一个大文件读入内存,然后对数据执行一些无意义的计算.所有类型的优化都很重要.无论是用不同的方式重写代码还是使用不同的JVM欺骗JIT ..

I am trying to benchmark how fast can Java do a simple task: read a huge file into memory and then perform some meaningless calculations on the data. All types of optimizations count. Whether it's rewriting the code differently or using a different JVM, tricking JIT ..

输入文件是一个5亿长的32位整数对列表,以逗号分隔.像这样:

Input file is a 500 million long list of 32 bit integer pairs separated by a comma. Like this:

44439,5023
33140,22257
...

44439,5023
33140,22257
...

此文件在我的计算机上占用了 5.5GB .该程序不能使用超过 8GB 的RAM,并且只能使用单线程.

This file takes 5.5GB on my machine. The program can't use more than 8GB of RAM and can use only a single thread.

package speedracer;

import java.io.FileInputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class Main
{
    public static void main(String[] args)
    {
        int[] list = new int[1000000000];

        long start1 = System.nanoTime();
        parse(list);
        long end1 = System.nanoTime();

        System.out.println("Parsing took: " + (end1 - start1) / 1000000000.0);

        int rs = 0;
        long start2 = System.nanoTime();

        for (int k = 0; k < list.length; k++) {
            rs = calc(list[k++], list[k++], list[k++], list[k]);
        }

        long end2 = System.nanoTime();

        System.out.println(rs);
        System.out.println("Calculations took: " + (end2 - start2) / 1000000000.0);
    }

    public static int calc(final int a1, final int a2, final int b1, final int b2)
    {
        int c1 = (a1 + a2) ^ a2;
        int c2 = (b1 - b2) << 4;

        for (int z = 0; z < 100; z++) {
            c1 ^= z + c2;
        }

        return c1;
    }

    public static void parse(int[] list)
    {
        FileChannel fc = null;
        int i = 0;

        MappedByteBuffer byteBuffer;

        try {
            fc = new FileInputStream("in.txt").getChannel();

            long size = fc.size();
            long allocated = 0;
            long allocate = 0;

            while (size > allocated) {

               if ((size - allocated) > Integer.MAX_VALUE) {
                   allocate = Integer.MAX_VALUE;
               } else {
                   allocate = size - allocated;
               }

               byteBuffer = fc.map(FileChannel.MapMode.READ_ONLY, allocated, allocate);
               byteBuffer.clear();

               allocated += allocate;

               int number = 0;

               while (byteBuffer.hasRemaining()) {
                   char val = (char) byteBuffer.get();
                   if (val == '\n' || val == ',') {
                        list[i] = number;

                        number = 0;
                        i++;
                   } else {
                       number = number * 10 + (val - '0');
                   }
                }
            }

            fc.close();

        } catch (Exception e) {
            System.err.println("Parsing error: " + e);
        }
    }
}

我已经尽力了.尝试使用不同的读者,尝试使用openjdk6,sunjdk6,sunjdk7.尝试过不同的读者.由于MappedByteBuffer不能一次映射超过2GB的内存,因此不得不进行一些难看的解析.我正在跑步:

I've tried all I could think of. Trying different readers, tried openjdk6, sunjdk6, sunjdk7. Tried different readers. Had to do some ugly parsing since MappedByteBuffer cannot map more than 2GB of memory at once. I'm running:

   Linux AS292 2.6.38-11-generic #48-Ubuntu SMP 
   Fri Jul 29 19:02:55 UTC 2011 
   x86_64 GNU/Linux. Ubuntu 11.04. 
   CPU: is Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz.

当前,我的分析结果为:26.50s,计算结果为:11.27s.我正在与类似的C ++基准测试竞争,后者几乎在同一时间进行IO,但计算仅需4.5秒.我的主要目标是以任何可能的方式减少计算时间.有什么想法吗?

Currently, my results are for parsing: 26.50s, calculations: 11.27s. I'm competing against a similar C++ benchmark which does the IO in roughly the same time but the calculations take only 4.5s. My main objective is to reduce the calculation time in any means possible. Any ideas?

更新:似乎主要的速度改进可能来自于

Update: It seems the main speed improvement could come from what is called Auto-Vectorization. I was able to find some hints that the current Sun's JIT only does "some vectorization" however I can't really confirm it. It would be great to find some JVM or JIT that would have better auto-vectorization optimization support.

推荐答案

首先,-O3启用:

-finline-functions
-ftree-vectorize

还有其他...

所以看起来它实际上可能是矢量化的.

So it looks like it actually might be vectorizing.

这已经被证实. (请参阅评论).实际上,编译器正在对C ++版本进行矢量化处理.禁用向量化后,C ++版本实际上比Java版本运行慢

EDIT : This has been been confirmed. (see comments) The C++ version is indeed being vectorized by the compiler. With vectorization disabled, the C++ version actually runs a bit slower than the Java version

假设JIT无法向量化循环, Java版本可能很难/不可能与C ++版本的速度相匹配.

Assuming the JIT does not vectorize the loop, it may be difficult/impossible for the Java version to match the speed of the C++ version.

现在,如果我是一个聪明的C/C ++编译器,那么这就是我在(x64上)安排该循环的方式:

Now, if I were a smart C/C++ compiler, here's how I would arrange that loop (on x64):

int c1 = (a1 + a2) ^ a2;
int c2 = (b1 - b2) << 4;

int tmp0 = c1;
int tmp1 = 0;
int tmp2 = 0;
int tmp3 = 0;

int z0 = 0;
int z1 = 1;
int z2 = 2;
int z3 = 3;

do{
    tmp0 ^= z0 + c2;
    tmp1 ^= z1 + c2;
    tmp2 ^= z2 + c2;
    tmp3 ^= z3 + c2;
    z0 += 4;
    z1 += 4;
    z2 += 4;
    z3 += 4;
}while (z0 < 100);

tmp0 ^= tmp1;
tmp2 ^= tmp3;

tmp0 ^= tmp2;

return tmp0;

请注意,此循环是完全可矢量化的.

Note that this loop is completely vectorizable.

更好的是,我将完全展开此循环.这些是C/C ++编译器将要做的事情.但是现在的问题是,准时制会这样做吗?

Even better, I would completely unroll this loop. These are things that a C/C++ compiler will do. But now the question, is will the JIT do it?

这篇关于如何加速Java代码?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆