What is the easiest way to run working CUDA code in Java?

Question

I have some CUDA code I made in C and it seems to be working fine (it's plain old C and not C++). I’m running a Hadoop cluster and wanted to consolidate my code so ideally I’m looking to run it within Java (long story short: system is too complex).

Currently the C program parses a log file, takes a few thousand lines, processes each line in parallel on the GPU, saves specific errors/transactions into a linked list, and writes them to the drive.

What is the best approach to do this? Is JCUDA a perfect mapping to C CUDA or is it totally different? Or does it make sense to call C code from Java and share results (would the linked list be accessible)?

Recommended answer

IMO? JavaCPP. For example, here is a port to Java of the example displayed on the main page of Thrust's Web site:

import com.googlecode.javacpp.*;
import com.googlecode.javacpp.annotation.*;

@Platform(include={"<thrust/host_vector.h>", "<thrust/device_vector.h>", "<thrust/generate.h>", "<thrust/sort.h>",
                   "<thrust/copy.h>", "<thrust/reduce.h>", "<thrust/functional.h>", "<algorithm>", "<cstdlib>"})
@Namespace("thrust")
public class ThrustTest {
    static { Loader.load(); }

    public static class IntGenerator extends FunctionPointer {
        static { Loader.load(); }
        protected IntGenerator() { allocate(); }
        private native void allocate();
        public native int call();
    }

    @Name("plus<int>")
    public static class IntPlus extends Pointer {
        static { Loader.load(); }
        public IntPlus() { allocate(); }
        private native void allocate();
        public native @Name("operator()") int call(int x, int y);
    }

    @Name("host_vector<int>")
    public static class IntHostVector extends Pointer {
        static { Loader.load(); }
        public IntHostVector() { allocate(0); }
        public IntHostVector(long n) { allocate(n); }
        public IntHostVector(IntDeviceVector v) { allocate(v); }
        private native void allocate(long n);
        private native void allocate(@ByRef IntDeviceVector v);

        public IntPointer begin() { return data(); }
        public IntPointer end() { return data().position((int)size()); }

        public native IntPointer data();
        public native long size();
        public native void resize(long n);
    }

    @Name("device_ptr<int>")
    public static class IntDevicePointer extends Pointer {
        static { Loader.load(); }
        public IntDevicePointer() { allocate(null); }
        public IntDevicePointer(IntPointer ptr) { allocate(ptr); }
        private native void allocate(IntPointer ptr);

        public native IntPointer get();
    }

    @Name("device_vector<int>")
    public static class IntDeviceVector extends Pointer {
        static { Loader.load(); }
        public IntDeviceVector() { allocate(0); }
        public IntDeviceVector(long n) { allocate(n); }
        public IntDeviceVector(IntHostVector v) { allocate(v); }
        private native void allocate(long n);
        private native void allocate(@ByRef IntHostVector v);

        public IntDevicePointer begin() { return data(); }
        public IntDevicePointer end() { return new IntDevicePointer(data().get().position((int)size())); }

        public native @ByVal IntDevicePointer data();
        public native long size();
        public native void resize(long n);
    }

    public static native @MemberGetter @Namespace IntGenerator rand();
    public static native void copy(@ByVal IntDevicePointer first, @ByVal IntDevicePointer last, IntPointer result);
    public static native void generate(IntPointer first, IntPointer last, IntGenerator gen);
    public static native void sort(@ByVal IntDevicePointer first, @ByVal IntDevicePointer last);
    public static native int reduce(@ByVal IntDevicePointer first, @ByVal IntDevicePointer last, int init, @ByVal IntPlus binary_op);

    public static void main(String[] args) {
        // generate 32M random numbers serially
        IntHostVector h_vec = new IntHostVector(32 << 20);
        generate(h_vec.begin(), h_vec.end(), rand());

        // transfer data to the device
        IntDeviceVector d_vec = new IntDeviceVector(h_vec);

        // sort data on the device (846M keys per second on GeForce GTX 480)
        sort(d_vec.begin(), d_vec.end());

        // transfer data back to host
        copy(d_vec.begin(), d_vec.end(), h_vec.begin());

        // compute sum on device
        int x = reduce(d_vec.begin(), d_vec.end(), 0, new IntPlus());
    }
}

Your code in C should be easier to map though.

We can get this compiled and running on Linux x86_64 with these commands, or on other supported platforms by modifying the -properties option appropriately:

$ javac -cp javacpp.jar ThrustTest.java
$ java -jar javacpp.jar ThrustTest -properties linux-x86_64-cuda
$ java  -cp javacpp.jar ThrustTest
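
On the question of whether the linked list would be accessible: a C linked list is a chain of native pointers, so Java cannot use it as a java.util collection directly. One common workaround, sketched here under assumptions rather than taken from the original program, is to have the C side copy its nodes into one contiguous buffer of fixed-size records and expose that buffer to Java as a direct ByteBuffer (with plain JNI, for instance, through NewDirectByteBuffer). The record layout below (two 32-bit ints, a line number and an error code) and the names FlatRecords, LogEvent, and decode are all hypothetical; main fills the buffer by hand as a stand-in for the native call.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class FlatRecords {
    // Hypothetical record layout: int32 line number followed by int32 error code.
    static final int RECORD_BYTES = 8;

    public static final class LogEvent {
        public final int lineNumber;
        public final int errorCode;

        LogEvent(int lineNumber, int errorCode) {
            this.lineNumber = lineNumber;
            this.errorCode = errorCode;
        }
    }

    // Decodes `count` records starting at the buffer's current position.
    // The caller's buffer position is left untouched because we read from a duplicate.
    public static List<LogEvent> decode(ByteBuffer buf, int count) {
        // duplicate() resets the byte order to big-endian, so set native order afterwards.
        ByteBuffer view = buf.duplicate().order(ByteOrder.nativeOrder());
        if (count < 0 || (long) count * RECORD_BYTES > view.remaining()) {
            throw new IllegalArgumentException("buffer too small for " + count + " records");
        }
        List<LogEvent> events = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int lineNumber = view.getInt();
            int errorCode = view.getInt();
            events.add(new LogEvent(lineNumber, errorCode));
        }
        return events;
    }

    public static void main(String[] args) {
        // Stand-in for the native side: in a real setup the C code would fill this memory.
        ByteBuffer buf = ByteBuffer.allocateDirect(2 * RECORD_BYTES).order(ByteOrder.nativeOrder());
        buf.putInt(17).putInt(404).putInt(42).putInt(500);
        buf.flip();

        for (LogEvent e : decode(buf, 2)) {
            System.out.println("line " + e.lineNumber + " -> error " + e.errorCode);
        }
    }
}
```

Flattening on the C side keeps the Java binding down to a single call that returns a pointer and a count, whichever binding layer is used, instead of one native call per list node.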
