C/C++ versus Java/C# in high-performance applications


Question

My question is about the performance of Java versus compiled code, for example C++/Fortran/assembly, in high-performance numerical applications. I know this is a contentious topic, but I am looking for specific answers and examples. This is also community wiki. I have asked similar questions before, but I think I put them too broadly and did not get the answers I was looking for.

Double-precision matrix-matrix multiplication, commonly known as dgemm in the BLAS library, can achieve nearly 100 percent of peak CPU performance (in terms of floating-point operations per second).
Several factors make that performance achievable:


  • cache blocking, to achieve maximum memory locality

  • loop unrolling, to minimize control overhead

  • vector instructions, such as SSE

  • memory prefetching

  • guaranteed absence of memory aliasing
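
To make the first item concrete, here is a minimal Java sketch of cache blocking, set next to the plain triple loop for comparison. It is not taken from any BLAS implementation: the class name, the method names, the row-major `double[]` layout and the `block` parameter are all assumptions chosen for illustration. Whether a given JIT then unrolls or vectorizes the inner loop is exactly the open question here, so the sketch makes no claim about that.

```java
// Illustrative sketch only: names and layout are assumptions, not a library API.
final class BlockedMatMul {
    private BlockedMatMul() {}

    /** Plain triple loop: C += A * B for n x n matrices stored row-major. */
    static void multiplyNaive(double[] a, double[] b, double[] c, int n) {
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++) {
                    sum += a[i * n + k] * b[k * n + j];
                }
                c[i * n + j] += sum;
            }
        }
    }

    /**
     * Cache-blocked variant: C += A * B, walking the matrices in
     * block x block tiles so that each tile is reused while it is
     * still likely to be in cache. Handles n not divisible by block.
     */
    static void multiplyBlocked(double[] a, double[] b, double[] c, int n, int block) {
        for (int ii = 0; ii < n; ii += block) {
            int iMax = Math.min(ii + block, n);
            for (int kk = 0; kk < n; kk += block) {
                int kMax = Math.min(kk + block, n);
                for (int jj = 0; jj < n; jj += block) {
                    int jMax = Math.min(jj + block, n);
                    for (int i = ii; i < iMax; i++) {
                        for (int k = kk; k < kMax; k++) {
                            double aik = a[i * n + k]; // hoisted out of the inner loop
                            for (int j = jj; j < jMax; j++) {
                                // unit-stride access to both b and c
                                c[i * n + j] += aik * b[k * n + j];
                            }
                        }
                    }
                }
            }
        }
    }
}
```

A suitable `block` value depends on the cache sizes of the machine and has to be found by measurement. The two methods sum in a different order, so on general floating-point input their results can differ in the last bits.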

I have seen lots of benchmarks using assembly, C++, Fortran, ATLAS, and vendor BLAS (typical cases are matrices of dimension 512 and above). On the other hand, I have heard that, in principle, byte-compiled languages/implementations such as Java can be as fast, or nearly as fast, as machine-compiled languages. However, I have not seen definite benchmarks showing that this is so. On the contrary, it seems (from my own research) that byte-compiled languages are much slower.

Do you have good matrix-matrix multiplication benchmarks for Java/C#? Is a just-in-time compiler (an actual implementation, not a hypothetical one) able to produce instructions that satisfy the points I have listed?

Thanks

With regard to performance: every CPU has a peak performance, which depends on the number of instructions the processor can execute per second. For example, a modern 2 GHz Intel CPU can achieve 8 billion double-precision adds/multiplies per second, resulting in 8 GFLOPS peak performance. Matrix-matrix multiplication is one of the algorithms able to achieve nearly full performance in terms of operations per second, the main reason being its high ratio of compute to memory operations (N^3/N^2). The sizes I am interested in are on the order of N > 500.
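
The arithmetic behind these figures can be sketched as a few helper methods. This is only an illustration under stated assumptions: the class and method names are made up, the flop count uses the usual convention of one multiply plus one add per inner-loop step (2N^3 in total), and the 4 flops per cycle is simply what the 8 GFLOPS at 2 GHz figure above implies, not a measured value.

```java
// Illustrative sketch only: names are hypothetical, figures follow the text above.
final class FlopsEstimate {
    private FlopsEstimate() {}

    /** Flops in an n x n matrix product: n^3 multiplies plus n^3 adds. */
    static double dgemmFlops(int n) {
        double d = n;
        return 2.0 * d * d * d;
    }

    /** Matrix elements involved: A, B and C with n^2 entries each. */
    static double dgemmElements(int n) {
        double d = n;
        return 3.0 * d * d;
    }

    /** Peak rate in GFLOPS from clock speed (GHz) and flops issued per cycle. */
    static double peakGflops(double clockGhz, double flopsPerCycle) {
        return clockGhz * flopsPerCycle;
    }

    /** Achieved rate in GFLOPS for one n x n product that took the given time. */
    static double achievedGflops(int n, double seconds) {
        return dgemmFlops(n) / seconds / 1e9;
    }
}
```

With these definitions, `peakGflops(2.0, 4.0)` reproduces the 8 GFLOPS figure, and the ratio `dgemmFlops(n) / dgemmElements(n)` grows linearly with n, which is the N^3/N^2 argument in numeric form. Dividing `achievedGflops` by the peak gives the fraction of peak that a benchmark run reached.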

With regard to implementation: higher-level details such as blocking are done at the source-code level. Lower-level optimization is handled by the compiler, perhaps with compiler hints about alignment/aliasing. A byte-compiled implementation can be written using the blocked approach as well, so in principle the source-code details of a decent implementation will be very similar.

Recommended answer

A comparison of VC++/.NET 3.5/Mono 2.2 in a pure matrix multiplication scenario:


Source

Mono with Mono.Simd goes a long way towards closing the performance gap with the hand-optimized C++ here, but the C++ version is still clearly the fastest. However, Mono is at 2.6 now and might be closer, and I would expect that if .NET ever gets something like Mono.Simd, it could be very competitive, as there is not much difference between .NET and the sequential C++ here.
