Matlab mex file is slow compared to its straight C equivalent

Question

I'm at a loss to explain (and avoid) the differences in speed between a Matlab mex program and the corresponding C program with no Matlab interface. I've been profiling a numerical analysis program:

int main(){
    Well_optimized_code();
}

compiled with gcc 4.4 against the Matlab-Mex equivalent (directed to use gcc44, which is not the version currently supported by Matlab, but it's required for other reasons):

#include "mex.h"

void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[]){
    Well_optimized_code(); // literally the exact same code
}

I performed the timings as:

$ time ./C_version

vs.

>> tic; mex_version(); toc

The difference in timing is staggering. The version run from the command line takes 5.8 seconds on average. The version in Matlab runs in 21 seconds. For context, the mex file replaces an algorithm in the SimBiology toolbox that takes about 26 seconds to run.
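One caveat about the comparison: `time ./C_version` also counts process startup, while `tic`/`toc` also counts Matlab's call dispatch. A minimal sketch of timing the kernel from inside the process, so that both builds measure exactly the same region, could look like the following. `demo_kernel` is a hypothetical stand-in for `Well_optimized_code`, and `time_call` is an illustrative helper, not part of the original program.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for Well_optimized_code(); the result is stored
   in a global so the loop cannot be optimized away. */
static double demo_kernel_result;

static void demo_kernel(void) {
    double acc = 0.0;
    for (int i = 1; i <= 1000000; ++i) {
        acc += 1.0 / (double)i;
    }
    demo_kernel_result = acc;
}

/* Wall-clock seconds spent inside fn(), measured with a monotonic clock
   so that process startup and interpreter dispatch are excluded. */
static double time_call(void (*fn)(void)) {
    struct timespec t0, t1;
    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}
```

Calling `time_call(demo_kernel)` from `main` in the native build and from `mexFunction` in the mex build, then printing the result, gives two numbers that cover the same code and nothing else.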

Compared to Matlab's algorithm, both the C and mex versions scale linearly up to 27 threads using calls to OpenMP, but for the purposes of profiling these calls have been disabled and commented out.

The two versions have been compiled in the same way with the exception of the necessary flags to compile as a mex file: -fPIC --shared -lmex -DMATLAB_MEX_FILE being applied in the mex compilation/linking. I've removed all references to the left and right arguments of the mex file. That is to say it takes no inputs and gives no outputs, it is solely for profiling.

The Great and Glorious Google has informed me that the position independent code should not be the source of the slowdown and beyond that I'm at a loss.

Any help will be appreciated,

Andrew

Recommended answer

After a month of emailing with my contacts at Mathworks, playing around with my own code, and profiling my code every which way, I have an answer; however, it may be the most dissatisfying answer I have ever had to a technical question:

The short version is "upgrade to Matlab version 2011a (officially released last week), this issue has now been resolved".

The longer version regards an issue of the overhead associated with the mex gateway in versions 2010b and earlier. The best explanation that I've been able to extract is that this overhead is not assessed once, rather we pay a little bit every time a function calls another function that is in a linked library.

Why this occurs baffles me, but it is at least consistent with the Shark profiling that I did. When I profile and compare the native app and the mex app, there is a recurring pattern. The time spent in functions that are in the source code I wrote for the app does not change. The time spent in library functions increases a little when comparing the native and mex implementations. Functions in another library used to build this library increase the difference a lot. The time difference continues to increase as we proceed ever deeper until we reach my BLAS implementation.
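One way to check a pattern like this without a full profiler is to measure the average cost of a single call into a library routine and compare that number between the two builds. The sketch below is only an illustration of that measurement: `avg_call_seconds` and `lib_fn` are hypothetical names, `strlen` merely stands in for a routine from a linked library, and nothing here reproduces what the mex gateway itself does.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <string.h>
#include <time.h>

/* Signature of the library routine under test (here: strlen-like). */
typedef size_t (*lib_fn)(const char *);

/* Current monotonic time in seconds. */
static double now_seconds(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return (double)t.tv_sec + (double)t.tv_nsec * 1e-9;
}

/* Average seconds per call over n calls. The call goes through a
   volatile function pointer so the compiler cannot inline or fold it;
   the accumulated result is written to *sink to keep the loop alive. */
static double avg_call_seconds(lib_fn fn, long n, size_t *sink) {
    lib_fn volatile call = fn;
    size_t total = 0;
    assert(fn != NULL && n > 0 && sink != NULL);
    double t0 = now_seconds();
    for (long i = 0; i < n; ++i) {
        total += call("mex");
    }
    double t1 = now_seconds();
    *sink = total;
    return (t1 - t0) / (double)n;
}
```

Running the same harness from `main` and from `mexFunction`, with the routine of interest in place of `strlen`, would show whether the per-call cost differs between the native and mex builds.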

A couple of heavily used BLAS functions were the main culprits. A function that took ~1% of my computation time in the native app was clocking in at 30% in the mex function.
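To put those shares in absolute terms, here is a rough back-of-the-envelope sketch. It assumes, which the answer does not state explicitly, that the ~1% and 30% shares apply to the 5.8 s and 21 s totals reported in the question; the function names are illustrative only.

```c
#include <assert.h>

/* Seconds spent in one function, given its share of the total run time. */
static double seconds_in_function(double share, double total_seconds) {
    assert(share >= 0.0 && share <= 1.0);
    return share * total_seconds;
}

/* How many times slower the same function runs in the mex build than
   in the native build, given its share of each build's total time. */
static double slowdown_factor(double native_share, double native_total,
                              double mex_share, double mex_total) {
    return seconds_in_function(mex_share, mex_total)
         / seconds_in_function(native_share, native_total);
}
```

Under that assumption, `slowdown_factor(0.01, 5.8, 0.30, 21.0)` comes out at roughly 109: about 0.06 s in the native app versus about 6.3 s in the mex function for the same BLAS routine.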

The implementation of the mex gateway appears to have changed between 2010b and 2011a. On my MacBook the native app takes about 6 seconds and the mex version takes about 6.5 seconds. This is overhead that I can deal with.

As for the underlying cause, I can only speculate. Matlab has its roots in interpretive coding. Since mex functions are dynamic libraries, I'm guessing that each mex library was unaware of what it was linked against until runtime. Since Matlab suggests the user rarely use mex and then only for small computationally intensive chunks, I assume that large programs (such as an ODE solver) are rarely implemented. These programs, like mine, are the ones that suffer the most.

I've profiled a couple of Matlab functions that I know to be implemented in C then compiled using mex (especially sbiosimulate after calling sbioaccelerate on kinetic models, part of the SimBiology toolbox) and there appear to be some significant speedups. So the 2011a update appears to be more broadly beneficial than the usual semi-yearly upgrade.

Best of luck to other coders with similar issues. Thanks for all of the helpful advice that got me started in the right direction.

- Andrew
