How to use MATLAB code in mapper (Hadoop)?


Problem description

I have MATLAB code that processes images, and I want to create a Hadoop mapper that uses it. I came across the following options but am not sure which is best (installing the MATLAB Compiler Runtime on every slave node of my Hadoop cluster is very difficult for me):

  1. Manually port the MATLAB code to C++ with OpenCV and call the resulting exe/dll (passing it the appropriate parameters) from the mapper. I am not sure about this, since every node in the cluster runs Linux rather than Windows.

  2. Use Hadoop Streaming. But Hadoop Streaming requires an executable as the mapper, and an executable built from MATLAB also requires the MATLAB Compiler Runtime, which is very difficult to install on every slave node.

  3. Convert the code automatically into C/C++ and build the exe automatically. I am not sure whether this is right: either the exe will still require the MATLAB runtime to run, or the conversion can produce compiler issues that are very difficult to fix.

  4. Use MATLAB Java Builder. But the jar file it creates will need the runtime too.

Any suggestions?

Thanks in advance.

Solution

As you probably already suspect, this is inherently difficult because of MATLAB's runtime requirement. I had a similar experience (having to distribute the runtime libraries) when attempting to run MATLAB code over Condor.

As far as the options you list are concerned, option #1 will work best. Also, you will probably not be able to avoid working with Linux.

However, if you don't want to lose the convenience of higher-level software (such as MATLAB, Octave, Scilab and others), you could try Hadoop Streaming in combination with Octave executable scripts.

Hadoop Streaming does not care about the nature of the executable (whether it is an executable script or a binary executable), according to its documentation (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html).

All it requires is an "executable" that can a) read from stdin and b) send its output to stdout.
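The stdin/stdout contract can be tried out locally, without a cluster, using an ordinary pipe. The following is only a minimal sketch: the mapper name mapper.sh and the sample input are made up for illustration, and the mapper is written in plain shell as a stand-in for any executable. It relies on the Hadoop Streaming default that everything up to the first tab on an output line is treated as the key and the rest as the value.

```shell
#!/bin/sh
# Stand-in mapper (hypothetical name mapper.sh): any executable that reads
# records from stdin and writes "key<TAB>value" lines to stdout would do.
workdir=$(mktemp -d)
cat > "$workdir/mapper.sh" <<'EOF'
#!/bin/sh
# Emit each input line as the key and its length in characters as the value.
while IFS= read -r line; do
    printf '%s\t%s\n' "$line" "${#line}"
done
EOF
chmod +x "$workdir/mapper.sh"

# Simulate the map phase and the sort that follows it with a plain pipe.
result=$(printf 'pear\nfig\napple\n' | "$workdir/mapper.sh" | sort)
printf '%s\n' "$result"
rm -rf "$workdir"
```

If the mapper behaves correctly in a pipe like this, the same executable should be usable as a streaming mapper, provided its interpreter exists on every node.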

GNU Octave programs can be turned into executable scripts (on Linux) that read from stdin and send their output to stdout (http://www.gnu.org/software/octave/doc/interpreter/Executable-Octave-Programs.html).

As a simple example, consider this:

Create a file (for example "al.oct") with the following contents:

#!/bin/octave -qf
# Note: on my installation I had to use "#!/etc/alternatives/octave -qf" as the first line instead.
Q = fread(stdin);  # Standard Octave / MATLAB code from here on
disp(Q);

Now from the command prompt issue the following command:

chmod +x al.oct

al.oct is now an executable, and you can run it with "./al.oct". To see where stdin and stdout fit in (so that you can use it with Hadoop), try this:

$ cat al.oct | ./al.oct | sort

In other words, "cat" the file al.oct, pipe its output to the executable script al.oct, and then pipe the output of al.oct to the sort utility. This is just an example: we could "cat" any file, but since we know al.oct is a simple text file, we use it as the input.
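Because the interpreter path in the first line of al.oct differs between installations, one option is to generate the script with whatever path the local machine reports. This is a rough sketch, not part of the original answer: the temporary directory and the fallback to /bin/octave when octave is not on the PATH are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: write al.oct with an interpreter line that matches this machine.
# Falls back to /bin/octave (the path used in the text) if octave is not on PATH.
octave_bin=$(command -v octave || echo /bin/octave)

workdir=$(mktemp -d)
script="$workdir/al.oct"
{
    printf '#!%s -qf\n' "$octave_bin"
    printf 'Q = fread(stdin);\n'
    printf 'disp(Q);\n'
} > "$script"
chmod +x "$script"

# Show the generated interpreter line for inspection.
first_line=$(head -n 1 "$script")
printf '%s\n' "$first_line"
```

On a cluster, the path has to be valid on every worker node, not only on the machine where the script was generated.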

Of course, Octave may not support everything your MATLAB code tries to call, but this could be an alternative way of using Hadoop Streaming without losing the convenience and power of higher-level code.
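Once the script works in a local pipe, it could be submitted as the mapper of a streaming job. The sketch below is an assumption-laden illustration: the streaming jar location and the HDFS input and output directories are placeholders, and Octave is assumed to be installed on every node at the path named in the script's first line. It uses the -input, -output, -mapper and -file options described in the Hadoop Streaming documentation (newer releases prefer the generic -files option over -file). By default it only prints the command; set RUN=1 to actually submit.

```shell
#!/bin/sh
# All paths below are placeholders (assumptions); adjust them to your cluster.
STREAMING_JAR=${STREAMING_JAR:-/path/to/hadoop-streaming.jar}
INPUT_DIR=${INPUT_DIR:-/user/me/images_in}
OUTPUT_DIR=${OUTPUT_DIR:-/user/me/images_out}

# Build the argument list: -file ships al.oct to the task nodes,
# -mapper names the executable that each map task runs.
set -- hadoop jar "$STREAMING_JAR" \
    -input "$INPUT_DIR" \
    -output "$OUTPUT_DIR" \
    -mapper al.oct \
    -file al.oct

if [ "${RUN:-0}" = "1" ]; then
    "$@"                  # actually submit the job
else
    cmd="$*"
    printf '%s\n' "$cmd"  # dry run: only show the command
fi
```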
