Java task distribution and collection on a grid

Question

I have an application running on a cluster/grid where I need to run N tasks that do not have to communicate. I only need to collect the result of each task. So I have a Master distributing the tasks to some Slaves (possibly running on different hosts) and combining all the results at the end.

As the cluster is controlled by a batch system, the configuration of my nodes changes for each run, and I get a list of the nodes that have been assigned to my job.

I'm looking for a library (pure Java) to help me with this. I looked at the following:

MPJ - doesn't work for me because of the way that MPJ runs when there are multiple processors available on the same machine. It uses custom class loaders and this gives me problems with a native library that I'm loading (it's loaded multiple times because the custom class loaders load the class multiple times).

Hazelcast - works in principle, but it's not really made for this (I can distribute jobs with a queue and put the results back in another queue, but it seems like a bit of overkill). What I like is that it's easy to set up the group of nodes (in principle only one node needs to be specified and the others can just connect to it).

Simon/RMI - I guess I could let each slave register with the master and then let the master distribute jobs to each slave. Or let each slave request a queue where the jobs are queued and a queue where the results should be stored from the master.

Cajo - would in principle work but I don't want to have multicast on the grid network and there seems to be no way around this for Cajo.

RabbitMQ - I don't like to have an extra server running and it's not pure Java. Same for ZeroMQ.

Akka - Seems to be overkill as well. And a lot of configuration to set up the group of nodes.

Hadoop - Like Akka, seems to be overkill, especially the configuration needed to set up the group of nodes.

JPPF - Seems to be more suited to setting up a long-running cluster of servers and nodes. After my application finishes I need to stop all servers and nodes. It also seems to rely on serialization of the tasks, which is not an option for me (see further below).

So I would stick with either Hazelcast or Simon. Which one is better suited for this kind of application? Does anyone know another library (not too heavy, not too much configuration)? Any other suggestions?

Hazelcast's ExecutorService is not an option, by the way, because I'm using some JNI and so serialization would be a pain.
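
The job-queue/result-queue idea mentioned under Hazelcast and Simon/RMI can be sketched roughly as follows. This is a minimal single-JVM sketch, not a grid setup: plain `java.util.concurrent.BlockingQueue` instances stand in for the distributed queues (Hazelcast's `IQueue` extends that interface, so the worker loop should carry over), and worker threads stand in for slave nodes. The names `QueueGrid`, `runJobs`, `compute` and the `POISON` stop marker are hypothetical. Only job IDs and `Long` results pass through the queues, so nothing backed by JNI would need to be serialized.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueGrid {
    // Hypothetical stop marker: tells a worker that no more jobs will arrive.
    static final long POISON = -1L;

    // Placeholder for the real (possibly JNI-backed) work; here it squares the job ID.
    static long compute(long jobId) {
        return jobId * jobId;
    }

    // Worker loop: take a job ID, compute locally, put the result back.
    static Runnable worker(BlockingQueue<Long> jobs, BlockingQueue<Long> results) {
        return () -> {
            try {
                for (long id = jobs.take(); id != POISON; id = jobs.take()) {
                    results.put(compute(id));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    // Master: enqueue the jobs, start the workers, collect exactly nJobs results.
    public static List<Long> runJobs(int nJobs, int nWorkers) {
        BlockingQueue<Long> jobs = new LinkedBlockingQueue<>();
        BlockingQueue<Long> results = new LinkedBlockingQueue<>();
        for (long id = 0; id < nJobs; id++) {
            jobs.add(id);
        }
        for (int w = 0; w < nWorkers; w++) {
            jobs.add(POISON); // one stop marker per worker
        }
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < nWorkers; w++) {
            Thread t = new Thread(worker(jobs, results));
            t.start();
            threads.add(t);
        }
        List<Long> collected = new ArrayList<>();
        try {
            for (int i = 0; i < nJobs; i++) {
                collected.add(results.take());
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while collecting results", e);
        }
        Collections.sort(collected); // completion order is not deterministic
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(runJobs(5, 3));
    }
}
```

Because each worker pulls the next job when it is free, faster nodes naturally take more jobs, and the master only has to count results until it has N of them.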

Solution

Let me know if this solution doesn't work. Hazelcast provides multi-node task execution through its ExecutorService.

First get the set of members on which you want the task to be executed.

Then:

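// Join (or start) the Hazelcast cluster from this JVM.
HazelcastInstance h = Hazelcast.newHazelcastInstance();
// All current cluster members, or any subset that fits your requirement.
Set<Member> members = h.getCluster().getMembers();
// MultiTask runs the same callable on each of the given members.
MultiTask<Long> multitask = new MultiTask<Long>(new MyCallableTask("default"), members);
// Use the executor service of the same instance.
ExecutorService es = h.getExecutorService();
es.execute(multitask);
// Blocks until every member has returned its result.
Collection<Long> results = multitask.get();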

The only requirement is that the MyCallableTask class is on the classpath of every node.
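
For illustration, a task class of that kind might look like the sketch below. The body is hypothetical, since the answer does not show `MyCallableTask`: it assumes the constructor takes a `String` and the result is a `Long`, to match the snippet above. The class implements `Serializable` because the callable has to be shipped to the other members; only the plain `String` parameter is stored, on the assumption that any native (JNI) resources are acquired inside `call()` on the executing node rather than kept in fields. The `roundTrip` helper is only there to check locally that the task survives Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.Callable;

public class MyCallableTask implements Callable<Long>, Serializable {
    private static final long serialVersionUID = 1L;

    // Only plain, serializable parameters are stored; no native handles.
    private final String input;

    public MyCallableTask(String input) {
        this.input = input;
    }

    @Override
    public Long call() {
        // Placeholder work. A real task would load and use its native
        // library here, on whichever node executes the task.
        return (long) input.length();
    }

    // Hypothetical helper: serialize and deserialize the task in memory,
    // to check that it can be shipped to another JVM.
    public static MyCallableTask roundTrip(MyCallableTask task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (MyCallableTask) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("task is not serializable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new MyCallableTask("default")).call());
    }
}
```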
