How to run binary executables in a multi-threaded HPC cluster?

Problem description

I have this tool called cgatools from Complete Genomics (http://cgatools.sourceforge.net/docs/1.8.0/). I need to run some genome analyses on a High-Performance Computing cluster. I tried to run the job allocating more than 50 cores and 250 GB of memory, but it only uses one core and limits the memory to less than 2 GB. What would be my best option in this case? Is there a way to run binary executables on an HPC cluster so that they use all the allocated memory?

Recommended answer

The scheduler just runs the binary you provide on the first node of the allocation. The onus of splitting the job and running it in parallel is on the binary itself; hence you see only one core in use out of the fifty allocated.

Parallelizing at the code level

You will need to make sure that the binary you are submitting as a job to the cluster has some mechanism for discovering the nodes that are allocated (interaction with the job scheduler) and a mechanism for utilizing the allocated resources (MPI, PGAS, etc.).

If it is parallelized, submitting the binary through a job submission script (via a wrapper like mpirun/mpiexec) should utilize all the allocated resources.
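As a sketch, if your cluster happens to run SLURM (an assumption here; the directives differ for LSF, PBS, etc.), an MPI-aware binary could be submitted with a batch script along these lines. The binary name `my_mpi_tool` and the resource numbers are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=genome-mpi
#SBATCH --nodes=4              # number of nodes to allocate
#SBATCH --ntasks-per-node=16   # MPI ranks to start per node
#SBATCH --mem=64G              # memory per node
#SBATCH --time=12:00:00

# mpirun reads the allocation from the scheduler environment and
# starts one rank per allocated task across all four nodes
mpirun ./my_mpi_tool --input data.tsv
```

Note that this only helps for binaries built against MPI (or a similar runtime); a serial binary such as cgatools would still occupy a single core under this script.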

Running a black-box serial binary in parallel

If not, the only other possible workload-distribution mechanism across the resources is data parallelism, wherein you use the cluster to feed multiple inputs to the same binary and run the processes in parallel, effectively reducing the time taken to solve the problem.

You can set the granularity based on the memory required by each run. For example, if each process needs 1 GB of memory, you can run 16 processes per node (assuming a node with 16 cores and 16 GB of memory).
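The per-node process count is simply the minimum of what the cores and the memory allow. A small shell sketch with the hypothetical numbers from the example:

```shell
# How many copies of a serial binary fit on one node?
# Limited either by the core count or by memory.
cores_per_node=16
mem_per_node_gb=16
mem_per_proc_gb=1

by_mem=$(( mem_per_node_gb / mem_per_proc_gb ))
procs_per_node=$(( cores_per_node < by_mem ? cores_per_node : by_mem ))
echo "$procs_per_node"   # 16 with these numbers
```

With a hungrier tool needing 4 GB per process on the same node, the memory bound wins and you would run only 4 processes per node.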

Parallel submission of multiple inputs on a single node can be done with the tool GNU parallel. You can then submit multiple jobs to the cluster, each requesting one node (exclusive access, with the parallel tool available) and each working on a different set of input elements.
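Inside a job that holds one node exclusively, GNU parallel can drive one cgatools process per core over a list of inputs. This is a sketch: the `someanalysis` subcommand, flags, and file names are illustrative placeholders, not actual cgatools syntax:

```shell
#!/bin/bash
# Run inside a job that has exclusive access to one node.
# inputs.txt lists one input file per line; '::::' tells GNU parallel
# to read its arguments from that file.
# -j 16: keep 16 processes running at a time (one per core,
#        memory permitting).
# {}  expands to the input file name, {.} to the name without
#     its extension.
parallel -j 16 'cgatools someanalysis --input {} --output {.}.out' :::: inputs.txt
```

Each cluster job then gets its own `inputs.txt` covering its share of the data.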

If you do not want to launch 'n' separate jobs, you can use mechanisms provided by the scheduler, such as blaunch, to specify dynamically the machines on which the job is supposed to run. You can parse the names of the machines allocated by the scheduler and use a blaunch-style script to emulate the submission of n jobs from the first node.
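Under LSF (an assumption; other schedulers expose the allocation differently), the allocated hosts appear in the `LSB_HOSTS` environment variable, one entry per slot. A sketch that fans one process out per unique host from the first node; the wrapper script and input naming scheme are hypothetical:

```shell
#!/bin/bash
# LSB_HOSTS holds a space-separated list of allocated hosts,
# repeated once per slot; reduce it to unique host names.
i=0
for host in $(echo "$LSB_HOSTS" | tr ' ' '\n' | sort -u); do
    # blaunch starts the command on the named remote host,
    # under the control of the current LSF job
    blaunch "$host" ./run_cgatools.sh "input_part_${i}" &
    i=$(( i + 1 ))
done
wait   # block until every remote process has finished
```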

Note: this class of applications is better off run on a cloud-like setup instead of a typical HPC system [effective utilization of the cluster at all available levels of parallelism (cluster, thread, and SIMD) is a key part of HPC].
