MPI Alltoallv or better individual Send and Recv? (Performance)

Problem description

I have a number of processes (of the order of 100 to 1000) and each of them has to send some data to some (say about 10) of the other processes. (Typically, but not necessarily always, if A sends to B, B also sends to A.) Every process knows how much data it has to receive from which process.

So I could just use MPI_Alltoallv, with many or most of the message lengths zero. However, I heard that for performance reasons it would be better to use several MPI_Send and MPI_Recv communications rather than the global MPI_Alltoallv. What I do not understand: if a series of send and receive calls is more efficient than one Alltoallv call, why is Alltoallv not just implemented as a series of sends and receives?
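
For concreteness, a minimal sketch of the MPI_Alltoallv route in C. The two-neighbour pattern and the message sizes below are invented purely for illustration; in the real application the counts are known in advance, and most of them stay zero:

#include <mpi.h>
#include <stdlib.h>

/* Minimal sketch: every rank exchanges a few ints with two neighbours
 * and sends nothing to anyone else, so most entries of the count
 * arrays stay zero.  The pattern and sizes are invented; in the real
 * application they are known in advance. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int *sendcounts = calloc(nprocs, sizeof(int));
    int *recvcounts = calloc(nprocs, sizeof(int));
    int *sdispls    = calloc(nprocs, sizeof(int));
    int *rdispls    = calloc(nprocs, sizeof(int));

    /* Example pattern: 4 ints to/from the left and right neighbour. */
    int left  = (rank - 1 + nprocs) % nprocs;
    int right = (rank + 1) % nprocs;
    sendcounts[left] = sendcounts[right] = 4;
    recvcounts[left] = recvcounts[right] = 4;

    /* Displacements are prefix sums over the counts. */
    int stot = 0, rtot = 0;
    for (int i = 0; i < nprocs; i++) {
        sdispls[i] = stot;  stot += sendcounts[i];
        rdispls[i] = rtot;  rtot += recvcounts[i];
    }

    int *sendbuf = malloc(stot * sizeof(int));
    int *recvbuf = malloc(rtot * sizeof(int));
    for (int i = 0; i < stot; i++) sendbuf[i] = rank;

    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                  recvbuf, recvcounts, rdispls, MPI_INT, MPI_COMM_WORLD);

    free(sendbuf); free(recvbuf);
    free(sendcounts); free(recvcounts); free(sdispls); free(rdispls);
    MPI_Finalize();
    return 0;
}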

It would be much more convenient for me (and others?) to use just one global call. Also, with several Send and Recv calls I might have to worry about deadlock (fixable by some odd-even strategy, something more complex, or by using buffered send/recv?).
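
A rough sketch of the point-to-point alternative using nonblocking calls, which sidesteps the ordering/deadlock question without buffered sends. The neighbour list, counts and buffers are placeholders for the application's known pattern:

#include <stdlib.h>
#include <mpi.h>

/* Exchange one message with each of nneigh neighbours using nonblocking
 * point-to-point; posting all receives and sends before waiting removes
 * the ordering/deadlock concerns of blocking MPI_Send/MPI_Recv.
 * neigh[], the counts and the buffers are assumed to be set up by the
 * caller (placeholders for the application's known pattern). */
void exchange_with_neighbours(int nneigh, const int neigh[],
                              int *sendbuf[], const int sendcount[],
                              int *recvbuf[], const int recvcount[],
                              MPI_Comm comm)
{
    MPI_Request *reqs = malloc(2 * nneigh * sizeof(MPI_Request));

    for (int i = 0; i < nneigh; i++)          /* post receives first */
        MPI_Irecv(recvbuf[i], recvcount[i], MPI_INT, neigh[i], 0,
                  comm, &reqs[i]);
    for (int i = 0; i < nneigh; i++)          /* then the matching sends */
        MPI_Isend(sendbuf[i], sendcount[i], MPI_INT, neigh[i], 0,
                  comm, &reqs[nneigh + i]);

    MPI_Waitall(2 * nneigh, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}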

Would you agree that MPI_Alltoallv is necessarily slower than, say, the 10 MPI_Send and MPI_Recv calls; and if so, why and by how much?

Solution

Usually the default advice with collectives is the opposite: use a collective operation when possible instead of coding your own. The more information the MPI library has about the communication pattern, the more opportunities it has to optimize internally.

Unless special hardware support is available, collective calls are in fact implemented internally in terms of sends and receives. But the actual communication pattern will probably not be just a series of sends and receives. For example, using a tree to broadcast a piece of data can be faster than having the same rank send it to a bunch of receivers. A lot of work goes into optimizing collective communications, and it is difficult to do better.
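
As an illustration of that kind of restructuring, here is a toy binomial-tree broadcast written with plain sends and receives; it is only a sketch of the idea, not how a real MPI_Bcast is implemented:

#include <mpi.h>

/* Toy binomial-tree broadcast from rank 0, written with plain point-to-
 * point calls to show the kind of restructuring a library can do
 * internally.  Data reaches all ranks in about log2(size) rounds
 * instead of the root issuing size-1 sends.  Illustration only; real
 * MPI_Bcast implementations are far more sophisticated. */
void tree_bcast(int *buf, int count, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    for (int step = 1; step < size; step *= 2) {
        if (rank < step) {
            /* Ranks that already hold the data forward it. */
            if (rank + step < size)
                MPI_Send(buf, count, MPI_INT, rank + step, 0, comm);
        } else if (rank < 2 * step) {
            /* Ranks in the next "wave" receive their copy. */
            MPI_Recv(buf, count, MPI_INT, rank - step, 0, comm,
                     MPI_STATUS_IGNORE);
        }
    }
}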

Having said that, MPI_Alltoallv is somewhat different. It can be difficult to optimize for all irregular communication scenarios at the MPI level, so it is conceivable that some custom communication code can do better. For example, an implementation of MPI_Alltoallv might be synchronizing: it could require that all processes "check in", even if they have to send a 0-length message. I would think that such an implementation is unlikely, but here is one in the wild.

So the real answer is "it depends". If the library implementation of MPI_Alltoallv is a bad match for the task, custom communication code will win. But before going down that path, check if the MPI-3 neighbor collectives are a good fit for your problem.
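
A rough sketch of what the neighbour-collective route could look like, assuming a symmetric neighbour list and the same kind of placeholder arrays as above; counts and displacements are now indexed per neighbour rather than per rank, so the many zero-length entries disappear:

#include <mpi.h>

/* Describe the ~10 partners once in a distributed graph topology, then
 * exchange only with those neighbours.  neigh[] is assumed symmetric
 * (senders and receivers coincide); the counts, displacements and
 * buffers are hypothetical placeholders for the application's data. */
void neighbour_exchange(int nneigh, const int neigh[],
                        const int *sendbuf, const int sendcounts[],
                        const int sdispls[],
                        int *recvbuf, const int recvcounts[],
                        const int rdispls[])
{
    MPI_Comm graph_comm;
    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                   nneigh, neigh, MPI_UNWEIGHTED,  /* sources */
                                   nneigh, neigh, MPI_UNWEIGHTED,  /* destinations */
                                   MPI_INFO_NULL, 0 /* no reorder */, &graph_comm);

    /* Counts and displacements are per neighbour, not per rank. */
    MPI_Neighbor_alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                           recvbuf, recvcounts, rdispls, MPI_INT,
                           graph_comm);

    MPI_Comm_free(&graph_comm);
}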
