Creating a counter that stays synchronized across MPI processes


Problem Description


I have quite a bit of experience using the basic comm and group MPI-2 methods, and do quite a bit of embarrassingly parallel simulation work using MPI. Up until now, I have structured my code to have a dispatch node and a bunch of worker nodes. The dispatch node has a list of parameter files that will be run with the simulator. It seeds each worker node with a parameter file. The worker nodes run their simulation, then request another parameter file, which the dispatch node provides. Once all parameter files have been run, the dispatch node shuts down each worker node before shutting itself down.


The parameter files are typically named "Par_N.txt", where N is the identifying integer (e.g. N = 1-1000). So I was thinking: if I could create a counter, and could have this counter synchronized across all of my nodes, I could eliminate the need for a dispatch node and make the system a bit simpler. As simple as this sounds in theory, in practice I suspect it is a bit more difficult, as I'd need to ensure the counter is locked while being changed, etc., and I thought there might be a built-in way for MPI to handle this. Any thoughts? Am I overthinking this?

Answer


Implementing a shared counter isn't trivial, but once you do it and have it in a library somewhere you can do a lot with it.


In the Using MPI-2 book, which you should have to hand if you're going to implement this stuff, one of the examples (the code is available online) is a shared counter. The "non-scalable" one should work well out to several dozen processes: the counter is an array of integers indexed 0..size-1, one per rank, and the "get next work item" operation consists of locking the window, reading everyone else's contribution to the counter (in this case, how many items they've taken), updating your own (++), closing the window, and calculating the total. This is all done with passive one-sided operations. (The better-scaling version just uses a tree rather than a 1-d array.)


So the usage would be: you have, say, rank 0 host the counter, and everyone keeps doing work units and updating the counter to get the next one until there's no more work; then you wait at a barrier or something and finalize.


Once you have something like this working (using a shared value to get the next available work unit), you can generalize to more sophisticated approaches. So as suzterpatt suggested, everyone taking "their share" of work units at the start works great, but what to do if some finish faster than others? The usual answer is work stealing: everyone keeps their list of work units in a deque, and when one rank runs out of work, it steals work units from the other end of someone else's deque until there's no work left. This is really the completely-distributed version of master-worker, where there's no longer a single master partitioning the work. Once you have a single shared counter working, you can build mutexes from it, and from those you can implement the deque. But if the simple shared counter works well enough, you may not need to go there.


Update: OK, so here's a hacky attempt at doing the shared counter, my version of the simple one in the MPI-2 book: it seems to work, but I wouldn't say anything much stronger than that (I haven't played with this stuff for a long time). There's a simple counter implementation (corresponding to the non-scaling version in the MPI-2 book) with two simple tests, one corresponding roughly to your work case; each rank updates the counter to get a work item, then does the "work" (sleeps for a random amount of time). At the end of each test, the counter data structure is printed out, which is the number of increments each rank has done.

#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

struct mpi_counter_t {
    MPI_Win win;
    int  hostrank;    /* rank that hosts the counter window */
    int  myval;       /* locally-tracked total of our own increments */
    int *data;        /* one slot per rank; allocated only on hostrank */
    int  rank, size;
};

struct mpi_counter_t *create_counter(int hostrank) {
    struct mpi_counter_t *count;

    count = (struct mpi_counter_t *)malloc(sizeof(struct mpi_counter_t));
    count->hostrank = hostrank;
    MPI_Comm_rank(MPI_COMM_WORLD, &(count->rank));
    MPI_Comm_size(MPI_COMM_WORLD, &(count->size));

    if (count->rank == hostrank) {
        /* The host exposes one counter slot per rank in its window. */
        MPI_Alloc_mem(count->size * sizeof(int), MPI_INFO_NULL, &(count->data));
        for (int i=0; i<count->size; i++) count->data[i] = 0;
        MPI_Win_create(count->data, count->size * sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &(count->win));
    } else {
        /* Everyone else participates in the collective with an empty window. */
        count->data = NULL;
        MPI_Win_create(count->data, 0, 1,
                       MPI_INFO_NULL, MPI_COMM_WORLD, &(count->win));
    }
    count->myval = 0;

    return count;
}

int increment_counter(struct mpi_counter_t *count, int increment) {
    int *vals = (int *)malloc( count->size * sizeof(int) );
    int val;

    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, count->hostrank, 0, count->win);

    for (int i=0; i<count->size; i++) {
        if (i == count->rank) {
            /* Add our increment to our own slot on the host... */
            MPI_Accumulate(&increment, 1, MPI_INT, count->hostrank, i, 1, MPI_INT,
                           MPI_SUM, count->win);
        } else {
            /* ...and read everyone else's contribution. */
            MPI_Get(&vals[i], 1, MPI_INT, count->hostrank, i, 1, MPI_INT,
                    count->win);
        }
    }

    MPI_Win_unlock(count->hostrank, count->win);

    /* The gets are only guaranteed complete after the unlock; total up all
       contributions, using our locally-tracked value for our own slot. */
    count->myval += increment;
    vals[count->rank] = count->myval;
    val = 0;
    for (int i=0; i<count->size; i++)
        val += vals[i];

    free(vals);
    return val;
}

void delete_counter(struct mpi_counter_t **count) {
    if ((*count)->rank == (*count)->hostrank) {
        MPI_Free_mem((*count)->data);
    }
    MPI_Win_free(&((*count)->win));
    free((*count));
    *count = NULL;

    return;
}

void print_counter(struct mpi_counter_t *count) {
    if (count->rank == count->hostrank) {
        for (int i=0; i<count->size; i++) {
            printf("%2d ", count->data[i]);
        }
        puts("");
    }
}

void test1(void) {
    struct mpi_counter_t *c;
    int rank;
    int result;

    c = create_counter(0);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    result = increment_counter(c, 1);
    printf("%d got counter %d\n", rank, result);

    MPI_Barrier(MPI_COMM_WORLD);
    print_counter(c);
    delete_counter(&c);
}


void test2(void) {
    const int WORKITEMS=50;

    struct mpi_counter_t *c;
    int rank;
    int result = 0;

    c = create_counter(0);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    srandom(rank);

    while (result < WORKITEMS) {
        result = increment_counter(c, 1);
        if (result <= WORKITEMS) {
             printf("%d working on item %d...\n", rank, result);
             sleep(random() % 10);
         } else {
             printf("%d done\n", rank);
         }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    print_counter(c);
    delete_counter(&c);
}

int main(int argc, char **argv) {

    MPI_Init(&argc, &argv);

    test1();
    test2();

    MPI_Finalize();
    return 0;
}
