并行计算-随机播放 [英] Parallel Computing - Shuffle

查看:83
本文介绍了并行计算-随机播放的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻求并行处理数组.我发现,执行类似于双音排序的算法,但随机(50/50)重排序会导致均等分布,但前提是该数组的幂为2.我已经考虑过Yates Fisher Shuffle,但是为了避免O(N)计算,我看不出如何对其并行化.

I am looking to shuffle an array in parallel. I have found that doing an algorithm similar to bitonic sort but with a random (50/50) re-order results in an equal distribution but only if the array is a power of 2. I've considered the Yates Fisher Shuffle but I can't see how I could parallel-ize it in order to avoid O(N) computations.

有什么建议吗?

谢谢!

推荐答案

这里,有一篇很好的清晰的最新论文. 和参考文献,特别是Shun等人2015年值得一读.

There's a good clear recent paper on this here and the references, especially Shun et al 2015 are worth a read.

但是,基本上,您可以使用与sort -R中使用的方法相同的方法来做到这一点:通过为每一行分配一个随机键值并对该键进行排序来随机播放.而且有很多方法可以进行良好的并行分布式排序.

But basically you can do this using the same sort of approach that's used in sort -R: shuffle by giving each row a random key value and sorting on that key. And there are lots of ways to do good parallel distributed sort.

这是python + MPI中使用奇偶排序的基本版本;如果P是处理器数,它将经历P个通信步骤.您可以做得更好,但这很容易理解.在此问题中进行了讨论.

Here's a basic version in python + MPI using an odd-even sort; it goes through P communication steps if P is the number of processors. You can do better than that, but this is pretty simple to understand; it's discussed in this question.

from __future__ import print_function
import sys
import random
from mpi4py import MPI

comm = MPI.COMM_WORLD

def exchange(localdata, sendrank, recvrank):
    """
    Perform a merge-exchange with a neighbour;
    sendrank sends local data to recvrank,
    which merge-sorts it, and then sends lower
    data back to the lower-ranked process and
    keeps upper data
    """
    rank = comm.Get_rank()
    assert rank == sendrank or rank == recvrank
    assert sendrank < recvrank

    if rank == sendrank:
        comm.send(localdata, dest=recvrank)
        newdata = comm.recv(source=recvrank)
    else:
        bothdata = list(localdata)
        otherdata = comm.recv(source=sendrank)
        bothdata = bothdata + otherdata
        bothdata.sort()
        comm.send(bothdata[:len(otherdata)], dest=sendrank)
        newdata = bothdata[len(otherdata):]
    return newdata

def print_by_rank(data, rank, nprocs):
    """ crudely attempt to print data coherently """
    for proc in range(nprocs):
        if proc == rank:
            print(str(rank)+": "+str(data))
            comm.barrier()
    return

def odd_even_sort(data):
    rank = comm.Get_rank()
    nprocs = comm.Get_size()
    data.sort()
    for step in range(1, nprocs+1):
        if ((rank + step) % 2) == 0:
            if rank < nprocs - 1:
                data = exchange(data, rank, rank+1)
        elif rank > 0:
            data = exchange(data, rank-1, rank)
    return data

def main():
    # everyone get their data
    rank = comm.Get_rank()
    nprocs = comm.Get_size()
    n_per_proc = 5
    data = list(range(n_per_proc*rank, n_per_proc*(rank+1)))

    if rank == 0:
        print("Original:")
    print_by_rank(data, rank, nprocs)

    # tag your data with random values
    data = [(random.random(), item) for item in data]

    # now sort it by these random tags
    data = odd_even_sort(data)

    if rank == 0:
        print("Shuffled:")
    print_by_rank([x for _, x in data], rank, nprocs)

    return 0


if __name__ == "__main__":
    sys.exit(main())

跑步给:

$ mpirun -np 5 python mergesort_shuffle.py
Original:
0: [0, 1, 2, 3, 4]
1: [5, 6, 7, 8, 9]
2: [10, 11, 12, 13, 14]
3: [15, 16, 17, 18, 19]
4: [20, 21, 22, 23, 24]

Shuffled:
0: [19, 17, 4, 20, 9]
1: [23, 12, 3, 2, 8]
2: [14, 6, 13, 15, 1]
3: [11, 0, 22, 16, 18]
4: [5, 10, 21, 7, 24]

这篇关于并行计算-随机播放的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆