Multi-GPU profiling (several CPUs, MPI/CUDA hybrid)


Question

I had a quick look on the forums and I don't think this question has been asked already.

I am currently working with an MPI/CUDA hybrid code, written by somebody else during his PhD. Each CPU has its own GPU. My task is to gather data by running the (already working) code and to implement extra features. Turning this code into a single-CPU/multi-GPU one is not an option at the moment (later, possibly).

I would like to use performance profiling tools to analyse the whole thing.

For now, the idea is to have each CPU launch nvvp for its own GPU and gather data, while another profiling tool takes care of the general CPU/MPI part (I plan to use TAU, as I usually do).

The problem is that launching nvvp's interface 8 times simultaneously (when running with 8 CPUs/GPUs) is extremely annoying. I would like to avoid going through the interface and instead get a command line that writes the data directly to a file, which I can feed to nvvp's interface later and analyse.

I'd like a command line that is executed by each CPU and produces, for each of them, a file with data about its own GPU: 8 GPUs/CPUs = 8 files. Then I plan to feed these files to nvvp one by one and analyse them individually, comparing the data manually.

Any ideas?

Thanks!

Recommended answer

Take a look at nvprof, part of the CUDA 5.0 Toolkit (currently available as a release candidate). It has some limitations: it can only collect a limited number of counters in a given pass, and it cannot collect metrics (so for now you would have to script multiple launches if you want more than a few events). You can get more information from the nvvp built-in help, including an example MPI launch script (copied here, but I suggest you check the nvvp help for an up-to-date version if you have anything newer than the 5.0 RC).

#!/bin/sh
#
# Script to launch nvprof on an MPI process.  This script will
# create unique output file names based on the rank of the 
# process.  Examples:
#   mpirun -np 4 nvprof-script a.out 
#   mpirun -np 4 nvprof-script -o outfile a.out
#   mpirun -np 4 nvprof-script test/a.out -g -j
# In the case you want to pass a -o or -h flag to the a.out, you
# can do this.
#   mpirun -np 4 nvprof-script -c a.out -h -o
# You can also pass in arguments to nvprof
#   mpirun -np 4 nvprof-script --print-api-trace a.out
#

usage () {
 echo "nvprof-script [nvprof options] [-h] [-o outfile] a.out [a.out options]";
 echo "or"
 echo "nvprof-script [nvprof options] [-h] [-o outfile] -c a.out [a.out options]";
}

nvprof_args=""
while [ $# -gt 0 ];
do
    case "$1" in
        (-o) shift; outfile="$1";;
        (-c) shift; break;;
        (-h) usage; exit 1;;
        (*) nvprof_args="$nvprof_args $1";;
    esac
    shift
done

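# If user did not provide output filename then create one.
# "$1" is only set here when -c was used; otherwise fall back to a
# generic name so that basename is never called without an operand.
if [ -z "$outfile" ] ; then
    outfile=$(basename "${1:-nvprof}").nvprof-out
fi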

# Find the rank of the process from the MPI rank environment variable
# to ensure unique output filenames.  The script handles Open MPI
# and MVAPICH.  If your implementation is different, you will need to
# make a change here.

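# Open MPI
if [ -n "${OMPI_COMM_WORLD_RANK}" ] ; then
    rank=${OMPI_COMM_WORLD_RANK}
fi
# MVAPICH
if [ -n "${MV2_COMM_WORLD_RANK}" ] ; then
    rank=${MV2_COMM_WORLD_RANK}
fi
# Neither variable is set: fall back to the PID so that processes
# do not all write to the same output file.
rank=${rank:-pid$$}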

# Set the nvprof command and arguments.
# "$@" keeps application arguments that contain spaces intact.
NVPROF="nvprof --output-profile $outfile.$rank $nvprof_args"
exec $NVPROF "$@"

# If you want to limit which ranks get profiled, do something like
# this. You have to use the -c switch to get the right behavior.
# mpirun -np 2 nvprof-script --print-api-trace -c a.out -q  
# if [ "$rank" -le 0 ]; then
#     exec $NVPROF "$@"
# else
#     exec "$@"
# fi
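
The script above only knows about Open MPI and MVAPICH and asks you to adapt the rank lookup for other implementations. One possible way to do that is sketched below. The function name detect_rank is made up for illustration. PMI_RANK (set by MPICH-style launchers) and SLURM_PROCID (set by Slurm's srun) are additions that are not in the original script, so check which variable your launcher actually exports.

```sh
#!/bin/sh
# Sketch: resolve the MPI rank from launcher-specific environment variables.
# OMPI_COMM_WORLD_RANK and MV2_COMM_WORLD_RANK are the ones used in the
# script above; PMI_RANK and SLURM_PROCID are assumed additions.
detect_rank () {
    for var in OMPI_COMM_WORLD_RANK MV2_COMM_WORLD_RANK PMI_RANK SLURM_PROCID; do
        # Indirect lookup: read the value of the variable named in $var.
        eval "value=\${$var:-}"
        if [ -n "$value" ]; then
            echo "$value"
            return 0
        fi
    done
    # No known variable is set: fall back to the shell PID so that
    # processes on the same node do not overwrite each other's files.
    echo "pid$$"
}
```

In the launch script this would replace the two if blocks with a single line such as rank=$(detect_rank). The first variable that is set wins, so the order of the list matters if your environment exports more than one of them.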

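Since nvprof can only collect a limited number of counters per pass, the "script multiple launches" workaround mentioned in the answer could look roughly like the sketch below: one full MPI run per group of events. This is only an illustration under a few assumptions: the wrapper above is saved as ./nvprof-script, the function name profile_passes and the LAUNCH variable are invented here, and the event names you pass in must be ones your GPU supports (nvprof --query-events lists them).

```sh
#!/bin/sh
# Sketch: run the MPI job once per group of nvprof events.
# Assumptions: the wrapper is saved as ./nvprof-script; setting
# LAUNCH=echo prints the commands instead of running them (dry run).
LAUNCH=${LAUNCH:-}

# usage: profile_passes <nprocs> <app> <event-group> [<event-group> ...]
# Each event group is a comma-separated list passed to nvprof --events.
profile_passes () {
    np=$1; app=$2; shift 2
    pass=0
    for events in "$@"; do
        # -o gives each pass its own file prefix; the wrapper appends the
        # rank, so pass 0 on rank 3 ends up in pass0.3.
        # -c marks where the application command starts.
        $LAUNCH mpirun -np "$np" ./nvprof-script --events "$events" \
            -o "pass$pass" -c "$app"
        pass=$((pass + 1))
    done
}
```

For example, profile_passes 8 ./a.out "gld_request,gst_request" "branch,divergent_branch" would run the application twice on 8 ranks and leave 16 files (pass0.0 to pass0.7 and pass1.0 to pass1.7). Those event names are only examples and are device dependent. The sketch assumes the application behaves the same on every run, otherwise counters from different passes are not directly comparable.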