Parallel output using MPI IO to a single file


Problem description


I have a very simple task to do, but somehow I am still stuck.

I have one BIG data file ("File_initial.dat"), which should be read by all nodes on the cluster (using MPI). Each node will perform some manipulation on its part of this BIG file (File_size / number_of_nodes), and finally each node will write its result to one shared BIG file ("File_final.dat"). The number of elements in the file remains the same.

  1. By googling I understood that it is much better to write the data file as a binary file (I have only decimal numbers in this file) and not as a "*.txt" file, since no human will read this file, only computers.

  2. I tried to implement this myself (but using formatted input/output and NOT a binary file), but I get incorrect behavior.

My code so far follows:

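#include <mpi.h>
#include <fstream>
#include <iostream>
#define NNN 30

using namespace std;

int main(int argc, char **argv)
{   
    ifstream fin;
    double res[NNN];   // every rank reads the whole input into this buffer

    // setting MPI environment

    int rank, nprocs;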
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // reading the initial file

    fin.open("initial.txt");
    for (int i=0;i<NNN;i++)
    {  
        fin  >> res[i];
        cout << res[i] << endl; // to see, what I have in the file
    }  
    fin.close();

    // starting position in the "res" array as a function of "rank" of process
    int Pstart = (NNN / nprocs) * rank ;
    // specifying Offset for writing to file
    MPI_Offset offset = sizeof(double)*rank;
    MPI_File file;
    MPI_Status status;

    // opening one shared file
    MPI_File_open(MPI_COMM_WORLD, "final.txt", MPI_MODE_CREATE|MPI_MODE_WRONLY,
                          MPI_INFO_NULL, &file);

    // allocating the local array on each node

    double * localArray;
    localArray = new double [NNN/nprocs];

    // Performing some basic manipulation (squaring each element of array)
    for (int i=0;i<(NNN / nprocs);i++)
    {
        localArray[i] = res[Pstart+i]*res[Pstart+i];
    }

    // Writing the result of each local array to the shared final file:

    MPI_File_seek(file, offset, MPI_SEEK_SET);
    MPI_File_write(file, localArray, sizeof(double), MPI_DOUBLE, &status);
    MPI_File_close(&file);

    MPI_Finalize();

    return 0;
}

I understand that I am doing something wrong when trying to write doubles to a text file.

How should one change the code in order to be able to save the result

  1. as a .txt file (formatted output)
  2. as a .dat file (binary file)

Solution

Your binary file output is almost right, but your calculations of the offset within the file and of the amount of data to write are incorrect. You want your offset to be

MPI_Offset offset = sizeof(double)*Pstart;

not

MPI_Offset offset = sizeof(double)*rank;

otherwise the ranks will overwrite each other's data: (say) rank 3 out of nprocs=5 starts writing at double number 3 in the file, not at (30/5)*3 = 18.

Also, you want each rank to write NNN/nprocs doubles, not sizeof(double) doubles, meaning you want

MPI_File_write(file, localArray, NNN/nprocs, MPI_DOUBLE, &status);
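The two corrections above come down to simple arithmetic, which can be checked without running MPI at all. The following is a minimal sketch; the `Block` struct and the `block_for_rank` helper are hypothetical names introduced here for illustration and are not part of MPI or of the original code. It assumes, as the question does, that every rank gets `total / nprocs` elements under integer division, so any remainder elements are not assigned to any rank.

```cpp
#include <cassert>

// Hypothetical helper type: where one rank's block lives in the shared binary file.
struct Block {
    long long first_element;  // index of the first double (Pstart in the question)
    long long count;          // number of doubles to write (the count argument)
    long long byte_offset;    // position to seek to, measured in bytes
};

// Hypothetical helper: compute the block owned by `rank` out of `nprocs` ranks.
Block block_for_rank(int rank, int nprocs, long long total) {
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    long long per_rank = total / nprocs;  // integer division, as in the question
    Block b;
    b.first_element = per_rank * rank;
    b.count = per_rank;
    // The offset is in bytes, so scale the element index by sizeof(double).
    b.byte_offset = b.first_element * static_cast<long long>(sizeof(double));
    return b;
}
```

With NNN = 30 and nprocs = 5, `block_for_rank(3, 5, 30)` gives a first element of 18 and a count of 6, matching the numbers in the answer. Under these assumptions a rank would pass `byte_offset` to `MPI_File_seek` and `count` (converted to `int`) to `MPI_File_write` with `MPI_DOUBLE`.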

How to write as a text file is a much bigger issue; you have to convert the data into strings internally and then output those strings, making sure, by careful formatting, that you know exactly how many characters each line requires. That is described in this answer on this site.
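To illustrate the fixed-width idea, here is a rough sketch of the formatting step only, written without MPI so it can be tried in isolation. It is not the code from the answer referred to above. The names `format_record`, `format_block`, `text_offset`, and the choice of a 14-character field plus a newline are assumptions made for this example; any format works as long as every record has the same length.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Assumed record layout: a 14-character number field followed by '\n'.
constexpr int kFieldWidth = 14;
constexpr int kRecordWidth = kFieldWidth + 1;

// Format one value as exactly kRecordWidth characters.
std::string format_record(double value) {
    char buf[32];
    int n = std::snprintf(buf, sizeof buf, "%*.6e\n", kFieldWidth, value);
    assert(n == kRecordWidth);  // every record must have the same length
    return std::string(buf, n);
}

// Concatenate the records for one rank's local values into a single buffer.
std::string format_block(const std::vector<double>& values) {
    std::string out;
    out.reserve(values.size() * kRecordWidth);
    for (double v : values) out += format_record(v);
    return out;
}

// Byte offset in the text file of the record with the given global index.
long long text_offset(long long first_element) {
    return first_element * kRecordWidth;
}
```

Because every record is the same length, the position of any element in the file is known in advance. Under these assumptions, a rank whose block starts at element Pstart would seek to `text_offset(Pstart)` and write its `format_block(...)` buffer as that many `MPI_CHAR` elements instead of `MPI_DOUBLE`.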
