ASCII data import: how can I match Fortran's bulk read performance in C++?


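Problem description

Setup

Hello, I have Fortran code for reading in ASCII double-precision data (an example of the data file is at the bottom of the question):

program ReadData
    integer :: mx, my, mz
    double precision, allocatable, dimension(:,:,:) :: charge

    ! Open the file 'CHGCAR'
    open(11, file='CHGCAR', status='old')

    ! Get the extent of the 3D system and allocate the 3D array
    read(11,*) mx, my, mz
    allocate(charge(mx,my,mz))

    ! Bulk read the entire block of ASCII data for the system
    read(11,*) charge
end program ReadData

and the "equivalent" C++ code: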

#include <fstream>
#include <vector>

using std::ifstream;
using std::vector;
using std::ios;

int main(){
    int mx, my, mz;

    // Open the file "CHGCAR"
    ifstream InFile("CHGCAR", ios::in);

    // Get the extent of the 3D system and allocate the 3D array
    InFile >> mx >> my >> mz;
    vector<vector<vector<double> > > charge(mx, vector<vector<double> >(my, vector<double>(mz)));

    // Method 1: std::ifstream extraction operator to double
    for (int i = 0; i < mx; ++i)
        for (int j = 0; j < my; ++j)
            for (int k = 0; k < mz; ++k)
                InFile >> charge[i][j][k];

    return 0;
}

Fortran kicking @$$ and taking names

Note that the line

read(11,*) charge

performs the same task as the C++ code:

for (int i = 0; i < mx; ++i)
    for (int j = 0; j < my; ++j)
        for (int k = 0; k < mz; ++k)
            InFile >> charge[i][j][k];

where InFile is an ifstream object (note that while indices in the Fortran code start at 1 and not 0, the range is the same).

However, the Fortran code runs way, way faster than the C++ code, I think because Fortran does something clever like reading/parsing the file according to the range and shape (values of mx, my, mz) all in one go, and then simply pointing charge to the memory the data was read to. The C++ code, by comparison, needs to access InFile and then charge (which is typically large) back and forth with each iteration, resulting in (I believe) many more IO and memory operations.

I'm reading in potentially billions of values (several gigabytes), so I really want to maximize performance.

My question:

How can I achieve the performance of the Fortran code in C++?

Moving on...

Here is a much faster (than the above C++) C++ implementation, where the file is read in one go into a char array, and then charge is populated as the char array is parsed:

#include <cstdlib>
#include <cstring>
#include <fstream>
#include <vector>

using std::ifstream;
using std::vector;
using std::ios;

int main(){
    int mx, my, mz;

    // Open the file "CHGCAR"
    ifstream InFile("CHGCAR", ios::in);

    // Get the extent of the 3D system and allocate the 3D array
    InFile >> mx >> my >> mz;
    vector<vector<vector<double> > > charge(mx, vector<vector<double> >(my, vector<double>(mz)));

    // Method 2: big char array with strtok() and atof()

    //  Get file size (an int would overflow on multi-gigabyte files)
    InFile.seekg(0, InFile.end);
    std::streamoff FileSize = InFile.tellg();
    InFile.seekg(0, InFile.beg);

    //  Read in entire file to FileData; the extra zero byte terminates the buffer for strtok()
    vector<char> FileData(static_cast<std::size_t>(FileSize) + 1, '\0');
    InFile.read(FileData.data(), FileSize);
    InFile.close();

    /*
     *  Now simply parse through the char array, saving each
     *  value to its place in the array of charge density
     */
    char* TmpCStr = strtok(FileData.data(), " \n");

    // Skips the three dimension tokens so TmpCStr points at the first data value
    for (int i = 0; i < 3 && TmpCStr != NULL; ++i)
        TmpCStr = strtok(NULL, " \n");

    // Loop bounds follow the dimensions charge was allocated with
    for (int i = 0; i < mx; ++i)
        for (int j = 0; j < my; ++j)
            for (int k = 0; k < mz && TmpCStr != NULL; ++k){
                charge[i][j][k] = atof(TmpCStr);
                TmpCStr = strtok(NULL, " \n");
            }

    return 0;
}

Again, this is much faster than the simple >> operator-based method, but still considerably slower than the Fortran version--not to mention much more code.

How to get better performance?

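I'm sure that method 2 is the way to go if I am to implement it myself, but I'm curious how I can increase performance to match the Fortran code. The types of things I'm considering and currently researching are:

  • C++11 and C++14 features
  • Optimized C or C++ libraries for doing just this type of thing
  • Improvements on the individual methods being used in method 2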

C++ String Toolkit

In particular, the C++ String Toolkit Library will take FileData and the delimiters " \n" and give me a string token object (call it FileTokens); the triple for loop would then look like

for (int k = 0; k < Mz; ++k)
    for (int j = 0; j < My; ++j)
        for (int i = 0; i < Mx; ++i)
            Charge[k][j][i] = FileTokens.nextFloatToken();

This would simplify the code slightly, but there is extra work in copying (in essence) the contents of FileData into FileTokens, which might kill any performance gains from using the nextFloatToken() method (presumably more efficient than the strtok()/atof() combination).

There is an example on the C++ String Toolkit (StrTk) Tokenizer tutorial page (included at the bottom of the question) using StrTk's for_each_line() processor that looks similar to my desired application. A difference between the cases, however, is that I cannot assume how many values will appear on each line of the input file, and I do not know enough about StrTk to say whether this is a viable solution.

NOT A DUPLICATE

The topic of fast reading of ASCII data to an array or struct has come up before, but I have reviewed the following posts and their solutions were not sufficient:

Example data

Here is an example of the data file I'm importing. The ASCII data is delimited by spaces and line breaks like the below example:

 5 3 3
 0.23080516813E+04 0.22712439791E+04 0.21616898980E+04 0.19829996749E+04 0.17438686650E+04
 0.14601734127E+04 0.11551623512E+04 0.85678544224E+03 0.59238325489E+03 0.38232265554E+03
 0.23514479113E+03 0.14651943589E+03 0.10252743482E+03 0.85927499703E+02 0.86525872161E+02
 0.10141182750E+03 0.13113419142E+03 0.18057147781E+03 0.25973252462E+03 0.38303754418E+03
 0.57142097675E+03 0.85963728360E+03 0.12548019843E+04 0.17106124085E+04 0.21415379433E+04
 0.24687336309E+04 0.26588012477E+04 0.27189091499E+04 0.26588012477E+04 0.24687336309E+04
 0.21415379433E+04 0.17106124085E+04 0.12548019843E+04 0.85963728360E+03 0.57142097675E+03
 0.38303754418E+03 0.25973252462E+03 0.18057147781E+03 0.13113419142E+03 0.10141182750E+03
 0.86525872161E+02 0.85927499703E+02 0.10252743482E+03 0.14651943589E+03 0.23514479113E+03

StrTk example

Here is the StrTk example mentioned above. The scenario is parsing the data file that contains the information for a 3D mesh:

input data:

5
+1.0,+1.0,+1.0
-1.0,+1.0,-1.0
-1.0,-1.0,+1.0
+1.0,-1.0,-1.0
+0.0,+0.0,+0.0
4
0,1,4
1,2,4
2,3,4
3,1,4

code:

struct point
{
   double x,y,z;
};

struct triangle
{
   std::size_t i0,i1,i2;
};

int main()
{
   std::string mesh_file = "mesh.txt";
   std::ifstream stream(mesh_file.c_str());
   std::string s;
   // Process points section
   std::deque<point> points;
   point p;
   std::size_t point_count = 0;
   strtk::parse_line(stream," ",point_count);
   strtk::for_each_line_n(stream,
                          point_count,
                          [&points,&p](const std::string& line)
                          {
                             if (strtk::parse(line,",",p.x,p.y,p.z))
                                points.push_back(p);
                          });

   // Process triangles section
   std::deque<triangle> triangles;
   triangle t;
   std::size_t triangle_count = 0;
   strtk::parse_line(stream," ",triangle_count);
   strtk::for_each_line_n(stream,
                          triangle_count,
                          [&triangles,&t](const std::string& line)
                          {
                             if (strtk::parse(line,",",t.i0,t.i1,t.i2))
                                triangles.push_back(t);
                          });
   return 0;
}

Solution

This...

vector<vector<vector<double> > > charge(mx, vector<vector<double> >(my, vector<double>(mz)));

...creates a temporary vector<double>(mz), with all 0.0 values, and copies it my times (or perhaps moves then copies my-1 times with a C++11 compiler, but little difference...) to create a temporary vector<vector<double>>(my, ...) which is then copied mx times (...as above...) to initialise all the data. You're reading data in over these elements anyway - there's no need to spend time initialising it here. Instead, create an empty charge and use nested loops to reserve() enough memory for the elements without populating them yet.
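As a minimal sketch of that suggestion (the helper name make_reserved_grid and the Grid3D alias are made up for illustration): the two outer levels still have to exist, since you cannot reserve() on a vector that has not been created, but the innermost vectors are left empty with capacity set aside, so no doubles are zero-initialised.

```cpp
#include <cstddef>
#include <vector>

using Grid3D = std::vector<std::vector<std::vector<double>>>;

// Builds an mx-by-my grid of empty rows, each with room reserved for mz values.
Grid3D make_reserved_grid(std::size_t mx, std::size_t my, std::size_t mz) {
    Grid3D charge(mx);
    for (std::size_t i = 0; i < mx; ++i) {
        charge[i].resize(my);          // my empty inner vectors, no doubles yet
        for (std::size_t j = 0; j < my; ++j)
            charge[i][j].reserve(mz);  // capacity only; size() stays 0
    }
    return charge;
}
```

Because the inner vectors start with size() == 0, the values must then be appended with push_back or emplace_back rather than assigned through charge[i][j][k].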

Next, check you're compiling with optimisation on. If you are and you're still slower than FORTRAN, in the data-populating nested loops try creating a reference to the vector you're about to .emplace_back elements onto:

for (int i = 0; i < mx; ++i)
    for (int j = 0; j < my; ++j)
    {
        std::vector<double>& v = charge[i][j];
        for (int k = 0; k < mz; ++k)
        {
            double d;
            InFile >> d;
            v.emplace_back(d);
        }
    }

That shouldn't help if your optimiser's done a good job, but is worth trying as a sanity check.

If you're still slower - or just want to try to be even faster - you could try optimising your number parsing. You say your data is all formatted à la 0.23080516813E+04; with fixed sizes like that you can easily calculate how many bytes to read into a buffer to give you a decent number of values from memory. Then, for each value, you could start an atol after the . to extract 23080516813 (use atoll where long is 32 bits, as 11 digits will not fit) and multiply it by 10 to the power of the exponent minus 11 (your number of digits), here 10^(4 - 11) = 1E-7. For speed, keep a table of those powers of ten and index into it using the extracted exponent (i.e. 4). (Note that multiplying by e.g. 1E-7 can be faster than dividing by 1E7 on a lot of common hardware.)
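A rough sketch of that fixed-format idea follows. It assumes every token is exactly "0." followed by 11 digits, 'E', a sign, and a two-digit exponent (17 characters, no leading minus sign), and it does no validation at all. The names parse_fixed and scale_table are hypothetical. Note also that multiplying by a tabulated power of ten can differ from strtod in the last bit or two, so check whether that matters for your data.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Assumed token layout: "0." + 11 digits + 'E' + sign + 2 exponent digits.
constexpr std::size_t kDigits = 11;

// Table of 10^(e - 11) for e in [-99, 99], built once on first use.
inline const std::array<double, 199>& scale_table() {
    static const std::array<double, 199> table = [] {
        std::array<double, 199> t{};
        for (int e = -99; e <= 99; ++e)
            t[e + 99] = std::pow(10.0, e - static_cast<int>(kDigits));
        return t;
    }();
    return table;
}

// Parses one token starting at p; the caller guarantees the layout above.
inline double parse_fixed(const char* p) {
    long long mantissa = 0;
    for (std::size_t i = 0; i < kDigits; ++i)
        mantissa = mantissa * 10 + (p[2 + i] - '0');   // digits after "0."
    int exponent = (p[15] - '0') * 10 + (p[16] - '0');
    if (p[14] == '-') exponent = -exponent;
    return static_cast<double>(mantissa) * scale_table()[exponent + 99];
}
```

For 0.23080516813E+04 this extracts 23080516813 and multiplies by the table entry for 10^(4 - 11), giving roughly 2308.0516813.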

And if you want to blitz this thing, switch to using memory mapped file access. It's worth considering boost::iostreams::mapped_file_source, as it's easier to use than even the POSIX API (let alone Windows) and is portable, but programming directly against an OS API shouldn't be much of a struggle either.

UPDATE - response to first & second comments

Example of using boost memory mapping:

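#include <boost/iostreams/device/mapped_file.hpp>
#include <cstring>
#include <sstream>
#include <string>

// ASSERT is a placeholder for whatever assertion macro your project uses.

// Constructing the source from the params maps the file; no separate open() call is needed
boost::iostreams::mapped_file_params params("dbldat.in");
boost::iostreams::mapped_file_source file(params);
ASSERT(file.is_open());
const char* p = file.data();
// The mapping is not null-terminated, so bound the newline search by the file size
const char* nl = static_cast<const char*>(std::memchr(p, '\n', file.size()));
ASSERT(nl != NULL);
std::istringstream iss(std::string(p, nl - p));
size_t x, y, z;
ASSERT(iss >> x >> y >> z);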

The above maps a file into memory at address p, then parses the dimensions from the first line. Continue parsing the actual double representations from ++nl onwards. I mention an approach to that above, and you're concerned about the data format changing: you could add a version number to the file, so you can use optimised parsing until the version number changes, then fall back on something generic for "unknown" file formats. As far as something generic goes, for in-memory representations using int chars_to_skip; double my_double; ASSERT(sscanf(ptr, "%lf%n", &my_double, &chars_to_skip) == 1); is reasonable (note %lf rather than %f, since the target is a double): see the sscanf docs. You can then advance the pointer through the data by chars_to_skip.
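A small sketch of that generic fallback, assuming the data sits in a null-terminated buffer (a raw memory mapping is not null-terminated, so this would need a copy or a bounded variant). The function name parse_doubles is made up for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Parses up to max_count whitespace-separated doubles from a
// null-terminated buffer, stopping early at the first malformed token.
std::vector<double> parse_doubles(const char* ptr, std::size_t max_count) {
    std::vector<double> values;
    values.reserve(max_count);
    double value = 0.0;
    int chars_to_skip = 0;
    while (values.size() < max_count &&
           std::sscanf(ptr, "%lf%n", &value, &chars_to_skip) == 1) {
        values.push_back(value);
        ptr += chars_to_skip;  // step past the characters just consumed
    }
    return values;
}
```

One caveat worth measuring: some C library implementations scan the whole remaining string on every sscanf call, which becomes very slow on multi-gigabyte buffers. strtod, which returns a pointer to the first unparsed character, is a possible substitute with the same advance-the-pointer structure.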

Next, are you suggesting to combine the reserve() solution with the reference creation solution?

Yes.

And (pardon my ignorance) why would using a reference to charge[i][j] and v.emplace_back() be better than charge[i][j].emplace_back()?

That suggestion was to sanity check that the compiler's not repeatedly evaluating charge[i][j] for each element being emplaced: hopefully it will make no performance difference and you can go back to charge[i][j].emplace_back(), but IMHO it's worth a quick check.

Lastly, I'm skeptical about using an empty vector and reserve()ing at the tops of each loop. I have another program that came to a grinding halt using that method, and replacing the reserve()s with a preallocated multidimensional vector sped it up a lot.

That's possible, but not necessarily true in general or applicable here - a lot depends on the compiler / optimiser (particularly loop unrolling) etc. With unoptimised emplace_back you're having to check vector size() against capacity() repeatedly, but if the optimiser does a good job that should be reduced to insignificance. As with a lot of performance tuning, you often can't reason about things perfectly and conclude what's going to be fastest, and will have to try alternatives and measure them with your actual compiler, program, data etc.
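For the measuring part, a tiny timing helper along these lines may be enough to compare variants. The name elapsed_ms is hypothetical, and a single run is only a rough indicator since file caching can dominate, so repeat each variant a few times.

```cpp
#include <chrono>

// Runs work() once and returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double elapsed_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrap each candidate (preallocated vectors, reserve() plus emplace_back, the char-buffer parser) in a lambda and compare the returned times on your real input file.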
