General strategies for memory/speed problems


Problem description

    I have a C++ program which runs through about 200 ASCII files, does some basic data processing, and outputs a single ASCII file with (basically) all of the data.

    The program runs very quickly at first, then slows down drastically part way through, perhaps slows gradually a little more, then proceeds at a fairly slow pace through the rest. I.e. it gets through the first ~80 files in about 5 seconds, and through all ~200 files in about 50 seconds. Each file is basically the same.

    I'm looking for suggestions on how to track down the problem or memory leak.


    Some more details: At first I would fopen(FILE *outputFile, "w") at the beginning of my program, and fclose() at the end. The first ~40 files would take ~4 seconds; then about 1.5 minutes for ~200 files.

    I thought maybe the output file was clogging the memory, so I changed the code to fopen(outputFile, "a") each iteration (i.e. each time I opened a new file), and fclose() each time I closed the input file... this improved the total runtime to ~50 seconds, as mentioned above.

    It seems strange that this 'fix' would help so noticeably, but not entirely.

    Also, I'm not dynamically allocating any memory (no calls to 'new' or 'delete' or 'free' or whatever)... so I'm not even sure how I could have a memory leak.

    Any help would be appreciated! Thanks!


    Code:

    vector<string> dirCon;
    // Uses boost::filesystem to store every file in directory
    bool retVal = FileSystem::getDirectoryContents(HOME_DIR+HISTORY_DIR, &dirCon, 2);
    
    int counter = 0;
    int goodLines = 0;  // NOTE: used below; declared here for completeness (declaration missing from the original excerpt)
    for(int i = 0; i < dirCon.size(); i++) { 
        // Create output file
        FILE *outFile;
        string outputFileName = HOME_DIR ... ;
        // open file as append "a"
        bool ifRet = initFile(outFile, outputFileName.c_str(), "a");
        if(!ifRet) {
            fprintf(stderr, "ERROR ... ");
            return false;
        }       
    
        // Get the topmost directory name
        size_t loc = dirCon.at(i).find_last_of("/");
        string dirName = dirCon.at(i).substr(loc+1, (dirCon.at(i).size()-(loc+1)));
    
        // Get the top directory content
        vector<string> subDirCon;
        bool subRetVal = FileSystem::getDirectoryContents(dirCon.at(i), &subDirCon);
        if(!subRetVal) { fprintf(stderr, "ERROR\n"); return false; }
    
        // Go through each file in directory, look for the one that matches
        for(int j = 0; j < subDirCon.size(); j++) {
    
            // Get filename
            loc = subDirCon.at(j).find_last_of("/");
            string fileName = subDirCon.at(j).substr(loc+1, (subDirCon.at(j).size()-(loc+1)));
    
            // If filename matches desired station, process and store
            if( fileName == string(dirName ...) ) {
                // Open File
                FILE *inFile;
                if(!initFile(inFile, subDirCon.at(j).c_str(), "r")) { 
                    fprintf(stderr, "ERROR: ... !\n");
                    break;
                }
    
                // Parse file line-by-line
                char str[TB_CHARLIMIT_LARGE];
                const char *delim = ",";
                while(true) {
                    vector<string> splitString;
                    fgets(str, TB_CHARLIMIT_LARGE, inFile);
    
                    if(feof(inFile)) { break; }     // break at end of file
                    removeEndLine(str);
    
                    // If non-comment line, parse
                    if(str[0] != COMCHAR){
                        string strString(str);
                        // remove end line char
                        strString.erase(std::remove(strString.begin(), strString.end(), '\n'), strString.end());
                        strcpy(str, strString.c_str());
    
                        char *temp = strtok(str,delim);
                        char *lastTemp;
                        while(temp != NULL) {
                            splitString.push_back(string(temp));
                            temp = strtok(NULL,delim);
                        }
                        if(splitString.size() > 0) { 
                            DateTime dtTemp(splitString.at(0));  
                            goodLines++;
    
                            /*  ... process splitString, use dtTemp ... */
    
                            // Output to file
                            fprintf(outFile, "%s\n", strFromStrVec(splitString).c_str());
                        }
                    }
                } //while
                fclose(inFile); 
            }
        } //j
        cout << "GoodLines = " << goodLines << endl;
    
        fclose(outFile);
    } // i
    
    bool getDirectoryContents(const string dirName, vector<string> *conts) {
        path p(dirName);
        try {
            // Confirm Exists
            if(!exists(p)) {
                fprintf(stderr, "ERROR: '%s' does not exist!\n", dirName.c_str());
                return false;
            }
    
            // Confirm Directory
            if(!is_directory(p)) {
                return false;
            }
    
            conts->clear();
    
            // Store paths to sort later
            typedef vector<path> vec;
            vec v;
    
            copy(directory_iterator(p), directory_iterator(), back_inserter(v));
    
            sort(v.begin(), v.end()); 
    
            for(vec::const_iterator it(v.begin()), it_end(v.end()); it != it_end; ++it) {
                conts->push_back(it->string());
            }
    
    
        } catch(const filesystem_error& ex) {
            fprintf(stderr, "ERROR: '%s'!\n", ex.what());
            return false;
        }   
    
        return true;
    }
    

    Solution

    Without more information to go on, I'd guess that what you're dealing with is a Schlemiel the Painter's algorithm: (Original) (Wikipedia). They're incredibly easy to fall into when doing string processing. Let me give you an example.

    I want to read every line in a file, process each line somehow, and run it through some intermediate processing. Then I want to gather up the results and maybe write them back to disk. Here's a way to do that. I make a single huge mistake that can be easy to miss:

    // proc.cpp
    class Foo
    {
      public:
      std::string chew_on(std::string const& line_to_chew_on) {...}
      ...
    };
    
    Foo processor;
    std::string buffer;
    
    // Read/process
    FILE *input=fopen(..., "r");
    char linebuffer[1000+1];
    for (char *line=fgets(linebuffer, 1000, input); line; 
         line=fgets(linebuffer, 1000, input) ) 
    {
        buffer=buffer+processor.chew_on(line);  //(1)
    }
    fclose(input);
    
    // Write
    FILE *output=fopen(...,"w");
    fwrite(buffer.data(), 1, buffer.size(), output);
    fclose(output);
    

    The problem here, which can be easy to miss at first glance, is that each time line (1) is run, the entire contents of buffer is copied. If there are 1000 lines with 100 characters each, the copying forms an arithmetic series: 100 + 200 + 300 + ... + 100,000 = 50,050,000 bytes copied just to run this loop. Increase to 10,000 lines? 5,000,500,000. That paint can is getting further and further away.

    In this particular example, the fix is easy. Line (1) should read:

        buffer.append(processor.chew_on(line)); // (2)
    

    or equivalently (thanks, Matthieu M.):

        buffer += processor.chew_on(line);
    

    This helps because (usually) std::string won't need to make a full copy of buffer to perform the append, whereas in (1), we're insisting that a copy be made.
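
    To see why, note that std::string typically grows its buffer geometrically, so most += calls land in spare capacity without reallocating. Here's a small illustrative sketch of my own (not from the original answer; exact capacity values are implementation-defined):

    // capacity_growth.cpp -- illustrative sketch; capacities vary by library
    #include <cstdio>
    #include <string>

    int main()
    {
        std::string s;
        std::size_t last = s.capacity();
        for (int i = 0; i < 100000; ++i) {
            s += "0123456789";              // the cheap append from (2)
            if (s.capacity() != last) {     // a reallocation (full copy) happened
                last = s.capacity();
                std::printf("size=%zu capacity=%zu\n", s.size(), last);
            }
        }
        // Only O(log n) reallocations are reported, so the total copying stays
        // linear, unlike the copy-on-every-iteration pattern in (1).
    }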

    More generally, suppose (a) the processing you're doing keeps state, (b) you reference all or most of that state often, and (c) the state grows over time. Then there's a fair-to-good chance that you've written a Θ(n²)-time algorithm, which will exhibit exactly the type of behavior you're talking about.
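
    To make that concrete, here's a minimal, self-contained benchmark sketch (my own addition, not from the original answer); absolute times depend on your machine and standard library, but the gap widens rapidly as n grows:

    // append_bench.cpp -- compares the quadratic and linear append patterns
    #include <chrono>
    #include <cstdio>
    #include <string>

    int main()
    {
        const std::string line(100, 'x');   // stand-in for a 100-character line
        const int n = 20000;
        using clk = std::chrono::steady_clock;

        std::string a;                      // pattern (1): copies the whole buffer each time
        auto t0 = clk::now();
        for (int i = 0; i < n; ++i) a = a + line;
        auto t1 = clk::now();

        std::string b;                      // pattern (2): appends in place
        auto t2 = clk::now();
        for (int i = 0; i < n; ++i) b += line;
        auto t3 = clk::now();

        auto ms = [](clk::duration d) {
            return (long long)std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
        };
        std::printf("a = a + line : %lld ms\n", ms(t1 - t0));
        std::printf("b += line    : %lld ms\n", ms(t3 - t2));
    }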


    Edit

    Of course, the stock answer to "why is my code slow?" is "run a profile." There are a number of tools and techniques for doing this. Some options include:

  • callgrind/kcachegrind (as suggested by David Schwartz)
  • Random Pausing (as suggested by Mike Dunlavey)
  • The GNU profiler, gprof
  • The GNU test coverage profiler, gcov
  • oprofile

    They've all got their strengths. "Random Pausing" is probably the simplest to implement, though it can be hard to interpret the results. 'gprof' and 'gcov' are basically useless on multithreaded programs. Callgrind is thorough but slow, and can sometimes play strange tricks on multithreaded programs. oprofile is fast, plays nicely with multithreaded programs, but can be difficult to use, and can miss things.
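
    For reference, "Random Pausing" needs no special build at all: run the program under a debugger, interrupt it at random moments, and look at the stack. A rough sketch of such a session (plain gdb commands, nothing project-specific) might look like:

    $ gdb ./proc
    (gdb) run
    ^C                 # interrupt at a random moment while the program is busy
    (gdb) bt           # read the stack: what is it doing right now?
    (gdb) continue
    ^C                 # pause several more times; if the same frames keep
    (gdb) bt           # showing up, that's where the time is going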

    However, if you're trying to profile a single threaded program, and are developing with the GNU toolchain, gprof can be a wonderful option. Take my proc.cpp, above. For purposes of demonstration, I'm going to profile an unoptimized run. First, I rebuild my program for profiling (adding -pg to the compile and link steps):

    $ g++ -O0 -g -pg -o proc.o -c proc.cpp
    $ g++ -pg -o proc proc.o
    

    I run the program once to create profiling information:

    $ ./proc
    

    In addition to doing whatever it would normally do, this run will create a file called 'gmon.out' in the current directory. Now, I run gprof to interpret the result:

    $ gprof ./proc
    Flat profile:
    
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total           
     time   seconds   seconds    calls  ms/call  ms/call  name    
    100.50      0.01     0.01   234937     0.00     0.00  std::basic_string<...> std::operator+<...>(...)
      0.00      0.01     0.00   234937     0.00     0.00  Foo::chew_on(std::string const&)
      0.00      0.01     0.00        1     0.00    10.05  do_processing(std::string const&, std::string const&)
    ...
    

    Yes indeed, 100.5% of my program's time is spent in std::string operator+. Well, ok, up to some sampling error. (I'm running this in a VM ... it seems that the timing being captured by gprof is off. My program took much longer than 0.01 cumulative seconds to run...)

    For my very simple example, gcov is a little less instructive. But here's what it happens to show. First, compile and run for gcov:

    $ g++ -O0 -fprofile-arcs -ftest-coverage -o proc proc.cpp
    $ ./proc
    $ gcov ./proc
    ...
    

    This creates a set of files ending in .gcno, .gcda, and .gcov in the current directory. The .gcov files tell us how many times each line of code was executed during the run. So, in my example, my proc.cpp.gcov ends up looking like this:

            -:    0:Source:proc.cpp
            -:    0:Graph:proc.gcno
            -:    0:Data:proc.gcda
            -:    0:Runs:1
            -:    0:Programs:1
            -:    1:#include <cstdio>
            -:    2:#include <string>
            -:    4:class Foo
            -:    5:{
            -:    6:  public:
       234937:    7:  std::string chew_on(std::string const& line_to_chew_on) {return line_to_chew_on;}
            -:    8:};
            -:    9:
            -:   10:
            -:   11:
            1:   12:int do_processing(std::string const& infile, std::string const& outfile)
            -:   13:{
            -:   14:  Foo processor;
            2:   15:  std::string buffer;
            -:   16:
            -:   17:  // Read/process
            1:   18:  FILE *input=fopen(infile.c_str(), "r");
            -:   19:  char linebuffer[1000+1];
       234938:   20:  for (char *line=fgets(linebuffer, 1000, input); line; 
            -:   21:       line=fgets(linebuffer, 1000, input) ) 
            -:   22:    {
       234937:   23:      buffer=buffer+processor.chew_on(line);  //(1)
            -:   24:    }
            1:   25:  fclose(input);
            -:   26:
            -:   27:  // Write
            1:   28:  FILE *output=fopen(outfile.c_str(),"w");
            1:   29:  fwrite(buffer.data(), 1, buffer.size(), output);
            1:   30:  fclose(output);
            1:   31:}
            -:   32:
            1:   33:int main()
            -:   34:{
            1:   35:  do_processing("/usr/share/dict/words","outfile");
            -:   36:}
    

    So from this, I'm going to have to conclude that the std::string::operator+ at line 23 (which is executed 234,937 times) is a potential cause of my program's slowness.

    As an aside, callgrind/kcachegrind work with multithreaded programs, and can provide much, much more information. For this program I run:

    g++ -O0 -o proc proc.cpp
    valgrind --tool=callgrind ./proc  # this takes forever to run
    kcachegrind callgrind.out.*
    

    And I find the following output, showing that what's really eating up my cycles is lots and lots of memory copies (99.4% of execution time spent in __memcpy_ssse3_back), all of which I can see happening somewhere below line 23 in my source. (The original answer ends with a kcachegrind screenshot illustrating this.)
