Why is reading lines from stdin much slower in C++ than Python?


Question


I wanted to compare reading lines of string input from stdin using Python and C++ and was shocked to see my C++ code run an order of magnitude slower than the equivalent Python code. Since my C++ is rusty and I'm not yet an expert Pythonista, please tell me if I'm doing something wrong or if I'm misunderstanding something.


(tl;dr answer: include the statement cin.sync_with_stdio(false), or just use fgets instead.

tl;dr results: scroll all the way down to the bottom of my question and look at the table.)


C++ code:

#include <iostream>
#include <time.h>

using namespace std;

int main() {
    string input_line;
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;                                                                   

    while (cin) {
        getline(cin, input_line);
        if (!cin.eof())
            line_count++;
    };

    sec = (int) time(NULL) - start;
    cerr << "Read " << line_count << " lines in " << sec << " seconds." ;
    if (sec > 0) {
        lps = line_count / sec;
        cerr << " LPS: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

//Compiled with:
//g++ -O3 -o readline_test_cpp foo.cpp

Python Equivalent:

#!/usr/bin/env python
import time
import sys

count = 0
start = time.time()

for line in sys.stdin:
    count += 1

delta_sec = int(time.time() - start)
if delta_sec > 0:
    lines_per_sec = int(round(count / delta_sec))
    print("Read {0} lines in {1} seconds. LPS: {2}".format(count, delta_sec,
        lines_per_sec))

Here are my results:

$ cat test_lines | ./readline_test_cpp 
Read 5570000 lines in 9 seconds. LPS: 618889

$ cat test_lines | ./readline_test.py 
Read 5570000 lines in 1 seconds. LPS: 5570000

Edit: I should note that I tried this both under Mac OS X (10.6.8) and Linux 2.6.32 (RHEL 6.2). The former is a MacBook Pro, the latter is a very beefy server, not that this is too pertinent.

Edit 2:

$ for i in {1..5}; do echo "Test run $i at `date`"; echo -n "CPP:"; cat test_lines | ./readline_test_cpp ; echo -n "Python:"; cat test_lines | ./readline_test.py ; done
Test run 1 at Mon Feb 20 21:29:28 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 2 at Mon Feb 20 21:29:39 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 3 at Mon Feb 20 21:29:50 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 4 at Mon Feb 20 21:30:01 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 5 at Mon Feb 20 21:30:11 EST 2012
CPP:   Read 5570001 lines in 10 seconds. LPS: 557000
Python:Read 5570000 lines in  1 seconds. LPS: 5570000

Edit 3:

Okay, I tried J.N.'s suggestion of having Python store the lines it reads, but it made no difference to Python's speed.

I also tried J.N.'s suggestion of using scanf into a char array instead of getline into a std::string. Bingo! This resulted in equivalent performance for both Python and C++. (3,333,333 LPS with my input data, which by the way are just short lines of three fields each, usually about 20 chars wide, though sometimes more.)

Code:

char input_a[512];
char input_b[32];
char input_c[512];
while(scanf("%s %s %s\n", input_a, input_b, input_c) != EOF) {             
    line_count++;
};

Speed:

$ cat test_lines | ./readline_test_cpp2 
Read 10000000 lines in 3 seconds. LPS: 3333333
$ cat test_lines | ./readline_test2.py 
Read 10000000 lines in 3 seconds. LPS: 3333333

(Yes, I ran it several times.) So, I guess I will now use scanf instead of getline. But, I'm still curious if people think this performance hit from std::string/getline is typical and reasonable.
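
A side note for posterity: %s with no field width can write past these buffers on over-long input, which may well be what caused the segfault mentioned in Edit 5 below. A minimal width-limited sketch, with field widths matching the buffer sizes above (the fprintf reporting line is my addition):

#include <cstdio>

int main() {
    char input_a[512], input_b[32], input_c[512];
    long line_count = 0;

    // Field widths capped at buffer size - 1 so scanf cannot overflow.
    while (scanf("%511s %31s %511s", input_a, input_b, input_c) == 3)
        line_count++;

    fprintf(stderr, "Read %ld lines\n", line_count);
    return 0;
}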

Edit 4 (was: Final Edit / Solution):

Adding cin.sync_with_stdio(false); immediately above my original while loop results in code that runs faster than Python.
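
For reference, a minimal sketch of the unsynced variant (I'm assuming the readline_test_cpp1b binary in the timings below is essentially this):

#include <iostream>
#include <string>

int main() {
    // Must be called before any input is read from cin.
    std::cin.sync_with_stdio(false);

    std::string input_line;
    long line_count = 0;
    while (std::getline(std::cin, input_line))
        line_count++;

    std::cerr << "Read " << line_count << " lines" << std::endl;
    return 0;
}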

New performance comparison (this is on my 2011 MacBook Pro), using the original code, the original with the sync disabled, and the original Python, respectively, on a file with 20M lines of text. Yes, I ran it several times to eliminate any disk-caching confound.

$ /usr/bin/time cat test_lines_double | ./readline_test_cpp
       33.30 real         0.04 user         0.74 sys
Read 20000001 lines in 33 seconds. LPS: 606060
$ /usr/bin/time cat test_lines_double | ./readline_test_cpp1b
        3.79 real         0.01 user         0.50 sys
Read 20000000 lines in 4 seconds. LPS: 5000000
$ /usr/bin/time cat test_lines_double | ./readline_test.py 
        6.88 real         0.01 user         0.38 sys
Read 20000000 lines in 6 seconds. LPS: 3333333

Thanks to @Vaughn Cato for his answer! Any elaboration people can make or good references people can point to as to why this sync happens, what it means, when it's useful, and when it's okay to disable would be greatly appreciated by posterity. :-)

Edit 5 / Better Solution:

As suggested by Gandalf The Gray below, gets is even faster than scanf or the unsynchronized cin approach. I also learned that scanf and gets are both UNSAFE and should NOT BE USED due to the potential for buffer overflows. So, I wrote this iteration using fgets, the safer alternative to gets. Here are the pertinent lines for my fellow noobs:

char input_line[MAX_LINE];
char *result;

//<snip>

while((result = fgets(input_line, MAX_LINE, stdin )) != NULL)    
    line_count++;
if (ferror(stdin))
    perror("Error reading stdin.");

Now, here are the results using an even larger file (100M lines; ~3.4GB) on a fast server with very fast disk, comparing the python, the unsynced cin, and the fgets approaches, as well as comparing with the wc utility. [The scanf version segfaulted and I don't feel like troubleshooting it.]:

$ /usr/bin/time cat temp_big_file | readline_test.py 
0.03user 2.04system 0:28.06elapsed 7%CPU (0avgtext+0avgdata 2464maxresident)k
0inputs+0outputs (0major+182minor)pagefaults 0swaps
Read 100000000 lines in 28 seconds. LPS: 3571428

$ /usr/bin/time cat temp_big_file | readline_test_unsync_cin 
0.03user 1.64system 0:08.10elapsed 20%CPU (0avgtext+0avgdata 2464maxresident)k
0inputs+0outputs (0major+182minor)pagefaults 0swaps
Read 100000000 lines in 8 seconds. LPS: 12500000

$ /usr/bin/time cat temp_big_file | readline_test_fgets 
0.00user 0.93system 0:07.01elapsed 13%CPU (0avgtext+0avgdata 2448maxresident)k
0inputs+0outputs (0major+181minor)pagefaults 0swaps
Read 100000000 lines in 7 seconds. LPS: 14285714

$ /usr/bin/time cat temp_big_file | wc -l
0.01user 1.34system 0:01.83elapsed 74%CPU (0avgtext+0avgdata 2464maxresident)k
0inputs+0outputs (0major+182minor)pagefaults 0swaps
100000000


Recap (lines per second):
python:         3,571,428 
cin (no sync): 12,500,000
fgets:         14,285,714
wc:            54,644,808

As you can see, fgets is better but still pretty far from wc performance; I'm pretty sure this is due to the fact that wc examines each character without any memory copying. I suspect that, at this point, other parts of the code will become the bottleneck, so I don't think optimizing to that level would even be worthwhile, even if possible (since, after all, I actually need to store the read lines in memory).

Also note that a small tradeoff with using a char* buffer and fgets vs. unsynced cin into a string is that the latter can read lines of any length, while the former requires limiting input to some finite size. In practice, this is probably a non-issue for reading most line-based input files, as the buffer can be set to a very large value that would not be exceeded by valid input.
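
(If over-long lines were a genuine concern, one sketch of a workaround is to count a line only when the chunk returned by fgets actually contains the newline; this assumes the input ends with a trailing newline:)

#include <cstdio>
#include <cstring>

int main() {
    char buf[4096];  // deliberately modest; long lines just span multiple fgets calls
    long line_count = 0;

    while (fgets(buf, sizeof buf, stdin) != NULL) {
        // Count only chunks that terminate a line, so a line longer than
        // the buffer is still counted exactly once.
        if (strchr(buf, '\n') != NULL)
            line_count++;
    }
    fprintf(stderr, "Read %ld lines\n", line_count);
    return 0;
}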

This has been educational. Thanks to all for your comments and suggestions.

Edit 6:

As suggested by J.F. Sebastian in the comments below, the GNU wc utility uses plain C read() (within the safe-read.c wrapper) to read chunks of 16k bytes at a time and count newlines. Here's a Python equivalent based on J.F.'s code (just the relevant snippet that replaces the Python for loop):

from functools import partial

BUFFER_SIZE = 16384
count = sum(chunk.count('\n') for chunk in iter(partial(sys.stdin.read, BUFFER_SIZE), ''))

The performance of this version is quite fast (though still a bit slower than the raw C wc utility, of course):

$ /usr/bin/time cat temp_big_file | readline_test3.py 
0.01user 1.16system 0:04.74elapsed 24%CPU (0avgtext+0avgdata 2448maxresident)k
0inputs+0outputs (0major+181minor)pagefaults 0swaps
Read 100000000 lines in 4.7275 seconds. LPS: 21152829

Again, it's a bit silly for me to compare C++ fgets/cin and the first python code on the one hand to wc -l and this last python snippet on the other, as the latter two don't actually store the read lines but merely count newlines. Still, it's interesting to explore all the different implementations and think about the performance implications. Thanks again!
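
For symmetry, here's a C++ sketch of the same chunked strategy, using plain read() on POSIX; the 16 KiB chunk size matches what wc uses per J.F.'s description:

#include <unistd.h>    // read()
#include <algorithm>   // std::count
#include <cstdio>

int main() {
    const size_t BUFFER_SIZE = 16384;
    char buf[BUFFER_SIZE];
    long line_count = 0;
    ssize_t n;

    // Read 16 KiB at a time straight from fd 0 and count the newlines.
    while ((n = read(0, buf, BUFFER_SIZE)) > 0)
        line_count += std::count(buf, buf + n, '\n');

    fprintf(stderr, "Read %ld lines\n", line_count);
    return 0;
}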

Edit 7: Tiny benchmark addendum and recap

For completeness, I thought I'd update the read speed for the same file on the same box with the original (synced) C++ code. Again, this is for a 100M line file on a fast disk. Here's the complete table now:

Implementation      Lines per second
python (default)           3,571,428
cin (default/naive)          819,672
cin (no sync)             12,500,000
fgets                     14,285,714
wc (not fair comparison)  54,644,808

Solution

By default, cin is synchronized with stdio, which causes it to avoid any input buffering. If you add this to the top of your main, you should see much better performance:

std::ios_base::sync_with_stdio(false);

Normally, when an input stream is buffered, instead of reading one character at a time, the stream will be read in larger chunks. This reduces the number of system calls, which are typically relatively expensive. However, since the FILE* based stdio and iostreams often have separate implementations and therefore separate buffers, this could lead to a problem if both were used together. For example:

int myvalue1;
cin >> myvalue1;
int myvalue2;
scanf("%d",&myvalue2);

If more input was read by cin than it actually needed, then the second integer value wouldn't be available for the scanf function, which has its own independent buffer. This would lead to unexpected results.

To avoid this, by default, streams are synchronized with stdio. One common way to achieve this is to have cin read each character one at a time as needed using stdio functions. Unfortunately, this introduces a lot of overhead. For small amounts of input, this isn't a big problem, but when you are reading millions of lines, the performance penalty is significant.

Fortunately, the library designers decided that you should also be able to disable this feature to get improved performance if you knew what you were doing, so they provided the sync_with_stdio method.
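
One practical caveat: per the standard, sync_with_stdio should be called before any input or output is performed on the standard streams; if it's called after I/O has already occurred, the effect is implementation-defined. A minimal placement sketch:

#include <iostream>

int main() {
    // Do this first, before touching cin/cout at all, and then stick to
    // iostreams (mixing in stdio afterwards reintroduces the original problem).
    std::ios_base::sync_with_stdio(false);

    int value;
    long sum = 0;
    while (std::cin >> value)
        sum += value;
    std::cout << sum << std::endl;
    return 0;
}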
