Why is glibc's sscanf vastly slower than fscanf on Linux?


Question

I am using GCC 4.8 and glibc 2.19 on an x86_64 Linux.

While playing with different input methods for a different question, I compared fscanf and sscanf. Specifically, I would either use fscanf on the standard input directly:

char s[128]; int n;

while (fscanf(stdin, "%127s %d", s, &n) == 2) { }

Or I would first read the entire input into a buffer and then traverse the buffer with sscanf. (Reading everything into the buffer takes a tiny amount of time.)

char s[128]; int n;
char const * p = my_data;

for (int b; sscanf(p, "%127s %d%n", s, &n, &b) == 2; p += b) { }

To my surprise, the fscanf version is vastly faster. For example, processing several tens of thousands of lines with fscanf takes this long:

10000       0.003927487 seconds time elapsed
20000       0.006860206 seconds time elapsed
30000       0.007933329 seconds time elapsed
40000       0.012881912 seconds time elapsed
50000       0.013516816 seconds time elapsed
60000       0.015670432 seconds time elapsed
70000       0.017393129 seconds time elapsed
80000       0.019837480 seconds time elapsed
90000       0.023925753 seconds time elapsed

Now the same with sscanf:

10000       0.035864643 seconds time elapsed
20000       0.127150772 seconds time elapsed
30000       0.319828373 seconds time elapsed
40000       0.611551668 seconds time elapsed
50000       0.919187459 seconds time elapsed
60000       1.327831544 seconds time elapsed
70000       1.809843039 seconds time elapsed
80000       2.354809588 seconds time elapsed
90000       2.970678416 seconds time elapsed

I was using the Google perf tools to measure this. For example, for 50000 lines, the fscanf code requires about 50M cycles, and the sscanf code about 3300M cycles. So I broke down the top call sites with perf record/perf report. With fscanf:

 35.26%  xf  libc-2.19.so         [.] _IO_vfscanf
 23.91%  xf  [kernel.kallsyms]    [k] 0xffffffff8104f45a
  8.93%  xf  libc-2.19.so         [.] _int_malloc

And with sscanf:

 98.22%  xs  libc-2.19.so         [.] rawmemchr
  0.68%  xs  libc-2.19.so         [.] _IO_vfscanf
  0.38%  xs  [kernel.kallsyms]    [k] 0xffffffff8104f45a

So almost all of the time with sscanf is spent in rawmemchr! Why is this? How can the fscanf code avoid this cost?

I tried searching for this, but the best I could come up with is this discussion of locked realloc calls which I don't think applies here. I was also thinking that fscanf has better memory locality (using the same buffer over and over), but that can't make such a big difference.

Does anyone have any insights in this strange discrepancy?

Recommended answer

sscanf() converts the string you pass in to an _IO_FILE* to make the string look like a "file". This is so the same internal _IO_vfscanf() can be used for both a string and a FILE*.

However, as part of that conversion, done in a _IO_str_init_static_internal() function, it calls __rawmemchr (ptr, '\0'); essentially a strlen() call, on your input string. This conversion is done on every call to sscanf(), and since your input buffer is rather large, it'll spend a fair amount of time calculating the length of the input string.

Creating a FILE* from the input string with fmemopen() and then reading it with fscanf() could be an alternative.
