Massive fprintf speed difference without "-std=c99"

Problem description

I had been struggling for weeks with a poor-performing translator I had written. Consider the following simple benchmark:

#include <stdio.h>

int main()
{
    int x;
    char buf[2048];
    FILE *test = fopen("test.out", "wb");
    if (test == NULL)
        return 1;
    /* fully buffer the stream with a 2 KiB user-supplied buffer */
    setvbuf(test, buf, _IOFBF, sizeof buf);
    for(x=0;x<1024*1024; x++)
        fprintf(test, "%04d", x);
    fclose(test);
    return 0;
}

We see the following results:

bash-3.1$ gcc -O2 -static test.c -o test
bash-3.1$ time ./test

real    0m0.334s
user    0m0.015s
sys     0m0.016s

As you can see, the moment the "-std=c99" flag is added in, performance comes crashing down:

bash-3.1$ gcc -O2 -static -std=c99 test.c -o test
bash-3.1$ time ./test

real    0m2.477s
user    0m0.015s
sys     0m0.000s

The compiler I'm using is gcc 4.6.2 mingw32.

The generated file is about 12 MB, so this is a throughput difference of about 21 MB/s between the two.

Running diff shows that the generated files are identical.

I assumed this had something to do with file locking in fprintf, which the program uses heavily, but I haven't been able to find a way to switch locking off in the C99 version.

I tried calling flockfile on the stream at the beginning of the program, and a corresponding funlockfile at the end, but was greeted with compiler errors about implicit declarations and linker errors about undefined references to those functions.

Could there be another explanation for this problem, and more importantly, is there any way to use C99 on Windows without paying such an enormous performance price?


Edit:

After looking at the code generated with these options, it appears that in the slow version MinGW inserts the following:

_fprintf:
LFB0:
    .cfi_startproc
    subl    $28, %esp
    .cfi_def_cfa_offset 32
    leal    40(%esp), %eax
    movl    %eax, 8(%esp)
    movl    36(%esp), %eax
    movl    %eax, 4(%esp)
    movl    32(%esp), %eax
    movl    %eax, (%esp)
    call    ___mingw_vfprintf
    addl    $28, %esp
    .cfi_def_cfa_offset 4
    ret
    .cfi_endproc 

In the fast version, this simply does not exist; otherwise, both are exactly the same. I assume __mingw_vfprintf is the slowpoke here, but I have no idea what behavior it needs to emulate that makes it so slow.

Recommended answer

After some digging in the source code, I have found why the MinGW function is so terribly slow:

At the beginning of each [v,f,s]printf in MinGW, there is some innocent-looking initialization code:

__pformat_t stream = {
    dest,                   /* output goes to here        */
    flags &= PFORMAT_TO_FILE | PFORMAT_NOLIMIT, /* only these valid initially */
    PFORMAT_IGNORE,             /* no field width yet         */
    PFORMAT_IGNORE,             /* nor any precision spec     */
    PFORMAT_RPINIT,             /* radix point uninitialised  */
    (wchar_t)(0),               /* leave it unspecified       */
    0,                          /* zero output char count     */
    max,                        /* establish output limit     */
    PFORMAT_MINEXP          /* exponent chars preferred   */
};

However, PFORMAT_MINEXP is not what it seems:

#ifdef _WIN32
# define PFORMAT_MINEXP    __pformat_exponent_digits() 
# ifndef _TWO_DIGIT_EXPONENT
#  define _get_output_format()  0 
#  define _TWO_DIGIT_EXPONENT   1
# endif
static __inline__ __attribute__((__always_inline__))
int __pformat_exponent_digits( void )
{
  char *exponent_digits = getenv( "PRINTF_EXPONENT_DIGITS" );
  return ((exponent_digits != NULL) && ((unsigned)(*exponent_digits - '0') < 3))
    || (_get_output_format() & _TWO_DIGIT_EXPONENT)
    ? 2
    : 3
    ;
}

This ends up being called every time I print anything, and getenv on Windows must not be very quick. Replacing that define with a 2 brings the runtime back to where it should be.

So, the answer comes down to this: when using -std=c99 or any ANSI-compliant mode, MinGW replaces the CRT runtime's printf-family functions with its own implementations. Normally this wouldn't be an issue, but the MinGW library had a bug that slowed its formatting functions down far beyond anything imaginable.