Why is snprintf consistently 2x faster than ostringstream for printing a single number?


Problem description


I was testing various approaches to formatting doubles in C++, and here's some code I came up with:

#include <chrono>
#include <cstdio>
#include <random>
#include <vector>
#include <sstream>
#include <iostream>

inline long double currentTime()
{
    const auto now = std::chrono::steady_clock::now().time_since_epoch();
    return std::chrono::duration<long double>(now).count();
}

int main()
{
    std::mt19937 mt(std::random_device{}());
    std::normal_distribution<long double> dist(0, 1e280);
    static const auto rng=[&](){return dist(mt);};
    std::vector<double> numbers;
    for(int i=0;i<10000;++i)
        numbers.emplace_back(rng());

    const int precMax=200;
    const int precStep=10;

    char buf[10000];
    std::cout << "snprintf\n";
    for(int precision=10;precision<=precMax;precision+=precStep)
    {
        const auto t0=currentTime();
        for(const auto num : numbers)
            std::snprintf(buf, sizeof buf, "%.*e", precision, num);
        const auto t1=currentTime();
        std::cout << "Precision " << precision << ": " << t1-t0 << " s\n";
    }

    std::cout << "ostringstream\n";
    for(int precision=10;precision<=precMax;precision+=precStep)
    {
        std::ostringstream ss;
        ss.precision(precision);
        ss << std::scientific;
        const auto t0=currentTime();
        for(const auto num : numbers)
        {
            ss.str("");
            ss << num;
        }
        const auto t1=currentTime();
        std::cout << "Precision " << precision << ": " << t1-t0 << " s\n";
    }
}

What makes me wonder is that, at first, when precision is less than 40, I get more or less the same performance. But then the difference grows to 2.1x in favor of snprintf. See my output on a Core i7-4765T, Linux 32-bit, g++ 5.5.0, libc 2.14.1, compiled with -march=native -O3:

snprintf
Precision 10: 0.0262963 s
Precision 20: 0.035437 s
Precision 30: 0.0468597 s
Precision 40: 0.0584917 s
Precision 50: 0.0699653 s
Precision 60: 0.081446 s
Precision 70: 0.0925062 s
Precision 80: 0.104068 s
Precision 90: 0.115419 s
Precision 100: 0.128886 s
Precision 110: 0.138073 s
Precision 120: 0.149591 s
Precision 130: 0.161005 s
Precision 140: 0.17254 s
Precision 150: 0.184622 s
Precision 160: 0.195268 s
Precision 170: 0.206673 s
Precision 180: 0.218756 s
Precision 190: 0.230428 s
Precision 200: 0.241654 s
ostringstream
Precision 10: 0.0269695 s
Precision 20: 0.0383902 s
Precision 30: 0.0497328 s
Precision 40: 0.12028 s
Precision 50: 0.143746 s
Precision 60: 0.167633 s
Precision 70: 0.190878 s
Precision 80: 0.214735 s
Precision 90: 0.238105 s
Precision 100: 0.261641 s
Precision 110: 0.285149 s
Precision 120: 0.309025 s
Precision 130: 0.332283 s
Precision 140: 0.355797 s
Precision 150: 0.379415 s
Precision 160: 0.403452 s
Precision 170: 0.427337 s
Precision 180: 0.450668 s
Precision 190: 0.474012 s
Precision 200: 0.498061 s

So my main question is: what is the reason for this twofold difference? And additionally, how can I make ostringstream's performance closer to that of snprintf?

NOTE: another question, Why is snprintf faster than ostringstream or is it?, is different from mine. First, there's no specific answer there as to why formatting a single number at different precisions is slower. Second, that question asks why it's slower in general, which is too broad to answer my question, while this one asks about one specific scenario: formatting a single double.

Solution

std::ostringstream calls vsnprintf twice: the first time to try with a small buffer, and the second time with a correctly-sized buffer. See locale_facets.tcc around line 1011 (here std::__convert_from_v is a proxy for vsnprintf):

#if _GLIBCXX_USE_C99_STDIO
    // Precision is always used except for hexfloat format.
    const bool __use_prec =
      (__io.flags() & ios_base::floatfield) != ios_base::floatfield;

    // First try a buffer perhaps big enough (most probably sufficient
    // for non-ios_base::fixed outputs)
    int __cs_size = __max_digits * 3;
    char* __cs = static_cast<char*>(__builtin_alloca(__cs_size));
    if (__use_prec)
      __len = std::__convert_from_v(_S_get_c_locale(), __cs, __cs_size,
                    __fbuf, __prec, __v);
    else
      __len = std::__convert_from_v(_S_get_c_locale(), __cs, __cs_size,
                    __fbuf, __v);

    // If the buffer was not large enough, try again with the correct size.
    if (__len >= __cs_size)
      {
        __cs_size = __len + 1;
        __cs = static_cast<char*>(__builtin_alloca(__cs_size));
        if (__use_prec)
          __len = std::__convert_from_v(_S_get_c_locale(), __cs, __cs_size,
                        __fbuf, __prec, __v);
        else
          __len = std::__convert_from_v(_S_get_c_locale(), __cs, __cs_size,
                        __fbuf, __v);
      }

This exactly matches the observation: for small requested precisions the performance is the same as snprintf's, while for larger precisions it's 2x worse. For double, __max_digits is 15, so the first-try buffer is 45 bytes, and a "%.*e" conversion takes roughly precision + 8 characters, so once precision reaches about 40 the first call overflows and the conversion runs twice.

Moreover, since the undersized buffer doesn't depend on any property of the std::ostringstream's own buffer, only on __max_digits (defined as __gnu_cxx::__numeric_traits<_ValueT>::__digits10), there doesn't seem to be any natural fix for this other than fixing libstdc++ itself.

I've reported it as a bug to libstdc++.
