Double to string without scientific notation or trailing zeros, efficiently


Question

This routine is called a zillion times to create large csv files full of numbers. Is there a more efficient way to do this?

    static std::string dbl2str(double d)
    {
        std::stringstream ss;
        ss << std::fixed << std::setprecision(10) << d;              //convert double to string w fixed notation, hi precision
        std::string s = ss.str();                                    //output to std::string
        s.erase(s.find_last_not_of('0') + 1, std::string::npos);     //remove trailing 000s    (123.1200 => 123.12,  123.000 => 123.)
        return (s[s.size()-1] == '.') ? s.substr(0, s.size()-1) : s; //remove dangling decimal (123. => 123)
    }

Solution

Before you start, check whether significant time is spent in this function. Do this by measuring, either with a profiler or otherwise. Knowing that you call it a zillion times is all very well, but if it turns out your program still only spends 1% of its time in this function, then nothing you do here can possibly improve your program's performance by more than 1%. If that were the case the answer to your question would be "for your purposes no, this function cannot be made significantly more efficient and you are wasting your time if you try".
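If no profiler is at hand, a crude timing loop is one way to get a first number. The following is only a sketch: `time_dbl2str` and its `sink` parameter are names made up for illustration, the input values are arbitrary, and the routine from the question is repeated (with the dangling-point fix applied) so that the block compiles on its own.

```cpp
#include <chrono>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// The routine from the question, repeated so the sketch is self-contained.
static std::string dbl2str(double d)
{
    std::stringstream ss;
    ss << std::fixed << std::setprecision(10) << d;
    std::string s = ss.str();
    s.erase(s.find_last_not_of('0') + 1, std::string::npos);
    if (!s.empty() && s[s.size() - 1] == '.') {
        s.erase(s.end() - 1);
    }
    return s;
}

// Hypothetical helper: wall-clock seconds spent in `calls` conversions.
// `sink` accumulates the output lengths so the optimizer cannot discard the loop.
static double time_dbl2str(std::size_t calls, std::size_t &sink)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < calls; ++i) {
        sink += dbl2str(static_cast<double>(i) * 0.125).size();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Comparing the returned time against the total run time of the program for the same number of values gives a rough idea of the share this function accounts for. A real profiler will still be more reliable.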

First thing, avoid s.substr(0, s.size()-1). This copies most of the string and it makes your function ineligible for NRVO, so I think generally you'll get a copy on return. So the first change I'd make is to replace the last line with:

    if(s[s.size()-1] == '.') {
        s.erase(s.end()-1);
    }
    return s;

But if performance is a serious concern, then here's how I'd do it. I'm not promising that this is the fastest possible, but it avoids some issues with unnecessary allocations and copying. Any approach involving stringstream is going to require a copy from the stringstream to the result, so we want a more low-level operation, snprintf.

    #include <cstdio>
    #include <string>

    static std::string dbl2str(double d)
    {
        // first call only measures: returns the length without the nul terminator
        size_t len = std::snprintf(0, 0, "%.10f", d);
        std::string s(len+1, '\0');
        // technically non-portable, see below
        std::snprintf(&s[0], len+1, "%.10f", d);
        // remove nul terminator
        s.pop_back();
        // remove trailing zeros
        s.erase(s.find_last_not_of('0') + 1, std::string::npos);
        // remove trailing point
        if(s.back() == '.') {
            s.pop_back();
        }
        return s;
    }

The second call to snprintf assumes that std::string uses contiguous storage. This is guaranteed in C++11. It is not guaranteed in C++03, but is true for all actively-maintained implementations of std::string known to the C++ committee. If performance really is important then I think it's reasonable to make that non-portable assumption, since writing directly into a string saves copying into a string later.

s.pop_back() is the C++11 way of saying s.erase(s.end()-1), and s.back() is the C++11 way of saying s[s.size()-1].

For another possible improvement, you could get rid of the first call to snprintf and instead size your s to some value like std::numeric_limits<double>::max_exponent10 + 14 (basically, the length that -DBL_MAX needs). The trouble is that this allocates and zeros far more memory than is typically needed (322 bytes for an IEEE double). My intuition is that this will be slower than the first call to snprintf, not to mention wasteful of memory in the case where the string return value is kept hanging around for a while by the caller. But you can always test it.
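For anyone who wants to test it, the fixed-size variant might look roughly like this. It is a sketch only: the name `dbl2str_maxbuf` is invented here, and the clamp after `snprintf` is a defensive addition that should never trigger if the 322-byte figure holds for your platform's double.

```cpp
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string>

// Hypothetical variant: one snprintf call into a worst-case-sized string.
static std::string dbl2str_maxbuf(double d)
{
    // 308 + 14 = 322 for an IEEE double: room for "-", 309 integer digits,
    // ".", 10 decimals and the nul terminator.
    const std::size_t cap = std::numeric_limits<double>::max_exponent10 + 14;
    std::string s(cap, '\0');
    const int len = std::snprintf(&s[0], cap, "%.10f", d);
    if (len < 0) {
        return std::string();  // encoding error; reporting is left to the caller
    }
    // snprintf returns the length it wanted to write, so clamp defensively.
    std::size_t n = static_cast<std::size_t>(len);
    if (n >= cap) {
        n = cap - 1;
    }
    s.resize(n);
    // remove trailing zeros, then a trailing point
    s.erase(s.find_last_not_of('0') + 1);
    if (!s.empty() && s.back() == '.') {
        s.pop_back();
    }
    return s;
}
```

Note that the returned string keeps its 322-byte allocation even when the result is a handful of characters, which is the memory cost mentioned above.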

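Alternatively, std::max((int)std::log10(d), 0) + 14 computes a reasonably tight upper bound on the size needed for positive d, and might be quicker than having snprintf compute it exactly. For general input, take std::fabs(d) first and handle zero and non-finite values separately: log10 of a negative number is NaN and log10 of zero is minus infinity, and converting either to int is undefined behaviour.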
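A sketch of the log10 idea that copes with negative, zero and non-finite input could look like this. The names `dbl2str_bound` and `dbl2str_bounded` are made up for illustration. I have used 15 rather than 14 as the constant, on the assumption that a negative value can round up to the next power of ten (for example -999.99999999999 prints as -1000.0000000000) and so needs one more character than its magnitude suggests.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: upper bound on the bytes "%.10f" needs, nul included.
static std::size_t dbl2str_bound(double d)
{
    const double mag = std::fabs(d);
    if (!std::isfinite(mag) || mag < 1.0) {
        // "-0.0000000000", "-1.0000000000", "-inf" and "-nan" all fit.
        return 15;
    }
    // sign + integer digits + one carry digit + '.' + 10 decimals + nul
    return static_cast<std::size_t>(std::log10(mag)) + 15;
}

// Hypothetical variant of dbl2str that sizes the string from the bound above.
static std::string dbl2str_bounded(double d)
{
    const std::size_t cap = dbl2str_bound(d);
    std::string s(cap, '\0');
    const int len = std::snprintf(&s[0], cap, "%.10f", d);
    if (len < 0) {
        return std::string();  // encoding error; reporting is left to the caller
    }
    std::size_t n = static_cast<std::size_t>(len);
    if (n >= cap) {
        n = cap - 1;  // would mean the bound was too small; output is truncated
    }
    s.resize(n);
    // remove trailing zeros, then a trailing point
    s.erase(s.find_last_not_of('0') + 1);
    if (!s.empty() && s.back() == '.') {
        s.pop_back();
    }
    return s;
}
```

Whether the log10 call really beats the measuring call to snprintf is something only a measurement on your platform can settle.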

Finally, it may be that you can improve performance by changing the function interface. For example, instead of returning a new string you could perhaps append to a string passed in by the caller:

    void append_dbl2str(std::string &s, double d) {
        size_t len = std::snprintf(0, 0, "%.10f", d);
        size_t oldsize = s.size();
        s.resize(oldsize + len + 1);
        // technically non-portable
        std::snprintf(&s[oldsize], len+1, "%.10f", d);
        // remove nul terminator
        s.pop_back();
        // remove trailing zeros
        s.erase(s.find_last_not_of('0') + 1, std::string::npos);
        // remove trailing point
        if(s.back() == '.') {
            s.pop_back();
        }
    }

Then the caller can reserve() plenty of space, call your function several times (presumably with other string appends in between), and write the resulting block of data to the file all at once, without any memory allocation other than the reserve. "Plenty" doesn't have to be the whole file, it could be one line or "paragraph" at a time, but anything that avoids a zillion memory allocations is a potential performance boost.
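On the caller's side this could be used roughly as follows. This is a sketch under assumptions: `append_csv_row` is a hypothetical helper, a row is assumed to be a `std::vector<double>`, and `append_dbl2str` is repeated so that the block compiles on its own.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// append_dbl2str as described above, repeated so the sketch is self-contained.
static void append_dbl2str(std::string &s, double d)
{
    const int len = std::snprintf(nullptr, 0, "%.10f", d);
    if (len < 0) {
        return;  // encoding error: append nothing
    }
    const std::size_t oldsize = s.size();
    s.resize(oldsize + len + 1);
    std::snprintf(&s[oldsize], len + 1, "%.10f", d);
    s.pop_back();                          // remove nul terminator
    s.erase(s.find_last_not_of('0') + 1);  // remove trailing zeros
    if (s.back() == '.') {
        s.pop_back();                      // remove trailing point
    }
}

// Hypothetical caller: append one comma-separated row, ending in a newline.
static void append_csv_row(std::string &buf, const std::vector<double> &row)
{
    for (std::size_t i = 0; i < row.size(); ++i) {
        if (i != 0) {
            buf.push_back(',');
        }
        append_dbl2str(buf, row[i]);
    }
    buf.push_back('\n');
}
```

A writer loop would then call `buf.reserve(...)` once with a generous size, and for each row call `buf.clear()` (which in practice keeps the capacity), `append_csv_row(buf, row)`, and a single write of `buf` to the file.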
