C++ what's the most efficient way to read a file into a std::string?

Problem description

I currently do this, and the conversion to std::string at the end takes 98% of the execution time. There must be a better way!

#include <fstream>
#include <string>
#include <vector>

std::string
file2string(std::string filename)
{
    std::ifstream file(filename.c_str());
    if(!file.is_open()){
        // If they passed a bad file name, or one we have no read access to,
        // we pass back an empty string.
        return "";
    }
    // find out how much data there is
    file.seekg(0,std::ios::end);
    std::streampos length = file.tellg();
    file.seekg(0,std::ios::beg);
    // Get a vector of that size
    std::vector<char> buf(length);
    // Fill the buffer with that many characters
    // (data() rather than &buf[0], which is undefined for an empty vector)
    file.read(buf.data(),length);
    file.close();
    // return buffer as string (this copies the whole buffer a second time)
    std::string s(buf.begin(),buf.end());
    return s;
}

Solution

Being a big fan of C++ iterator abstraction and the algorithms, I would love the following to be the fastest way to read a file (or any other input stream) into a std::string (and then print the content):

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

int main()
{
    std::string s(std::istreambuf_iterator<char>(std::ifstream("file")
                                                 >> std::skipws),
                  std::istreambuf_iterator<char>());
    std::cout << "file='" << s << "'\n";
}

This certainly is fast for my own implementation of IOStreams, but it requires a lot of trickery to actually make it fast. Primarily, it requires optimizing the algorithms to cope with segmented sequences: a stream can be seen as a sequence of input buffers. I'm not aware of any STL implementation consistently doing this optimization. The odd use of std::skipws is just to get a reference to the just-created stream: the std::istreambuf_iterator<char> constructor expects a non-const reference, to which the temporary file stream wouldn't bind.
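If the std::skipws trick feels too clever, the same iterator-based construction also works with a named stream, which is an lvalue and binds to the std::istreambuf_iterator constructor directly. Here is a minimal sketch. The function name read_with_iterators is hypothetical, and it assumes you want binary mode so that no end-of-line translation happens:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads a whole file through std::istreambuf_iterator.
// Returns an empty string if the file can't be opened.
std::string read_with_iterators(const std::string& path)
{
    // A named stream is an lvalue, so no std::skipws trick is needed.
    // Binary mode (an assumption here): no end-of-line translation.
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return std::string();
    }
    // The extra parentheses around the first argument stop this line from
    // being parsed as a function declaration (the "most vexing parse").
    std::string s((std::istreambuf_iterator<char>(in)),
                  std::istreambuf_iterator<char>());
    return s;
}
```

Whether this is faster than the one-liner depends on the same library optimizations described above. The named stream only removes the trick.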

Since this probably isn't the fastest approach, I would be inclined to use std::getline() with a particular "newline" character, i.e. one which isn't in the file:

std::string s;
// optionally reserve space, although I wouldn't be too fussed about the
// reallocations because the reads probably dominate the performance
std::getline(std::ifstream("file") >> std::skipws, s, '\0');

This assumes that the file doesn't contain a null character; any other character that doesn't occur in the file would do as well. Note that the delimiter has to be the char '\0' rather than the int literal 0, because std::getline() deduces its character type from the delimiter too. Unfortunately, std::getline() takes a char_type as its delimiting argument rather than an int_type, so you can't pass eof() as a delimiter that never occurs (char_type, int_type, and eof() refer to the respective members of char_traits<char>). The member std::istream::getline() can't be used instead, because it reads into a caller-supplied character array, so you would need to know ahead of time how many characters are in the file.
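Wrapped in a function with a named stream, the getline() approach might look like the sketch below. The name read_with_getline is hypothetical. As the paragraph says, it assumes the file contains no null characters. If it does, reading stops at the first one:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads a file with std::getline(), using '\0' as a
// delimiter that is assumed not to occur in the file.
std::string read_with_getline(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::string s;
    // '\0' (a char), not 0 (an int): std::getline deduces its character
    // type from this argument as well as from the stream.
    std::getline(in, s, '\0');
    return s;  // stays empty if the file couldn't be opened
}
```

A missing file and an empty file both yield an empty string here. Check in.is_open() before reading if the difference matters.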

BTW, I saw some attempts to use seeking to determine the size of the file. This is bound not to work too well. The problem is that the code conversion done in std::ifstream (well, actually in std::filebuf) can create a different number of characters than there are bytes in the file. Admittedly, this isn't the case when using the default C locale, and it is possible to detect that this locale doesn't do any conversion. Otherwise, the best bet for the stream would be to run over the file and determine the number of characters being produced. I actually think this is what would need to be done when the code conversion could do something interesting, although I don't think it actually is done. However, none of the examples explicitly set up the C locale, e.g. using std::locale::global(std::locale("C"));. Even then, it is also necessary to open the file in std::ios_base::binary mode, because otherwise end-of-line sequences may be replaced by a single character when reading. Admittedly, this would only make the result shorter, never longer.
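Here is a sketch of the seek-based approach with the precautions described above. It uses binary mode so line endings aren't translated. Rather than changing the global locale, it imbues std::locale::classic() on this one stream so no code conversion takes place. It also reads directly into a pre-sized std::string, which removes the vector-to-string copy from the question. The name read_with_seek is hypothetical:

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: sizes the string by seeking, then reads straight into it.
// Assumes binary mode plus the classic "C" locale, so characters == bytes.
std::string read_with_seek(const std::string& path)
{
    std::ifstream in;
    in.imbue(std::locale::classic());   // no code conversion (set before open)
    in.open(path, std::ios::binary);    // no end-of-line translation
    if (!in) {
        return std::string();
    }
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size <= 0) {                    // empty file, or tellg() failed (-1)
        return std::string();
    }
    in.seekg(0, std::ios::beg);
    std::string s(static_cast<std::string::size_type>(size), '\0');
    in.read(&s[0], static_cast<std::streamsize>(size));  // &s[0] is valid: size > 0
    // Shrink to what was actually read, in case the file changed in between.
    s.resize(static_cast<std::string::size_type>(in.gcount()));
    return s;
}
```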

The other approaches that insert a std::streambuf* into a stream (i.e. those involving rdbuf()) all require that the resulting content be copied at some point. Given that the file may be very large, this may not be an option. Without the copy, however, this could very well be the fastest approach. To avoid the copy, you can create a simple custom stream buffer which takes a reference to a std::string as its constructor argument and appends directly to that std::string:

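#include <fstream>
#include <iostream>
#include <string>

// A write-only stream buffer that collects characters in a local buffer and
// appends them to the referenced std::string whenever the buffer fills up or
// the stream is synced (flushed).
class custombuf:
    public std::streambuf
{
public:
    custombuf(std::string& target): target_(target) {
        // Leave the last slot free so overflow() can store the pending character.
        this->setp(this->buffer_, this->buffer_ + bufsize - 1);
    }

private:
    std::string& target_;
    enum { bufsize = 8192 };
    char buffer_[bufsize];
    int overflow(int c) {
        // Store the character that didn't fit (unless called with eof()).
        if (!traits_type::eq_int_type(c, traits_type::eof()))
        {
            *this->pptr() = traits_type::to_char_type(c);
            this->pbump(1);
        }
        // Move the buffered characters into the target string, then reset.
        this->target_.append(this->pbase(), this->pptr() - this->pbase());
        this->setp(this->buffer_, this->buffer_ + bufsize - 1);
        return traits_type::not_eof(c);
    }
    // Flushing appends whatever is still buffered.
    int sync() { this->overflow(traits_type::eof()); return 0; }
};

int main()
{
    std::string s;
    custombuf   sbuf(s);
    // Copy the file through sbuf; std::flush makes sure the tail reaches s.
    if (std::ostream(&sbuf)
        << std::ifstream("readfile.cpp").rdbuf()
        << std::flush) {
        std::cout << "file='" << s << "'\n";
    }
    else {
        std::cout << "failed to read file\n";
    }
}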
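For comparison, here is a sketch of the copying rdbuf() variant that the custom stream buffer avoids. The function name read_with_stringstream is hypothetical. std::ostringstream::str() returns the accumulated contents by value, so the data exists twice at the moment of the copy:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: the copying rdbuf() approach.
std::string read_with_stringstream(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::ostringstream out;
    // Inserting a streambuf* copies characters until end of file. If nothing
    // is inserted (missing or empty file), out's failbit is set.
    if (in && (out << in.rdbuf())) {
        return out.str();  // returns the contents by value: a full copy
    }
    return std::string();
}
```

The custombuf example has the same property: an empty file inserts no characters, so failbit is set and it is reported the same way as a missing file.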

At least with a suitably chosen buffer size, I would expect the custombuf version to be fairly fast. Which version is the fastest will certainly depend on the system, the standard C++ library being used, and probably a number of other factors, so you'll want to measure the performance.
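A small timing harness helps with that measurement. Here is a minimal sketch using std::chrono::steady_clock. The name average_microseconds is hypothetical. Reader can be any callable that takes a path and returns the file contents as a std::string, for instance a small wrapper around one of the approaches in this answer:

```cpp
#include <chrono>
#include <string>

// Hypothetical benchmark helper: calls read(path) `runs` times and returns the
// average wall-clock time per call in microseconds (0.0 if runs <= 0).
template <typename Reader>
double average_microseconds(Reader read, const std::string& path, int runs)
{
    if (runs <= 0) {
        return 0.0;
    }
    using steady = std::chrono::steady_clock;
    std::string::size_type total = 0;
    const steady::time_point start = steady::now();
    for (int i = 0; i < runs; ++i) {
        total += read(path).size();  // use the result so the call isn't dead code
    }
    const steady::time_point stop = steady::now();
    // Store the total in a volatile so the compiler can't discard the loop.
    volatile std::string::size_type sink = total;
    (void)sink;
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / runs;
}
```

Repeat the measurement on files of realistic size. After the first run, the file is usually served from the operating system's page cache, so later runs mostly measure the library's copying overhead rather than the disk.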
