C++: what's the most efficient way to read a file into a std::string?
Problem description
I currently do this, and the conversion to std::string at the end takes 98% of the execution time. There must be a better way!

    #include <fstream>
    #include <string>
    #include <vector>

    std::string file2string(std::string filename)
    {
        std::ifstream file(filename.c_str());
        if (!file.is_open()) {
            // If they passed a bad file name, or one we have no read access to,
            // we pass back an empty string.
            return "";
        }

        // Find out how much data there is.
        file.seekg(0, std::ios::end);
        std::streampos length = file.tellg();
        file.seekg(0, std::ios::beg);

        // Get a vector of that size.
        std::vector<char> buf(length);

        // Fill the buffer with `length` bytes.
        file.read(&buf[0], length);
        file.close();

        // Return the buffer as a string.
        std::string s(buf.begin(), buf.end());
        return s;
    }

Solution

Being a big fan of the C++ iterator abstraction and the algorithms, I would love the following to be the fastest way to read a file (or any other input stream) into a std::string (and then print the content):

    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <string>

    int main()
    {
        // The ">> std::skipws" only serves to obtain an lvalue reference
        // to the temporary stream.
        std::string s(std::istreambuf_iterator<char>(std::ifstream("file") >> std::skipws),
                      std::istreambuf_iterator<char>());
        std::cout << "file='" << s << "'\n";
    }

This certainly is fast for my own implementation of IOStreams, but it requires a lot of trickery to actually make it fast. Primarily, it requires optimizing the algorithms to cope with segmented sequences: a stream can be seen as a sequence of input buffers. I'm not aware of any STL implementation consistently doing this optimization. The odd use of std::skipws is just to get a reference to the just-created stream: std::istreambuf_iterator<char> expects a reference, to which the temporary file stream wouldn't bind.

Since this probably isn't the fastest approach, I would be inclined to use std::getline() with a particular "newline" character, i.e. one which isn't in the file:

    std::string s;
    // Optionally reserve space, although I wouldn't be too fussed about the
    // reallocations because the reads probably dominate the performance.
    // The delimiter is written as '\0' (a char): with a plain 0 (an int)
    // template argument deduction for std::getline() fails.
    std::getline(std::ifstream("file") >> std::skipws, s, '\0');

This assumes that the file doesn't contain a null character. Any other character would do as well. Unfortunately, std::getline() takes a char_type as its delimiter rather than an int_type, so eof() cannot be passed as a "character" which never occurs (char_type, int_type, and eof() refer to the respective members of char_traits<char>). The member std::istream::getline() doesn't help either: its delimiter is a char_type as well, and it reads into a fixed-size character array, so you would need to know ahead of time how many characters are in the file.

BTW, I saw some attempts to use seeking to determine the size of the file. This is bound not to work too well. The problem is that the code conversion done in std::ifstream (well, actually in std::filebuf) can create a different number of characters than there are bytes in the file. Admittedly, this isn't the case when using the default C locale, and it is possible to detect that this doesn't do any conversion. Otherwise the best bet for the stream would be to run over the file and determine the number of characters being produced. I actually think that this is what would need to be done when the code conversion could do something interesting, although I don't think it actually is done. However, none of the examples explicitly set up the C locale, using e.g. std::locale::global(std::locale("C"));. Even with this it is also necessary to open the file in std::ios_base::binary mode, because otherwise end-of-line sequences may be replaced by a single character when reading. Admittedly, this would only make the result shorter, never longer.

The other approaches using the extraction from std::streambuf* (i.e. those involving rdbuf()) all require that the resulting content is copied at some point. Given that the file may actually be very large, this may not be an option. Without the copy this could very well be the fastest approach, however. To avoid the copy, it would be possible to create a simple custom stream buffer which takes a reference to a std::string as constructor argument and directly appends to this std::string:

    #include <fstream>
    #include <iostream>
    #include <string>

    class custombuf: public std::streambuf
    {
    public:
        custombuf(std::string& target): target_(target) {
            // Keep one slot in reserve so overflow() can always store its argument.
            this->setp(this->buffer_, this->buffer_ + bufsize - 1);
        }

    private:
        std::string& target_;
        enum { bufsize = 8192 };
        char buffer_[bufsize];

        // Called when the put area is full: append the buffered characters
        // (plus c, unless it is eof) to the target string and reset the buffer.
        int overflow(int c) {
            if (!traits_type::eq_int_type(c, traits_type::eof())) {
                *this->pptr() = traits_type::to_char_type(c);
                this->pbump(1);
            }
            this->target_.append(this->pbase(), this->pptr() - this->pbase());
            this->setp(this->buffer_, this->buffer_ + bufsize - 1);
            return traits_type::not_eof(c);
        }

        // Flush whatever is still buffered into the target string.
        int sync() { this->overflow(traits_type::eof()); return 0; }
    };

    int main()
    {
        std::string s;
        custombuf sbuf(s);
        if (std::ostream(&sbuf)
            << std::ifstream("readfile.cpp").rdbuf()
            << std::flush) {
            std::cout << "file='" << s << "'\n";
        }
        else {
            std::cout << "failed to read file\n";
        }
    }

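For comparison, the copying rdbuf() approach that the custom buffer above is meant to avoid might look roughly like this. It is only a sketch, not code from the question or the answer: the function name slurp_via_rdbuf is made up for illustration, and the file is opened in binary mode because of the end-of-line conversion mentioned earlier. The content first lands in the string stream's internal buffer, and str() then copies it into the returned std::string.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (name chosen for this sketch): reads a whole file by
// streaming the file buffer into a std::ostringstream.
// Returns an empty string if the file cannot be opened or is empty.
std::string slurp_via_rdbuf(const std::string& filename)
{
    std::ifstream in(filename.c_str(),
                     std::ios_base::in | std::ios_base::binary);
    if (!in) {
        return std::string();
    }
    std::ostringstream out;
    out << in.rdbuf();  // bulk transfer from the file buffer to the string buffer
    return out.str();   // str() hands back a copy of the buffered characters
}
```

A call such as slurp_via_rdbuf("readfile.cpp") yields the same content as the custom buffer version, at the price of holding the data twice for a moment, which is the cost the custom stream buffer is designed to avoid.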
At least with a suitably chosen buffer size I would expect this version to be fairly fast. Which version is the fastest will certainly depend on the system, the standard C++ library being used, and probably a number of other factors, i.e. you want to measure the performance.
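Since it comes down to measuring, here is a minimal timing sketch, assuming a C++11 compiler for std::chrono. None of these names come from the original post: read_with_iterators, read_with_size_hint and time_reader_ms are illustrative helpers. The second reader uses seeking only as a size hint, opens the file in binary mode, and trims the string to the number of characters actually read, in line with the caveats about seeking above.

```cpp
#include <chrono>
#include <fstream>
#include <iterator>
#include <string>

// Reader 1: the istreambuf_iterator approach, using a named stream.
std::string read_with_iterators(const std::string& filename)
{
    std::ifstream in(filename.c_str(),
                     std::ios_base::in | std::ios_base::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Reader 2: seek to the end for a size hint, then read straight into the
// string (contiguous storage is guaranteed since C++11).
// Returns an empty string if the file cannot be opened or cannot seek.
std::string read_with_size_hint(const std::string& filename)
{
    std::ifstream in(filename.c_str(),
                     std::ios_base::in | std::ios_base::binary);
    if (!in) {
        return std::string();
    }
    in.seekg(0, std::ios_base::end);
    const std::streamoff size = in.tellg();  // -1 if the position is unavailable
    in.seekg(0, std::ios_base::beg);

    std::string s;
    if (size > 0) {
        s.resize(static_cast<std::string::size_type>(size));
        in.read(&s[0], static_cast<std::streamsize>(size));
        // Keep only what was really read, in case the hint was too large.
        s.resize(static_cast<std::string::size_type>(in.gcount()));
    }
    return s;
}

// Times a single call of `reader` and returns the elapsed wall-clock time in
// milliseconds. The content is handed back through `out` so the caller can
// check it and the read cannot be optimized away.
double time_reader_ms(std::string (*reader)(const std::string&),
                      const std::string& filename, std::string& out)
{
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    out = reader(filename);
    const std::chrono::steady_clock::time_point stop =
        std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_reader_ms() several times per reader on a representative file and ignoring the first run (which also pays for loading the file into the operating system's cache) gives a rough comparison. Any numbers will only hold for the system and standard library they were measured on.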