Reading large strings in C++ -- is there a safe fast way?
http://insanecoding.blogspot.co.uk/2011/11/how-to-read-in-file-in-c.html reviews a number of ways of reading an entire file into a string in C++. The key code for the fastest option looks like this:
std::string contents;
in.seekg(0, std::ios::end);
contents.resize(in.tellg());
in.seekg(0, std::ios::beg);
in.read(&contents[0], contents.size());
Unfortunately, this is not safe as it relies on the string being implemented in a particular way. If, for example, the implementation was sharing strings then modifying the data at &contents[0] could affect strings other than the one being read. (More generally, there's no guarantee that this won't trash arbitrary memory -- it's unlikely to happen in practice, but it's not good practice to rely on that.)
C++ and the STL are designed to provide features that are as efficient as C, so one would expect there to be a version of the above that is just as fast but guaranteed to be safe.
In the case of vector<T>, there are functions which can be used to access the raw data, which can be used to read a vector efficiently:
T* vector::data();
const T* vector::data() const;
The first of these can be used to read a vector<T> efficiently. Unfortunately, the string equivalent only provides the const variant:
const char* string::data() const noexcept;
So this cannot be used to read a string efficiently. (Presumably the non-const variant is omitted to support the shared string implementation.)
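A note for readers on newer standards: since C++11, std::string is required to store its characters contiguously, which effectively rules out the shared (copy-on-write) implementations worried about above, and C++17 added a non-const overload, char* string::data() noexcept. Under that assumption, reading straight into the string's buffer is well defined. The following is only a minimal sketch of the snippet above wrapped in a function. The name read_file and the exception-based error handling are illustrative choices, not taken from the linked article.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file into a std::string with one read().
// Assumes a C++11-or-later implementation (contiguous string storage).
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }

    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();   // -1 if the size is unavailable
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    in.seekg(0, std::ios::beg);

    std::string contents(static_cast<std::size_t>(size), '\0');
    // &contents[0] works from C++11 on; in C++17 contents.data() is equivalent.
    if (size > 0 && !in.read(&contents[0], size)) {
        throw std::runtime_error("short read on " + path);
    }
    return contents;
}
```

The file is opened in binary mode so that the size reported by tellg() matches the number of bytes read() delivers. In text mode on some platforms the two can differ.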
I have also checked the string constructors, but the ones that accept a char* copy the data -- there's no option to move it.
Is there a safe and fast way of reading the whole contents of a file into a string?
It may be worth noting that I want to read a string rather than a vector<char> so that I can access the resulting data using an istringstream. There's no equivalent of that for vector<char>.
Answer
If you really want to avoid copies, you can slurp the file into a std::vector<char>, and then roll your own std::basic_stringbuf to pull data from the vector.
You can then declare a std::istringstream and use std::basic_ios::rdbuf to replace its input buffer with your own.
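As a rough illustration of the idea, here is a minimal sketch of a read-only stream buffer over a vector. Note that it swaps in a simpler variant of what is described above: it derives from plain std::streambuf and attaches it to a std::istream, rather than deriving from std::basic_stringbuf and replacing the buffer of a std::istringstream. The names VectorBuf and sum_ints are hypothetical.

```cpp
#include <istream>
#include <streambuf>
#include <vector>

// Hypothetical read-only buffer that exposes a std::vector<char> to the
// stream machinery without copying it. The vector must outlive the buffer
// and must not be resized while the buffer is in use.
class VectorBuf : public std::streambuf {
public:
    explicit VectorBuf(std::vector<char>& data) {
        char* begin = data.data();
        // setg(start, current, end) defines the readable "get area".
        setg(begin, begin, begin + data.size());
    }
};

// Example use: sum whitespace-separated integers straight out of the vector.
long sum_ints(std::vector<char>& data) {
    VectorBuf buf(data);
    std::istream in(&buf);   // the stream only borrows the buffer
    long total = 0;
    for (long value; in >> value; ) {
        total += value;
    }
    return total;
}
```

This sketch supports sequential extraction only. Calls such as seekg and tellg would fail unless seekoff and seekpos are also overridden.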
The caveat is that if you choose to call istringstream::str it will invoke std::basic_stringbuf::str and will require a copy. But then, it sounds like you won't be needing that function, and can actually stub it out.
Whether you get better performance this way would require actual measurement. But at least you avoid having two large contiguous memory blocks alive during the copy. Additionally, you could use something like std::deque as your underlying structure if you want to cope with truly huge files that cannot be allocated in contiguous memory.
It's also worth mentioning that if you're really just streaming that data, you are essentially double-buffering by reading it into a string first. Unless you also require the contents in memory for some other purpose, the buffering inside std::ifstream is likely to be sufficient. If you do slurp the file, you may get a boost by turning the stream's buffering off.
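One way to turn buffering off is to call pubsetbuf(nullptr, 0) on the file buffer before any I/O takes place. The sketch below assumes that approach. The helper name read_file_unbuffered is hypothetical, and whether this actually helps is implementation-dependent, so it needs measuring.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: slurp a file with the filebuf's internal buffer
// disabled. Returns an empty string on failure (a sketch, not robust
// error handling).
std::string read_file_unbuffered(const std::string& path) {
    std::ifstream in;
    // Request an unbuffered stream. Some implementations only honour this
    // when it is done before open(); once I/O has happened the effect is
    // implementation-defined.
    in.rdbuf()->pubsetbuf(nullptr, 0);
    in.open(path, std::ios::in | std::ios::binary);
    if (!in) return std::string();

    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    in.seekg(0, std::ios::beg);
    if (size <= 0) return std::string();

    std::string contents(static_cast<std::size_t>(size), '\0');
    in.read(&contents[0], size);
    // Keep only the bytes that were actually read.
    contents.resize(static_cast<std::size_t>(in.gcount()));
    return contents;
}
```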