Reading large strings in C++ -- is there a safe fast way?


Problem description

http://insanecoding.blogspot.co.uk/2011/11/how-to-read-in-file-in-c.html reviews a number of ways of reading an entire file into a string in C++. The key code for the fastest option looks like this:

std::string contents;
in.seekg(0, std::ios::end);
contents.resize(in.tellg());
in.seekg(0, std::ios::beg);
in.read(&contents[0], contents.size());

Unfortunately, this is not safe as it relies on the string being implemented in a particular way. If, for example, the implementation was sharing strings then modifying the data at &contents[0] could affect strings other than the one being read. (More generally, there's no guarantee that this won't trash arbitrary memory -- it's unlikely to happen in practice, but it's not good practice to rely on that.)

C++ and the STL are designed to provide features that are as efficient as C, so one would expect there to be a version of the above that is just as fast but guaranteed to be safe.

In the case of vector<T>, there are functions that give access to the raw data, and these can be used to read into a vector efficiently:

T* vector::data();
const T* vector::data() const; 

The first of these can be used to read a vector<T> efficiently. Unfortunately, the string equivalent only provides the const variant:

const char* string::data() const noexcept;

So this cannot be used to read a string efficiently. (Presumably the non-const variant is omitted to support the shared string implementation.)
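
For comparison, the vector case mentioned above might be sketched as follows. This is a minimal illustration only: the function name read_file_to_vector and its error handling are assumptions made for the example and do not come from the linked article.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (name chosen for this example): reads the whole
// file at `path` into a vector<char> with a single read() call.
std::vector<char> read_file_to_vector(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }

    // Find the file size by seeking to the end, then rewind.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    in.seekg(0, std::ios::beg);

    std::vector<char> contents(static_cast<std::size_t>(size));
    // The non-const vector::data() returns a writable pointer to
    // contiguous storage, so read() can fill the vector directly.
    in.read(contents.data(), static_cast<std::streamsize>(contents.size()));
    if (!in) {
        throw std::runtime_error("short read from " + path);
    }
    return contents;
}
```

The file is opened in binary mode so that the size reported by tellg() matches the number of bytes read() delivers.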

I have also checked the string constructors, but the ones that accept a char* copy the data -- there's no option to move it.

Is there a safe and fast way of reading the whole contents of a file into a string?

It may be worth noting that I want to read a string rather than a vector<char> so that I can access the resulting data using an istringstream. There's no equivalent of that for vector<char>.

Solution

If you really want to avoid copies, you can slurp the file into a std::vector<char>, and then roll your own std::basic_stringbuf to pull data from the vector.

You can then declare a std::istringstream and use std::basic_ios::rdbuf to replace the input buffer with your own one.
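
A rough sketch of that idea is below. It makes two substitutions that are assumptions of this example and not part of the suggestion above: the buffer derives from std::streambuf, not std::basic_stringbuf, and a plain std::istream is constructed on the buffer so that no rdbuf swap on a std::istringstream is needed. The names VectorViewBuf and first_token are hypothetical.

```cpp
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical class: a read-only stream buffer that views an existing
// vector<char> without copying it. The vector must outlive the buffer
// and must not be resized while the buffer is in use.
class VectorViewBuf : public std::streambuf {
public:
    explicit VectorViewBuf(std::vector<char>& data) {
        char* begin = data.data();
        // Point the get area at the vector's storage. The inherited
        // underflow() reports end-of-file once the get area is exhausted.
        setg(begin, begin, begin + data.size());
    }
};

// Example use: extract the first whitespace-delimited token from the
// vector through ordinary formatted stream input.
std::string first_token(std::vector<char>& data) {
    VectorViewBuf buf(data);
    std::istream in(&buf);
    std::string token;
    in >> token;
    return token;
}
```

This sketch does not override seekoff or seekpos, so seekg and tellg on the stream will fail. Formatted extraction and getline work because they only consume the get area.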

The caveat is that if you choose to call istringstream::str it will invoke std::basic_stringbuf::str and will require a copy. But then, it sounds like you won't be needing that function, and can actually stub it out.

Whether you get better performance this way would require actual measurement. But at least you avoid having to have two large contiguous memory blocks during the copy. Additionally, you could use something like std::deque as your underlying structure if you want to cope with truly huge files that cannot be allocated in contiguous memory.

It's also worth mentioning that if you're really just streaming that data you are essentially double-buffering by reading it into a string first. Unless you also require the contents in memory for some other purpose, the buffering inside std::ifstream is likely to be sufficient. If you do slurp the file, you may get a boost by turning buffering off.
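
As an illustration of turning buffering off, the sketch below requests an unbuffered file buffer with pubsetbuf before opening the file. The function name slurp_unbuffered is hypothetical, and the exact effect of the request depends on the standard library implementation, so any speed-up is an assumption to verify by measurement.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: slurps a file with the stream's own buffering
// switched off, so that one large read() can go straight to the vector.
std::vector<char> slurp_unbuffered(const std::string& path) {
    std::ifstream in;
    // Ask for an unbuffered filebuf. The call is made before open()
    // because some implementations ignore it once I/O has started.
    in.rdbuf()->pubsetbuf(nullptr, 0);
    in.open(path, std::ios::in | std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }

    // Find the file size by seeking to the end, then rewind.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    in.seekg(0, std::ios::beg);

    std::vector<char> contents(static_cast<std::size_t>(size));
    in.read(contents.data(), static_cast<std::streamsize>(contents.size()));
    if (!in) {
        throw std::runtime_error("short read from " + path);
    }
    return contents;
}
```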
