RapidXML reading from file - what is wrong here?


Question

What's the difference between these two methods of reading an input file?

1) Using ifstream.get()

2) Using a vector<char> with istreambuf_iterator<char> (less understood by me!)

(other than the obvious answer of having nifty vector methods to work with)
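To make the difference between the two reading methods concrete, here is a minimal, self-contained sketch that reads the same text both ways. It uses a std::istringstream instead of the question's input file, and the names read_with_get, read_with_iterator and the small capacity kCap are hypothetical stand-ins for the question's 65536-byte array. Both functions see exactly the same characters (neither skips whitespace); the practical differences are that the vector grows to fit the whole input, and that istreambuf_iterator reads straight from the stream's buffer instead of going through one get() call per character.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>   // used by the usage example
#include <string>    // used by the usage example
#include <vector>

// Hypothetical small capacity standing in for the question's 65536-byte array.
constexpr std::size_t kCap = 16;

// Method 1 style: copy one character at a time with get() into a fixed array.
// Returns the number of characters stored (excluding the terminating '\0').
std::size_t read_with_get(std::istream& in, char (&buf)[kCap]) {
    std::size_t n = 0;
    char ch;
    // Test the bound before calling get(), so a character is never read
    // from the stream and then silently discarded.
    while (n < kCap - 1 && in.get(ch)) {
        buf[n++] = ch;
    }
    buf[n] = '\0';  // n <= kCap - 1, so this write stays in bounds
    return n;
}

// Method 2 style: istreambuf_iterator<char> reads raw characters from the
// stream's buffer; the vector's range constructor copies them until EOF,
// growing as needed, so there is no fixed size limit.
std::vector<char> read_with_iterator(std::istream& in) {
    // The extra parentheses around the first argument prevent the
    // "most vexing parse" (without them this would declare a function).
    std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
    buf.push_back('\0');  // rapidxml's parse<>() expects zero-terminated text
    return buf;
}
```

For example, with the 31-character string "<config><name>x</name></config>", read_with_get stores only the first 15 characters, while read_with_iterator returns all 31 plus the terminating '\0'. Note that this get() loop checks the count before calling get(); the question's loop calls get() first, so it consumes and drops one character once the buffer is full.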

The input file is XML and, as you can see below, is immediately parsed into a rapidxml document (initialized elsewhere; see the example main function).

First, let me show you two ways to write the load_config function: one using ifstream.get() and one using vector<char>.

Method 1, using ifstream.get(), produces working code and a safe rapidxml document object:

rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
   ifstream myfile("inputfile");

   //read in config file
   char ch;
   char buffer[65536];
   size_t chars_read = 0;

   while(myfile.get(ch) && (chars_read < 65535)){
      buffer[chars_read++] = ch;
   }
   buffer[chars_read++] = '\0';

   cout<<"clearing old doc"<<endl;
   doc->clear();

   doc->parse<0>(buffer);

   //debug returns as expected here
   cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";

   return doc;
}

Method 2 results in a rapidxml document that gets clobbered by another library, specifically by a call to curl_global_init(CURL_GLOBAL_SSL) [see main code below], but I'm not blaming curl_global_init just yet.

rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
   ifstream myfile("inputfile");

   vector<char> buffer((istreambuf_iterator<char>(myfile)),
                istreambuf_iterator<char>( ));
   buffer.push_back('\0');

   cout<<"file looks like:"<<endl;  //looks fine
   cout<<&buffer[0]<<endl;

   cout<<"clearing old doc"<<endl;
   doc->clear();

   doc->parse<0>(&buffer[0]);

   //debug prints as expected
   cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";

   return doc;
}

Main code:

int main(void){
   rapidxml::xml_document<> *doc;
   doc = new rapidxml::xml_document<>;

   load_config(doc);

   // this works fine:
   cout << "Name of my first node is: " << doc->first_node()->name() << "\n"; 

   curl_global_init(CURL_GLOBAL_SSL);  //Docs say do this first.

   // debug broken object instance:
   // note a trashed 'doc' here if using vector<char> method 
   //  - seems to be because of above line... name is NULL 
   //    and other nodes are now NULL
   //    causing segfaults down stream.
   cout << "Name of my first node is: " << doc->first_node()->name() << "\n"; 

   // ... (rest of main omitted)
}

I am pretty darn sure this is all executed in a single thread, but maybe there is something going on beyond my understanding.

I'm also worried that, by simply changing my file-loading function, I only fixed a symptom and not the cause. Looking to the community for help here!

Question: Why would moving away from the vector to a character array fix this?

Hint: I'm aware that rapidXML uses some clever memory management that actually accesses the input string directly.
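That in-situ design is the key to the whole problem. The sketch below is a drastically simplified, hypothetical stand-in (insitu_first_element_name is not a rapidxml function): like rapidxml's default parse, it does not copy the element name anywhere. It writes a '\0' into the caller's buffer right after the name and returns a pointer into that buffer, so the name is only valid for as long as the original buffer exists.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, heavily simplified in-situ "parser" (real rapidxml also
// handles declarations, attributes, entities, etc.). It does NOT copy the
// element name: it overwrites the character after the name with '\0' and
// returns a pointer INTO `text`.
char* insitu_first_element_name(char* text) {
    char* p = std::strchr(text, '<');
    if (p == nullptr) return nullptr;
    char* name = p + 1;
    // The name ends at whitespace, '/', '>' or the end of the string.
    char* end = name + std::strcspn(name, " \t\r\n/>");
    *end = '\0';  // terminate the name in place: the input buffer is modified
    return name;
}
```

After calling it on a buffer holding "<config><name>x</name></config>", the returned pointer equals buffer + 1 and the '>' at index 7 has been replaced by '\0'. If that buffer is a local vector or array that goes out of scope, the returned pointer dangles.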

Hint: The main function above creates a dynamic (new) xml_document. This was not in the original code, and is an artifact of debugging changes. The original (failing) code declared it and did not dynamically allocate it, but identical problems occurred.

Another Hint for full disclosure (although I don't see why it matters) - there is another instance of a vector in this mess of code that is populated by the data in the rapidxml::xml_document object.

Answer

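The only difference between the two is that the vector version works for input of any size, while the char array version depends on a fixed 65536-byte buffer. With the loop as written, a file longer than 65535 characters is silently truncated (get() also consumes one extra character before the count is checked), and any off-by-one in that bounds arithmetic would write the terminating \0 past the end of the array, which is undefined behavior.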

Another problem, common to both versions, is that you read the file into memory that has a shorter lifetime than the xml_document. Read the documentation:


字符串必须在文档的有效期内持续存在。

The string must persist for the lifetime of the document.
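One way to honor that requirement is to let a single object own both the text and everything that points into it. The sketch below reuses the same kind of hypothetical, heavily simplified in-situ parser (not a rapidxml API) so that it compiles on its own; LoadedConfig is also a hypothetical name. In real code the members would be a std::vector<char> and a rapidxml::xml_document<>, with the buffer declared first so it is constructed before, and destroyed after, the document.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <iterator>
#include <sstream>   // used by the usage example
#include <string>    // used by the usage example
#include <vector>

// Hypothetical, heavily simplified in-situ "parser" (not a rapidxml API):
// terminates the first element name in place and returns a pointer into `text`.
char* insitu_first_element_name(char* text) {
    char* p = std::strchr(text, '<');
    if (p == nullptr) return nullptr;
    char* name = p + 1;
    char* end = name + std::strcspn(name, " \t\r\n/>");
    *end = '\0';
    return name;
}

// Hypothetical owner that keeps the text alive exactly as long as the
// parse result that points into it.
class LoadedConfig {
public:
    explicit LoadedConfig(std::istream& in)
        : text_(std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>()) {
        text_.push_back('\0');  // zero-terminate BEFORE taking any pointers
        first_name_ = insitu_first_element_name(text_.data());
    }

    // A copy would duplicate text_, but first_name_ would still point into
    // the original object's buffer, so copying is disabled.
    LoadedConfig(const LoadedConfig&) = delete;
    LoadedConfig& operator=(const LoadedConfig&) = delete;

    const char* first_name() const { return first_name_; }

private:
    std::vector<char> text_;            // declared first: built first, destroyed last
    const char* first_name_ = nullptr;  // points into text_
};
```

Copying is deleted because a copy would duplicate the text while still pointing into the original object's buffer. A simpler alternative is to keep the question's load_config signature but have main own the std::vector<char> and pass it in by reference; rapidxml_utils.hpp also provides a rapidxml::file<> class that loads a file and keeps its data alive for as long as the file<> object exists.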

When load_config exits, the vector is destroyed and its memory is freed. Any later attempt to access the document then reads invalid memory (undefined behavior).

In the char array version the memory is allocated on the stack. It is still 'freed' when load_config exits (accessing it afterwards is undefined behavior), but you don't see a crash because that stack memory has not been overwritten yet.
