TinyXML - any way to skip problematic DOCTYPE tag?

Views: 304 | Posted: 2016/10/30 0:51:06 | Tags: c++ tinyxml

Problem description

I am using TinyXML2 to parse an XML that looks somewhat like:

<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE comp PUBLIC "-//JWS//DTD xyz//EN" "file:/documentum/xyz.dtd"
[<!ENTITY subject SYSTEM "dctm://he/abc">
]>
<comp>
...
</comp>

Unfortunately, as per http://www.grinninglizard.com/tinyxmldocs/, it looks like TinyXML doesn't support parsing DOCTYPE tags such as the one in the above sample. I am not interested in the DTD per se and would only like to parse the rest of the XML (starting with <comp> tag). What is the recommended or best way to achieve this? I tried retrieving the XML subtree rooted at <comp> (using document.FirstChildElement("comp")) but this approach failed, possibly because TinyXML is unable to continue parsing beyond the <!ENTITY tag which it seems to consider to be an error. Any ideas on how this can be achieved using TinyXML itself (i.e. preferably without requiring a preprocessing step that removes the <!DOCTYPE ..> using regular expression matching before invoking TinyXML)?

Recommended answer

You can first load the entire file into an std::string, skip the unsupported statements and then parse the resulting document, like this:

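#include <fstream>
#include <iterator>
#include <string>
#include "tinyxml.h" // TinyXML 1.x header, provides TiXmlDocument

int main() {
    // Open the file and read it into a string
    std::ifstream ifs("filename.xml", std::ios::in | std::ios::binary);
    if (!ifs)
        return 1; // file could not be opened
    std::string xml_str((std::istreambuf_iterator<char>(ifs)),
                        std::istreambuf_iterator<char>());

    // Skip unsupported statements. This is line based: it assumes every
    // declaration ends on a line of its own, as in the sample above.
    std::string::size_type pos = 0;
    while (true) {
        pos = xml_str.find('<', pos);
        if (pos == std::string::npos || pos + 1 >= xml_str.size())
            break; // no further tag
        if (xml_str[pos + 1] == '?' || // <?xml...
            xml_str[pos + 1] == '!') { // <!DOCTYPE... or [<!ENTITY...
            // Skip this line
            pos = xml_str.find('\n', pos);
            if (pos == std::string::npos)
                break; // declaration runs to the end of the input
        } else
            break;
    }
    if (pos == std::string::npos)
        return 1; // no element found after the prolog
    xml_str = xml_str.substr(pos);

    // Parse document as usual
    TiXmlDocument doc;
    doc.Parse(xml_str.c_str());
    return doc.Error() ? 1 : 0;
}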
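
The line-based skip relies on each declaration ending at a line break. As a rough alternative, the prolog could be skipped by matching delimiters instead of newlines. The sketch below is only an illustration: `find_root_element` and `strip_prolog` are hypothetical helper names, not part of TinyXML, and the code assumes that the DOCTYPE has at most one internal subset and that no `[`, `]` or `>` characters appear inside quoted literals or comments within it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a TinyXML function): returns the offset of the
// first element start tag, skipping the XML declaration, processing
// instructions, comments, and a DOCTYPE with an optional internal subset.
// Returns std::string::npos if no element is found.
std::size_t find_root_element(const std::string& xml) {
    const std::size_t npos = std::string::npos;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != npos) {
        if (xml.compare(pos, 2, "<?") == 0) {
            // XML declaration or processing instruction: ends at "?>"
            pos = xml.find("?>", pos + 2);
            if (pos == npos) return npos;
            pos += 2;
        } else if (xml.compare(pos, 4, "<!--") == 0) {
            // Comment: ends at "-->"
            pos = xml.find("-->", pos + 4);
            if (pos == npos) return npos;
            pos += 3;
        } else if (xml.compare(pos, 2, "<!") == 0) {
            // DOCTYPE: if '[' comes before the next '>', there is an
            // internal subset, so skip to its closing ']' first.
            std::size_t gt = xml.find('>', pos);
            std::size_t open = xml.find('[', pos);
            if (open != npos && (gt == npos || open < gt)) {
                std::size_t close = xml.find(']', open);
                if (close == npos) return npos;
                gt = xml.find('>', close);
            }
            if (gt == npos) return npos;
            pos = gt + 1;
        } else {
            return pos; // first real element
        }
    }
    return npos;
}

// Hypothetical helper: returns the document starting at the root element,
// or an empty string if no element was found.
std::string strip_prolog(const std::string& xml) {
    std::size_t pos = find_root_element(xml);
    return pos == std::string::npos ? std::string() : xml.substr(pos);
}
```

The result of `strip_prolog(xml_str)` can then be passed to `doc.Parse(...)` in place of the manually trimmed string. Any entity references that the DTD would have defined (such as `&subject;` in the sample) remain unresolved after the DOCTYPE is dropped.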

Additional note: if the XML file is too large, it's better to use memory mapped files instead of loading the entire file into memory. But that's another question entirely.
