VTD-XML seems to be spoiling escaped string in XML document

Views: 207. Posted: 2017/8/29 0:26:20. Tags: java, xml, escaping, vtd-xml

Problem description

I am working on an XML data set (the DrugBank database) where some fields contain escaped XML characters such as "&amp;".

To make the problem more concrete, here is an example scenario:

<drugs>
    <drug>
        <drugbank-id>DB00001</drugbank-id>
        <general-references>
            # Askari AT, Lincoff AM: Antithrombotic Drug Therapy in Cardiovascular Disease. 2009 Oct; pp. 440&#x2013;. ISBN 9781603272346. "Google books":http://books.google.com/books?id=iadLoXoQkWEC&amp;pg=PA440.
        </general-references>
        .
    </drug>
    <drug>
    ...
    </drug>
    ...
</drugs>

Since the entire document is huge, I am parsing it as follows:

VTDGen gen = new VTDGen();
try {
    gen.setDoc(Files.readAllBytes(DRUGBANK_XML));
    gen.parse(true);
} catch (IOException | ParseException e) {
    SystemHelper.exitWithMessage(e, "Unable to process Drugbank XML data. Aborting.");
}
VTDNav nav = gen.getNav();
AutoPilot pilot = new AutoPilot(nav);
pilot.selectXPath("//drugs/drug");
while (pilot.evalXPath() != -1) {
    long fragment = nav.getContentFragment();
    String drugXML = nav.toString((int) fragment, (int) (fragment >> 32));
    System.out.println(drugXML);
    finerParse(drugXML); // another method handling a more detailed data analysis
}

When I tested the finerParse method with sample XML (snippets copy-pasted from the same data), it worked fine. But when called from the above code, it failed with the error message "Errors in Entity: Illegal entity char". On printing the input to finerParse (i.e., the drugXML string), I noticed that the string "&amp;pg=PA440" in the original XML had been changed to "&pg=PA440".
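The symptom described above can be reproduced without VTD-XML. The following is a minimal sketch, not the code from the question: it uses only the JDK's built-in DOM parser, and the class name EntityResolutionDemo and the helper isWellFormed are hypothetical names chosen for illustration. It shows that once "&amp;" has been resolved to a bare "&", the extracted text is no longer well-formed XML, so any second parse of that string is expected to fail.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; it does not use VTD-XML at all.
final class EntityResolutionDemo {

    /** Returns true if the given text parses as a well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler so failures are not printed to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for well-formedness errors such as a bare '&'.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Escaped form, as it appears in the source document.
        String raw =
                "<ref>http://books.google.com/books?id=iadLoXoQkWEC&amp;pg=PA440</ref>";
        // Simulates what the second parser receives after the entity is resolved.
        String resolved = raw.replace("&amp;", "&");

        System.out.println("raw is well-formed:      " + isWellFormed(raw));
        System.out.println("resolved is well-formed: " + isWellFormed(resolved));
    }
}
```

If the string-based approach is kept, the fragment would need to be extracted without entity resolution. VTD-XML appears to offer a raw counterpart for this, such as toRawString(int offset, int len), but that name and signature are an assumption here and should be checked against the Javadoc of the version in use.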

Why is this happening? All I am doing is parsing it with a very well-known parser.

P.S. I have found an alternate solution where I am simply passing the VTDNav as the argument to finerParse instead of first obtaining the content string and passing that string. But I am still curious about what is going wrong with the above approach.
