How can I validate my 3,000,000 line long XML file?

Problem description

I have an XML file. It is nearly correct, but not quite.

Error on line 302211.
Extra Content at the end of the document.

I've spent literally two days trying to debug this, but the file is so big that it's nearly impossible. Is there anything I can do?

Here are the relevant lines (I include the two lines before the error; the error begins at the <seg> tag).

 <tu>
   <tuv xml:lang="en"> 
    <prop type="feed"></prop>
    <seg>
        <bpt i="1" x="1" type="feed">
            test
        </bpt>
        To switch on computer:
        <ept i="1">
            &gt;
        </ept>
        Press device 
        <ph x="2" type="feed">
            &lt;schar _TR=&quot;123&quot; y.io.name
        </ph> or 
        <ph x="3" type="feed">
            &lt;schar _TR=&quot;274&quot; y.io.name=&quot;
        </ph> (Spain) twice. 
    </seg>
 </tuv>
</tu>

Can anyone give me some pointers on finding the issue here? I am using the Notepad++ XML plugin.

Recommended answer

Background notes

  • The XML fragment you've posted stands on its own as a well-formed XML document, so the problem must be somewhere else in your XML.
  • Your particular XML problem is well-formedness, not validity.

      1. Use an XML parser with better diagnostic messages. Xerces-based tools have very good messages (albeit with a few exceptions).
      2. Know the common problems that cause an XML document not to be well-formed:
        • Missing or mismatched element closing tag.
        • Missing or mismatched attribute quote delimiter.
        • < or & in content rather than &lt; or &amp;.
        • Multiple root elements.
        • Incomplete markup after the root element.
        • Multiple XML declarations, or an XML declaration appears other than at the top of the document.
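
As a rough illustration of the first two points, the sketch below streams a document through Python's standard-library expat binding (a substitute chosen for the example, not the Xerces tooling mentioned above) and reports the first well-formedness error together with some context. The function name `first_error` and the keys of the returned dictionary are made up for this example. It assumes the file is opened in binary mode so that expat can honor the declared encoding.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError


def first_error(stream):
    """Parse a binary stream with expat.

    Returns None if the document is well-formed, otherwise a dict
    describing the first error and the parser state when it occurred.
    (Hypothetical helper written for this example.)
    """
    parser = xml.parsers.expat.ParserCreate()
    open_elements = []                  # (name, start line) of unclosed elements
    state = {"root_closed_line": None}  # line where the root element was closed

    def on_start(name, attrs):
        open_elements.append((name, parser.CurrentLineNumber))

    def on_end(name):
        open_elements.pop()
        if not open_elements:
            state["root_closed_line"] = parser.CurrentLineNumber

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end

    try:
        parser.ParseFile(stream)        # reads the stream in chunks
    except ExpatError as err:
        return {
            "line": err.lineno,
            "column": err.offset,
            "message": xml.parsers.expat.ErrorString(err.code),
            "open_elements": list(open_elements),
            "root_closed_line": state["root_closed_line"],
        }
    return None
```

Calling it as `first_error(open("big.xml", "rb"))` (the file name is a placeholder) gives a line and column for the first error. If `open_elements` is empty and `root_closed_line` is set, the root element was closed before the reported position, which points to multiple root elements or stray markup after the root. A non-empty `open_elements` list shows which elements were still open, which helps with mismatched closing tags.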

      Divide and conquer. Consider this sketch of a huge XML document:

      <root>
         <First>
             <FirstChild>
                <!-- Tons of descendent markup -->
             </FirstChild>
             <SecondChild>
                <!-- Tons of descendent markup -->
             </SecondChild>
         </First>
         <Second>
             <!-- Tons of descendent markup -->
         </Second>
      </root>
      

      Process of elimination

      1. Delete the First element.
      2. Revalidate.
      3. If the error goes away, restore the First element and remove the Second element.
      4. Otherwise, remove the FirstChild element.
      5. Repeat until the error can be spotted more easily in the reduced XML document.
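
One elimination step can be scripted instead of done by hand in an editor. The sketch below is a minimal illustration under stated assumptions: the helpers `check` and `without_lines` and the `SAMPLE` document are invented for this example, each element is assumed to occupy whole lines, and the text is assumed to fit in memory. It removes a range of lines and re-checks well-formedness with Python's standard-library expat binding.

```python
import xml.parsers.expat

# Tiny stand-in for the huge document; the bare "&" on line 4 is the defect.
SAMPLE = (
    "<root>\n"
    "  <First>\n"
    "    <FirstChild>ok</FirstChild>\n"
    "    <SecondChild>a & b</SecondChild>\n"
    "  </First>\n"
    "  <Second>ok</Second>\n"
    "</root>\n"
)


def check(text):
    """Return None if text is well-formed XML, else expat's error string."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        return str(err)
    return None


def without_lines(text, first, last):
    """Return text with lines first..last (1-based, inclusive) removed."""
    lines = text.splitlines(keepends=True)
    return "".join(lines[:first - 1] + lines[last:])


def error_is_inside(text, first, last):
    """True if removing lines first..last makes an ill-formed text well-formed."""
    return check(text) is not None and check(without_lines(text, first, last)) is None
```

With the sample, `error_is_inside(SAMPLE, 2, 5)` checks the First element and `error_is_inside(SAMPLE, 3, 3)` checks FirstChild. Narrowing the range that still returns true repeats the manual steps until the faulty lines are few enough to read.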

      See also

      • How to parse invalid (bad / not well-formed) XML?