Handling malformed XML

Problem description

Is it possible to read through a malformed "XML" file and remove the malformed nodes? I really wish that rejecting malformed XML were an option, but that is not the case. Thankfully the XML is fairly simple, and the only big problem is unescaped nodes. Essentially, the XML scenario I'm trying to handle is the one below:

<NODE1>
     <NODE2>
         <NODE3>
            <NODE4>Meaningful text goes here</NODE4>
         </NODE3>
     </NODE2>
     <NODE2>
         <NODE3>
            <NODE4>Meaningful text goes here</NODE4>
         </NODE3>
     </NODE2>
     <NODE2>
         <NODE3>
            <NODE4>TEXT IS CUT OFF
</NODE1>






What I need to do is find a way to 'fix' the XML by removing the unescaped items, which would turn it into the following:

<NODE1>
     <NODE2>
         <NODE3>
            <NODE4>Meaningful text goes here</NODE4>
         </NODE3>
     </NODE2>
     <NODE2>
         <NODE3>
            <NODE4>Meaningful text goes here</NODE4>
         </NODE3>
     </NODE2>
</NODE1>






I'm working with C# 3.5, but any concepts would help as well.

Thanks in advance!

Recommended answer

By definition, a correct XML parser can never do such a thing, because doing so would contradict the criteria for a correct XML parser. So you need something else, which could be anything but an XML parser. For example, it could be a pre-processor used to "convert" trash into XML.

I have two notes here:
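  1. You did not specify the exact behavior of such processing code. You should not treat your problem as a well-defined one.
  2. In practice, if you need something like this, I'm pretty sure you will have to design and implement the behavior you think you need by yourself. Technically, it is quite possible one way or another, but you should not expect enthusiasm from others, and hence any help.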





I tried to briefly explain the uselessness of this approach in my comment on the question. See also: http://en.wikipedia.org/wiki/Garbage_in,_garbage_out



Do you get the idea?



-SA


This isn't a complete solution, but here's my idea for how to solve your problem: roll your own parser (it shouldn't be too difficult, since XML has a simple syntax), keeping track of the currently open nodes on a stack and popping a node when its close tag is hit. When you hit a close tag that doesn't match the currently open tag, keep popping the stack until you reach the matching open tag. Then you should be able to keep the valid parts and ignore the rest.
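The stack idea above can be sketched roughly as follows. This is only an illustration, written in Python rather than C# for brevity. The names `TOKEN` and `drop_unclosed` are made up for this example, and it assumes the input contains nothing but plain elements and text (no XML declaration, comments, CDATA, or `>` inside attribute values).

```python
import re
import xml.etree.ElementTree as ET

# One token is a close tag, an open (or self-closing) tag, or a run of text.
# Assumption: only plain elements and text occur in the input.
TOKEN = re.compile(r"</([\w:.-]+)\s*>|<([\w:.-]+)[^<>]*>|[^<]+")


def drop_unclosed(xml_text):
    """Return xml_text with every element that is never closed removed."""
    out = []    # tokens kept so far
    stack = []  # (tag name, index in `out` where that element starts)
    for match in TOKEN.finditer(xml_text):
        token = match.group(0)
        close_name, open_name = match.group(1), match.group(2)
        if open_name:
            if not token.endswith("/>"):
                # A real open tag: remember where the element begins.
                stack.append((open_name, len(out)))
            out.append(token)
        elif close_name:
            if close_name not in [name for name, _ in stack]:
                continue  # stray close tag with no open partner: ignore it
            # Pop (and discard) every element left open inside the match.
            while stack[-1][0] != close_name:
                _, start = stack.pop()
                del out[start:]
            stack.pop()
            out.append(token)
        else:
            out.append(token)  # plain text
    if stack:
        # Input ended with elements still open: drop the outermost one,
        # which also drops everything nested inside it.
        del out[stack[0][1]:]
    return "".join(out)


if __name__ == "__main__":
    broken = (
        "<NODE1>\n"
        "  <NODE2><NODE3><NODE4>Meaningful text goes here</NODE4></NODE3></NODE2>\n"
        "  <NODE2><NODE3><NODE4>TEXT IS CUT OFF\n"
        "</NODE1>"
    )
    fixed = drop_unclosed(broken)
    ET.fromstring(fixed)  # a conforming parser now accepts the result
    print(fixed)
```

Note that the whitespace that preceded a dropped element is kept, so the indentation of the result may differ slightly from the hand-written target in the question.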

Come to think of it, you could probably use XmlReader to do it, as long as you don't need anything after the malformed XML.
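As a rough analogue of the XmlReader suggestion, here is a sketch that uses Python's standard-library pull parser instead of .NET. It reads events until the first well-formedness error and keeps only the root's direct children that were completely read before that point. The function name `salvage` is hypothetical, and treating "direct child of the root" as the unit to keep is an assumption that fits the sample in the question.

```python
import xml.etree.ElementTree as ET


def salvage(xml_text):
    """Rebuild the root from the children fully parsed before the first error.

    Returns None if not even the root's open tag could be read.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)  # a parse error is reported later, by read_events()
    root = None
    depth = 0
    try:
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
                if depth == 1:
                    # Fresh root, so half-built children are not carried over.
                    root = ET.Element(elem.tag, elem.attrib)
            else:
                depth -= 1
                if depth == 1:
                    root.append(elem)  # a complete direct child of the root
    except ET.ParseError:
        pass  # stop at the first malformed spot; nothing after it is read
    return root


if __name__ == "__main__":
    broken = (
        "<NODE1>"
        "<NODE2><NODE3><NODE4>Meaningful text goes here</NODE4></NODE3></NODE2>"
        "<NODE2><NODE3><NODE4>TEXT IS CUT OFF\n"
        "</NODE1>"
    )
    print(ET.tostring(salvage(broken), encoding="unicode"))
```

As the answer warns, anything that follows the first malformed spot is lost with this approach, even if it is well-formed on its own.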

