Parsing XML with duplicate root elements


Question

I am trying to programmatically clean up invalid XML that has duplicate root elements, in C# on .NET 4.0. What I want to do is consolidate all of the inner elements into one root element and remove the duplicate roots, so that

<a>
    <b></b>
</a>
<a>
    <c></c>
</a>

becomes

<a>
    <b></b>
    <c></c>
</a>

However, an element with the same name as the duplicated root could also appear in the inner XML. In that case we would not want to replace it, so that

<a>
    <a></a>
    <b></b>
</a>
<a>
    <c></c>
    <a></a>
</a>

becomes

<a>
    <a></a>
    <b></b>
    <c></c>
    <a></a>
</a>

Also, the duplicated root element isn't guaranteed to always be <a>; it could have any name.

Thus far I've been trying to think of some sort of elegant regex to accomplish this task, such as /<((.|\n|\r)*?)>(.|\n|\r)*<\/\1>/, but the problem is that a greedy match on the inner XML matches too much, and a non-greedy match on the inner XML matches too little.
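To make the failure concrete, here is a minimal sketch that runs the pattern above, plus a non-greedy variant of it, through .NET's Regex class. The class name RegexDemo and the helper FirstMatch are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // The pattern from the question: the inner XML is matched greedily.
    public const string Greedy = @"<((.|\n|\r)*?)>(.|\n|\r)*<\/\1>";

    // Assumed non-greedy variant: only the inner quantifier is made lazy.
    public const string Lazy = @"<((.|\n|\r)*?)>(.|\n|\r)*?<\/\1>";

    // Hypothetical helper: returns the text of the first match, or "" if none.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }
}
```

On the input <a><b></b></a><a><c></c></a>, the greedy pattern runs to the last </a> and swallows both roots in one match. On <a><a></a><b></b></a>, the lazy pattern stops at the first </a>, which belongs to the nested element, so the match ends before the root is closed.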

I was hoping I wouldn't have to resort to creating a stack that counts open and close tags to identify when I am back at the root of the document. I'm looking for a simple and elegant way of solving this problem.

Open source, third-party libraries are potentially acceptable solutions if one of them handles this kind of situation, but I'd rather avoid them.

Does anyone have any ideas?

Recommended answer

It may be better to actually read the XML as XML. You should be able to create an XmlReader with ConformanceLevel set to Fragment and read all of the fragments as normal XML, then use normal XML processing to select and copy the XML nodes.
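A minimal sketch of that approach, assuming LINQ to XML (available in .NET 4.0) is acceptable for the copy step. The class FragmentMerger and the method MergeRoots are hypothetical names, and the sketch assumes that every top-level element should be folded into a root named after the first one.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class FragmentMerger
{
    // Hypothetical helper: merges the children of every top-level element
    // into one root. Assumption: the merged root takes the name and the
    // attributes of the first top-level element.
    public static XElement MergeRoots(string xml)
    {
        var settings = new XmlReaderSettings
        {
            // Fragment mode allows more than one top-level element.
            ConformanceLevel = ConformanceLevel.Fragment,
            IgnoreWhitespace = true
        };

        XElement merged = null;
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // ReadFrom consumes the whole element, nested elements
                    // included, and leaves the reader on the node after it.
                    var top = (XElement)XNode.ReadFrom(reader);
                    if (merged == null)
                        merged = new XElement(top.Name, top.Attributes());
                    merged.Add(top.Nodes());
                }
                else
                {
                    // Skip anything else at the top level (text, comments).
                    reader.Read();
                }
            }
        }
        return merged; // null when the input has no top-level element
    }
}
```

Because only top-level elements are unwrapped, a nested element that shares the root's name is copied through unchanged. Calling MergeRoots on the second example from the question should yield a single root whose children are a, b, c, a, in that order, and ToString() on the result gives the merged XML.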
