How to clean up XML attributes using regex?

Question

I'd like to remove all the attributes from my XML structure. My first choice is regex, but if there's a simpler way, I'm open to suggestions.

To pick out a single, fixed tag I used the following.

String clean = Regex.Replace(filled, ".*?<holder[^>]*?>(.*?)</holder>.*?", "$1");

That gives me the contents of the holder tag. Now I'd like to keep the text content but omit all the attributes of the inner tags. I've tried the following approach.

String plain1 = Regex.Replace(clean, "(<[^>]*?>)(.*?)(</[^>]*?>)", "$1$2$3");
String plain2 = Regex.Replace(clean, "(<[a-zA-Z]*?)([^>]*?)(>)", "$1$3");

But I get either the same input back unchanged (plain1) or just empty tags without their original names (plain2). Either nothing gets cleaned up or everything does. What am I doing wrong?

I've noticed that changing the star to a plus gives me tags that contain only the first letter of their names, so I'm fairly sure the following is the right way to go, as long as I can make the section captured by $1 as large as possible. How do I do that?

String plain3 = Regex.Replace(clean, "(<[a-zA-Z]+?)([^>]*?)(>)", "$1$3");

Recommended answer

"My choice is regex but if there's a simpler way, I'm wide open for suggestions."

I guess you already know this: don't try to parse XML/HTML with regex. Use a real XML parser to process XML.
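
For readers who still want to understand the behaviour described in the question: it appears to come from the lazy quantifiers. A lazy `[a-zA-Z]*?` or `[a-zA-Z]+?` matches as few letters as it can, because the `[^>]*?` that follows is free to consume the rest of the tag. The sketch below contrasts the question's last pattern with a greedy name group. The sample input and the helper name `StripAttributes` are hypothetical, and the pattern is only an illustration: it still breaks on `>` inside attribute values, on comments and on CDATA sections, which is why a parser is the safer tool.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input, not taken from the question.
string clean = "<item id=\"1\" class=\"x\">text</item><br class=\"y\"/>";

// The question's plain3: the lazy [a-zA-Z]+? stops after one letter because
// the following [^>]*? is free to swallow the rest of the tag.
string lazy = Regex.Replace(clean, "(<[a-zA-Z]+?)([^>]*?)(>)", "$1$3");

string greedy = StripAttributes(clean);

Console.WriteLine(lazy);   // <i>text</item><b>
Console.WriteLine(greedy); // <item>text</item><br/>

// Greedy name group: $1 is the whole tag name, $2 an optional "/" so that
// self-closing tags stay self-closing. Closing tags never match and are kept.
static string StripAttributes(string xml) =>
    Regex.Replace(xml, @"<([A-Za-z_][\w:.-]*)(?:\s[^<>]*?)?(/?)>", "<$1$2>");
```

Making the name group greedy is enough to make $1 cover the whole tag name. The optional `/` group is an addition of this sketch, not something the question asked for.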

I'd use LINQ to XML. It can be done easily with the help of a recursive function.

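using System.Linq;
using System.Xml.Linq;

// fileName1 is the input path, fileName2 the output path.
var xDoc = XDocument.Load(fileName1);
RemoveAttributes(xDoc.Root);
xDoc.Save(fileName2);

void RemoveAttributes(XElement xRoot)
{
    // ToList() takes a snapshot so attributes can be removed while iterating.
    foreach (var xAttr in xRoot.Attributes().ToList())
        xAttr.Remove();

    // Recurse into direct children only; using Descendants() here would
    // revisit deeper elements once per ancestor.
    foreach (var xElem in xRoot.Elements())
        RemoveAttributes(xElem);
}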
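
The same result can probably be reached without recursion, since LINQ to XML provides extension methods that work on whole sequences of elements and attributes. The following is a minimal sketch under that assumption. The inline sample document is hypothetical and stands in for the file loaded above.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical sample document, used only for illustration.
var xDoc = XDocument.Parse(
    "<holder id=\"h\"><item id=\"1\" class=\"x\">text</item><br class=\"y\"/></holder>");

// Descendants() on the document yields every element, root included, so no
// recursion is needed. Remove() copies the sequence before deleting from it.
xDoc.Descendants().Attributes().Remove();

string result = xDoc.ToString(SaveOptions.DisableFormatting);
Console.WriteLine(result); // <holder><item>text</item><br /></holder>
```

Note that namespace declarations (`xmlns`) are attributes as well, so this removes them too. Element names keep their namespaces, and the serializer writes the declarations it needs again on output.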
