Regular expression to remove XML tags and their content

Views: 162  Published: 2015/11/24 22:43:37  Tags: c# .net xml vb.net regex

Question

I have the following string, and I would like to remove <bpt *>*</bpt> and <ept *>*</ept>, including the additional tag content inside them, without using an XML parser (the overhead is too large for such tiny strings).

The big <bpt i="1" x="1" type="bold"><b></bpt>black<ept i="1"></b></ept> <bpt i="2" x="2" type="ulined"><u></bpt>cat<ept i="2"></u></ept> sleeps.

Any regex in VB.NET or C# will do.

Recommended answer

If you just want to remove all of the <bpt> and <ept> tags, together with their content, from the string, use this (C#):

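// Requires: using System; using System.Text.RegularExpressions;
try {
    // Remove each <bpt ...>...</bpt> or <ept ...>...</ept> pair, content included
    yourstring = Regex.Replace(yourstring, "(<[be]pt[^>]+>.+?</[be]pt>)", "");
} catch (ArgumentException) {
    // Syntax error in the regular expression
}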
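
As a quick check, the one-line replacement above can be wrapped in a small console program and run against the string from the question. This is only a sketch: the class name `SimpleStrip` and the method `Strip` are made up for illustration, and the pattern is copied unchanged from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleStrip
{
    // Pattern copied from the answer above (hypothetical wrapper around it).
    private const string Pattern = "(<[be]pt[^>]+>.+?</[be]pt>)";

    // Removes every <bpt ...>...</bpt> and <ept ...>...</ept> pair in one pass.
    public static string Strip(string input)
    {
        return Regex.Replace(input, Pattern, "");
    }

    public static void Main()
    {
        // The sample string from the question.
        string sample =
            "The big <bpt i=\"1\" x=\"1\" type=\"bold\"><b></bpt>black<ept i=\"1\"></b></ept> " +
            "<bpt i=\"2\" x=\"2\" type=\"ulined\"><u></bpt>cat<ept i=\"2\"></u></ept> sleeps.";

        Console.WriteLine(Strip(sample)); // prints: The big black cat sleeps.
    }
}
```

Note that this pattern requires at least one character between the tag name and the closing `>`, so a bare `<bpt>` without attributes would not be matched; the sample data always carries attributes.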

Edit:

I decided to extend my solution with a better option. The previous option would not work if there were embedded tags. This new solution should strip all <*pt> tags, embedded or not. In addition, it uses a backreference to the original [be] match so that the exact matching end tag is found. It also creates a reusable Regex object for improved performance, so that the Regex does not have to be rebuilt on each iteration:

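// Requires: using System; using System.Text.RegularExpressions;
try {
    // Build the Regex once; \1 forces the closing tag to match the opening one
    Regex regex = new Regex(@"<([be])pt[^>]+>.+?</\1pt>");
    // Repeat until no <bpt>/<ept> pair is left
    while (regex.IsMatch(yourstring)) {
        yourstring = regex.Replace(yourstring, "");
    }
} catch (ArgumentException) {
    // Syntax error in the regular expression
}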
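
To see what the backreference changes, the two patterns can be compared on an input where an <ept> pair sits inside a <bpt> pair. The following is a sketch under stated assumptions: the class and method names (`BackreferenceStrip`, `StripLoose`, `StripPaired`) and the test string are invented for illustration, and only the two patterns come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceStrip
{
    // First pattern: any closing </bpt> or </ept> ends the match.
    private static readonly Regex Loose = new Regex(@"<[be]pt[^>]+>.+?</[be]pt>");

    // Second pattern: \1 requires the closing tag to repeat the opening letter.
    private static readonly Regex Paired = new Regex(@"<([be])pt[^>]+>.+?</\1pt>");

    public static string StripLoose(string input)
    {
        return Loose.Replace(input, "");
    }

    public static string StripPaired(string input)
    {
        string result = input;
        // Same loop as in the answer: repeat until nothing matches.
        while (Paired.IsMatch(result))
        {
            result = Paired.Replace(result, "");
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical input: an <ept> pair embedded in a <bpt> pair.
        string embedded = "<bpt i=\"1\"><ept i=\"2\">x</ept>y</bpt>z";

        Console.WriteLine(StripLoose(embedded));  // prints: y</bpt>z
        Console.WriteLine(StripPaired(embedded)); // prints: z
    }
}
```

The loose pattern stops at the inner `</ept>` and leaves a stray `</bpt>` behind, while the backreference version skips past it to the matching `</bpt>`. A tag nested inside another tag of the same name would still leave a stray closing tag, because the lazy match stops at the first closing tag with the same letter.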

ADDITIONAL NOTES:

In the comments, a user expressed concern that the '.' pattern would be CPU intensive. While this is true of a standalone greedy '.+', the lazy modifier '?' makes the regex engine extend the match only until it finds the first position where the rest of the pattern matches, whereas a greedy '.+' first runs all the way to the end of the line (here, the end of the string) and then has to backtrack. I use RegexBuddy as a regex development tool; it includes a debugger that lets you see the relative performance of different regex patterns. It can also auto-comment your regexes, so I decided to include those comments here to explain the regex used above:

// <([be])pt[^>]+>.+?</\1pt>
//
// Match the character "<" literally «<»
// Match the regular expression below and capture its match into backreference number 1 «([be])»
//    Match a single character present in the list "be" «[be]»
// Match the characters "pt" literally «pt»
// Match any character that is not a ">" «[^>]+»
//    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Match the character ">" literally «>»
// Match any single character that is not a line break character «.+?»
//    Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
// Match the characters "</" literally «</»
// Match the same text as most recently matched by backreference number 1 «\1»
// Match the characters "pt>" literally «pt>»
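
The breakdown above can also be carried inside the pattern itself by using `RegexOptions.IgnorePatternWhitespace`, and the lazy quantifier can be compared with a greedy one on the string from the question. This is a minimal sketch: the class name `LazyVsGreedy` and its methods are hypothetical, and the greedy variant is included only to show why the `?` matters, not as something the answer recommends.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVsGreedy
{
    // The pattern from the answer, spread over several lines with inline comments.
    private static readonly Regex Lazy = new Regex(@"
        <([be])pt   # literal <, then b or e captured as group 1, then pt
        [^>]+       # one or more characters other than the closing angle bracket
        >           # end of the opening tag
        .+?         # tag content, as few characters as possible (lazy)
        </\1pt>     # closing tag whose letter equals group 1
        ", RegexOptions.IgnorePatternWhitespace);

    // Same pattern with a greedy .+ instead of the lazy .+? (for comparison only).
    private static readonly Regex Greedy = new Regex(@"<([be])pt[^>]+>.+</\1pt>");

    public static string StripLazy(string input)
    {
        return Lazy.Replace(input, "");
    }

    public static string StripGreedy(string input)
    {
        return Greedy.Replace(input, "");
    }

    public static void Main()
    {
        string sample =
            "The big <bpt i=\"1\" x=\"1\" type=\"bold\"><b></bpt>black<ept i=\"1\"></b></ept> " +
            "<bpt i=\"2\" x=\"2\" type=\"ulined\"><u></bpt>cat<ept i=\"2\"></u></ept> sleeps.";

        Console.WriteLine(StripLazy(sample));   // prints: The big black cat sleeps.
        Console.WriteLine(StripGreedy(sample)); // prints: The big cat sleeps.
    }
}
```

With the greedy quantifier the first match runs from the first `<bpt` to the last `</bpt>` in the string, so the word "black" is deleted along with the tags. The lazy version stops at the nearest matching closing tag and keeps the text between the pairs.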
