Stripping out HTML tags in a string
Problem description
I have a program I'm writing that is supposed to strip html tags out of a string. I've been trying to replace all strings that start with "<" and end with ">". This (obviously because I'm here asking this) has not worked so far. Here's what I've tried:
StrippedContent = Regex.Replace(StrippedContent, "\<.*\>", "")
That just returns what seems like a random part of the original string. I've also tried
For Each StringMatch As Match In Regex.Matches(StrippedContent, "\<.*\>")
    StrippedContent = StrippedContent.Replace(StringMatch.Value, "")
Next
Which did the same thing (returns what seems like a random part of the original string). Is there a better way to do this? By better I mean a way that works.
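Why both attempts return fragments: in .NET regular expressions ".*" is greedy, so "\<.*\>" matches from the first "<" on a line to the last ">" on that line. Any text that sits between tags is deleted along with them. Because "." does not match a newline unless RegexOptions.Singleline is set, this happens separately on each line, which is why the result looks like random pieces of the input. The minimal sketch below contrasts the greedy form with the lazy ".*?". The helper names StripGreedy and StripLazy are hypothetical, chosen only for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreedyVsLazyDemo
    ' Hypothetical helper: the question's pattern, with a greedy ".*".
    Public Function StripGreedy(input As String) As String
        ' ".*" first runs to the end of the line, then backtracks only as far as
        ' the LAST ">" on that line, so one match can swallow several tags and
        ' the text between them.
        Return Regex.Replace(input, "<.*>", "")
    End Function

    ' Hypothetical helper: the same idea with a lazy ".*?".
    Public Function StripLazy(input As String) As String
        ' ".*?" grows one character at a time, so each match ends at the FIRST ">".
        Return Regex.Replace(input, "<.*?>", "")
    End Function

    Sub Main()
        Dim html As String = "<p>Hello <b>world</b></p>"
        Console.WriteLine("Greedy: [" & StripGreedy(html) & "]")  ' Greedy: []
        Console.WriteLine("Lazy:   [" & StripLazy(html) & "]")    ' Lazy:   [Hello world]
    End Sub
End Module
```

The lazy version fixes the "random part" symptom, but it still ends a tag at the first ">" it sees, including one inside a quoted attribute value.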
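Recommended answer
Description
This expression will:
- find and replace all tags
- avoid problematic edge cases, such as a ">" character inside a quoted attribute value
Regex: <(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>
Replace with: nothing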
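To make the regex easier to read, the sketch below writes the same pattern in commented form. With RegexOptions.IgnorePatternWhitespace, unescaped whitespace outside character classes is ignored and "#" starts a comment that runs to the end of the line. None of the pattern's character classes contains a space, so the two forms should behave identically. The names CompactPattern, VerbosePattern and StripTags are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module VerbosePatternDemo
    ' The answer's compact pattern, with quotes doubled for a VB string literal.
    Public ReadOnly CompactPattern As String = "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>"

    ' The same pattern split into commented pieces (IgnorePatternWhitespace syntax).
    Public ReadOnly VerbosePattern As String = String.Join(Environment.NewLine, New String() {
        "<                  # opening bracket of a tag",
        "(?:                # then zero or more of:",
        "    [^>=]          #   any character except '>' or '='",
        "  | ='[^']*'       #   '=' plus a single-quoted value (may contain '>')",
        "  | =""[^""]*""    #   '=' plus a double-quoted value (may contain '>')",
        "  | =[^'""][^\s>]* #   '=' plus an unquoted value, up to whitespace or '>'",
        ")*",
        ">                  # closing bracket of the tag"})

    ' Hypothetical helper: removes every tag matched by the verbose pattern.
    Public Function StripTags(html As String) As String
        Return Regex.Replace(html, VerbosePattern, "", RegexOptions.IgnorePatternWhitespace)
    End Function

    Sub Main()
        Console.WriteLine(StripTags("<b class=""x"">bold</b> text"))  ' bold text
    End Sub
End Module
```

The alternatives are tried left to right, so ordinary tag characters are consumed one at a time by "[^>=]", while each "=" pulls in its whole attribute value at once.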
Sample text
Note the difficult edge case in the mouse over function
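Code
Imports System
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        ' Replace this placeholder with the HTML you want to strip.
        Dim sourcestring As String = "replace with your source string"
        ' Every matched tag is replaced with an empty string.
        Dim replacementstring As String = ""
        ' Same pattern as above; each " is doubled ("") inside a VB string literal.
        Dim matchpattern As String = "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>"
        ' These options do not change how this particular pattern matches: it has no letters, "^", "$", "." or literal spaces.
        Console.WriteLine(Regex.Replace(sourcestring, matchpattern, replacementstring,
            RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace Or
            RegexOptions.Multiline Or RegexOptions.Singleline))
    End Sub
End Module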
Output after replacement
these are the droids you are looking for.
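A minimal sketch of the mouse-over edge case follows, using a made-up input (an assumption, not the answer's original sample) whose onmouseover handler contains a ">" inside a double-quoted attribute value. It compares a naive "<[^>]*>" pattern, which ends a tag at the first ">", with the answer's pattern. The helper names StripNaive and StripWithAnswerPattern and the Sample constant are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EdgeCaseDemo
    ' Made-up input: the onmouseover script contains "x > 1" inside a quoted value.
    Public Const Sample As String = "<a href=""#"" onmouseover=""if (x > 1) { show(); }"">these are the droids you are looking for.</a>"

    ' Hypothetical helper: naive pattern, "<", then anything but ">", then ">".
    Public Function StripNaive(html As String) As String
        Return Regex.Replace(html, "<[^>]*>", "")
    End Function

    ' Hypothetical helper: the answer's pattern, which consumes quoted values whole,
    ' so a ">" inside them does not end the tag.
    Public Function StripWithAnswerPattern(html As String) As String
        Return Regex.Replace(html, "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>", "")
    End Function

    Sub Main()
        ' The naive result keeps the tail of the script: " 1) { show(); }"">these are ..."
        Console.WriteLine("Naive:  " & StripNaive(Sample))
        ' The answer's result keeps only the link text.
        Console.WriteLine("Answer: " & StripWithAnswerPattern(Sample))
    End Sub
End Module
```

With this input, the answer's pattern leaves only "these are the droids you are looking for.", while the naive pattern stops at the ">" in "x > 1" and leaks part of the script into the output.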