Stripping out HTML tags in a string


Question

I'm writing a program that is supposed to strip HTML tags out of a string. I've been trying to replace every substring that starts with "<" and ends with ">". So far this has not worked. Here's what I've tried:

StrippedContent = Regex.Replace(StrippedContent, "\<.*\>", "")

That just returns what seems like a random part of the original string. I've also tried

For Each StringMatch As Match In Regex.Matches(StrippedContent, "\<.*\>")
    StrippedContent = StrippedContent.Replace(StringMatch.Value, "")
Next

That did the same thing (it returns what seems like a random part of the original string). Is there a better way to do this? By better, I mean a way that works.

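Recommended answer

Description

This expression will:

  • find and replace all tags
  • avoid problematic edge cases, such as a ">" inside a quoted attribute value

Regex: <(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>

Replace with: nothing (an empty string)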
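
The expression is easier to read one alternation branch at a time. The following is a minimal sketch that rebuilds the same pattern from its four branches. The module, function, and variable names (TagPatternSketch, BuildTagPattern, plainChar, and so on) are illustrative assumptions and do not come from the original answer.

```vb
Imports System
Imports System.Text.RegularExpressions

Module TagPatternSketch
    ' Rebuilds the answer's pattern from its four alternation branches.
    ' All names here are hypothetical and used only for illustration.
    Function BuildTagPattern() As String
        ' Branch 1: any single character that is neither ">" nor "=".
        Dim plainChar As String = "[^>=]"
        ' Branch 2: "=" followed by a single-quoted value; ">" may appear inside the quotes.
        Dim singleQuoted As String = "='[^']*'"
        ' Branch 3: "=" followed by a double-quoted value
        ' (quotes are doubled because this is a VB string literal).
        Dim doubleQuoted As String = "=""[^""]*"""
        ' Branch 4: "=" followed by one non-quote character, then any run of
        ' characters that are neither whitespace nor ">".
        Dim unquoted As String = "=[^'""][^\s>]*"

        Dim branches As String() = {plainChar, singleQuoted, doubleQuoted, unquoted}
        ' A tag is "<", any number of branch matches, then ">".
        Return "<(?:" & String.Join("|", branches) & ")*>"
    End Function

    Sub Main()
        ' Prints: <(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>
        Console.WriteLine(BuildTagPattern())
    End Sub
End Module
```

Because "=" can only be consumed together with the value that follows it, a ">" inside a quoted attribute value never ends the tag early.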

Sample text

Note the difficult edge case in the mouseover function

these are

Code

Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourceString As String = "replace with your source string"
    Dim replacementString As String = ""
    ' Double quotes inside the pattern are doubled, as VB string literals require.
    Dim matchPattern As String = "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>"
    ' The pattern contains no literal letters, anchors, dots, or whitespace,
    ' so no RegexOptions flags are needed.
    Console.WriteLine(Regex.Replace(sourceString, matchPattern, replacementString))
  End Sub
End Module

String after replacement

these are the droids you are looking for.
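
To compare the greedy pattern from the question with the expression from the answer, here is a minimal sketch. The input string is a hypothetical stand-in written for this illustration (it is not the answer's sample text), and the names StripTagsSketch, TagPattern, and StripTags are assumptions as well. The greedy `.*` most likely explains the "random part" result in the question: it runs from the first "<" to the last ">" on a line and removes everything in between, text included.

```vb
Imports System
Imports System.Text.RegularExpressions

Module StripTagsSketch
    ' The expression from the answer, with double quotes doubled for VB.
    Const TagPattern As String = "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>"

    ' Hypothetical helper: removes every match of TagPattern from the input.
    Function StripTags(input As String) As String
        Return Regex.Replace(input, TagPattern, "")
    End Function

    Sub Main()
        ' Hypothetical input: the onmouseover value contains ">" inside quotes.
        Dim sample As String =
            "these are <a onmouseover='if (6 > 3) { go() }' href=""x.aspx"">the droids</a> you are looking for."

        ' Greedy attempt from the question: swallows the link text too.
        ' Prints: these are  you are looking for.
        Console.WriteLine(Regex.Replace(sample, "\<.*\>", ""))

        ' Expression from the answer: removes only the two tags.
        ' Prints: these are the droids you are looking for.
        Console.WriteLine(StripTags(sample))
    End Sub
End Module
```

A regex like this is a reasonable fit for simple, well-formed fragments; for arbitrary real-world HTML (comments, script blocks, malformed markup), a dedicated HTML parser is usually the safer choice.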
