How can I strip HTML tags from a string in ASP.NET?
Question
Using ASP.NET, how can I strip the HTML tags from a given string reliably (i.e. not using regex)? I am looking for something like PHP's strip_tags.
I am trying not to reinvent the wheel, but I have not found anything that meets my needs so far.
Recommended answer
If the task is just to strip all HTML tags from a string, a regex works reliably as well. Replace:
<[^>]*(>|$)
with the empty string, globally. Don't forget to normalize the whitespace in the string afterwards, replacing:
[\s\r\n]+
with a single space, and trimming the result. Optionally, replace any HTML character entities with the actual characters.
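The steps above could be sketched in C# roughly as follows. This is a minimal sketch, not a definitive implementation: the class name HtmlStripper and the method name StripTags are hypothetical, and the final entity-decoding step uses System.Net.WebUtility.HtmlDecode, which is one possible choice for the optional step the answer mentions.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not part of any library.
public static class HtmlStripper
{
    // A tag, or an unterminated tag that runs to the end of the input.
    private static readonly Regex Tag = new Regex(@"<[^>]*(>|$)");

    // Any run of whitespace, including line breaks.
    private static readonly Regex Whitespace = new Regex(@"[\s\r\n]+");

    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Step 1: replace every tag with the empty string.
        string text = Tag.Replace(html, string.Empty);

        // Step 2: collapse whitespace runs to a single space and trim.
        text = Whitespace.Replace(text, " ").Trim();

        // Step 3 (optional): turn character entities back into characters.
        return WebUtility.HtmlDecode(text);
    }
}
```

For example, HtmlStripper.StripTags("<ul><li>Tom &amp; Jerry</li></ul>") returns "Tom & Jerry". Note that once entities are decoded, the result can contain literal < and > characters again, so it should be HTML-encoded before being written back into a page.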
Note:
- There is a limitation: HTML and XML allow > in attribute values. This solution will return broken markup when encountering such values.
- The solution is technically safe, as in: the result will never contain anything that could be used to do cross-site scripting or to break a page layout. It is just not very clean.
- As with all things HTML and regex: use a proper parser if you must get it right under all circumstances.
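To make the first limitation concrete, here is a small sketch that applies the same pattern to an attribute value containing >. The class and method names (LimitationDemo, StripTagsNaive) are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of any library.
public static class LimitationDemo
{
    // Applies only the tag-removal pattern from the answer, with no cleanup.
    public static string StripTagsNaive(string html)
    {
        return Regex.Replace(html, @"<[^>]*(>|$)", string.Empty);
    }
}
```

Calling LimitationDemo.StripTagsNaive("<a title=\"a > b\">link</a>") returns the text ` b">link` (with a leading space), because the pattern treats the > inside the attribute value as the end of the tag. The leftover fragment contains no < character, which is why the output is untidy but still cannot open a new tag.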