Method to strip HTML tags not in a safe list
Question
Is there a method that strips all HTML tags that are not on a list of safe tags? If there isn't, what would be the way to achieve it?
I am looking for something like PHP's strip_tags function.
Recommended answer
NullUserException's answer is perfect. I made a small extension method to do it, and I'm posting it here in case anyone else needs it.
using System;
using System.Linq;
using System.Xml;

namespace Extenders
{
    public static class StringExtender
    {
        // Recursively walks the children of root, flattening disallowed nodes to plain text
        // and removing disallowed attributes from the nodes that are kept.
        internal static void ParseHtmlDocument(XmlDocument doc, XmlNode root, string[] allowedTags, string[] allowedAttributes, string[] allowedStyleKeys)
        {
            if (root == null) root = doc.DocumentElement;

            // Snapshot the children first: ReplaceChild mutates the live ChildNodes list
            // while it is being enumerated, which can end the loop early.
            var nodes = root.ChildNodes.Cast<XmlNode>().ToList();
            foreach (XmlNode node in nodes)
            {
                if (!allowedTags.Any(x => string.Equals(x, node.Name, StringComparison.OrdinalIgnoreCase)))
                {
                    // Disallowed node: keep only its text, dropping the tag and all nested tags.
                    var safeNode = doc.CreateTextNode(node.InnerText);
                    root.ReplaceChild(safeNode, node);
                    continue;
                }

                if (node.Attributes != null)
                {
                    var attrList = node.Attributes.OfType<XmlAttribute>().ToList();
                    foreach (XmlAttribute attr in attrList)
                    {
                        // Compare case-insensitively on both sides, as for tag names.
                        if (!allowedAttributes.Any(x => string.Equals(x, attr.Name, StringComparison.OrdinalIgnoreCase)))
                        {
                            node.Attributes.Remove(attr);
                        }
                        // TODO: if style is allowed, check the allowed keys: values
                    }
                }

                // Recurse only into nodes that were kept.
                if (node.ChildNodes.Count > 0)
                    ParseHtmlDocument(doc, node, allowedTags, allowedAttributes, allowedStyleKeys);
            }
        }

        public static string ParseSafeHtml(this string input, string[] allowedTags, string[] allowedAttributes, string[] allowedStyleKeys)
        {
            // LoadXml needs a single root element, so the input is wrapped in a <span>.
            var xmlDoc = new XmlDocument();
            xmlDoc.LoadXml("<span>" + input + "</span>");
            ParseHtmlDocument(xmlDoc, null, allowedTags, allowedAttributes, allowedStyleKeys);
            // InnerXml of the wrapper is the sanitized markup without the <span> root.
            return xmlDoc.DocumentElement.InnerXml;
        }
    }
}
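One caveat with the method above: it loads the input with XmlDocument.LoadXml, which only accepts well-formed XML. Typical HTML such as an unclosed <br> or an &nbsp; entity makes LoadXml throw an XmlException. Here is a minimal sketch of a pre-check. The class and method names (WellFormedCheck, IsWellFormedFragment) are hypothetical, not part of the original answer.

```csharp
using System;
using System.Xml;

public static class WellFormedCheck
{
    // Hypothetical helper: returns true when the input, wrapped in the same
    // <span> root the extension method uses, can be parsed by XmlDocument.
    public static bool IsWellFormedFragment(string input)
    {
        try
        {
            new XmlDocument().LoadXml("<span>" + input + "</span>");
            return true;
        }
        catch (XmlException)
        {
            // Unclosed tags, undeclared entities, unquoted attributes, etc.
            return false;
        }
    }
}
```

If real-world HTML has to be accepted, a tolerant HTML parser is probably a better front end than XmlDocument. The check above only tells you whether the XML-based approach can be applied to a given input.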
To use the extension method:
var x = "<b>allowed</b><b class='text'>allowed attr</b><b id='5'>not allowed attr</b><i>not all<b>o</b>wed tag</i>".ParseSafeHtml((new string[] { "b", "#text" }), (new string[] { "class" }), (new string[] { }));
It outputs:
<b>allowed</b><b class="text">allowed attr</b><b>not allowed attr</b>not allowed tag
If an element is not allowed, it is replaced by its InnerText: the tag itself is dropped, along with any tags nested inside it.
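The TODO in the attribute loop leaves the allowedStyleKeys parameter unused. Below is a rough sketch of how the style check might look, assuming the style value is a plain list of "key: value" declarations separated by semicolons. The StyleFilter class and FilterStyle method are hypothetical names, and this is not a full CSS parser: values that themselves contain semicolons would be split incorrectly, and the values are not inspected at all.

```csharp
using System;
using System.Linq;

public static class StyleFilter
{
    // Hypothetical helper: keeps only the declarations whose key is in allowedStyleKeys.
    public static string FilterStyle(string style, string[] allowedStyleKeys)
    {
        var kept = style
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            // Split each declaration into at most two parts: key and value.
            .Select(decl => decl.Split(new[] { ':' }, 2))
            .Where(parts => parts.Length == 2)
            .Select(parts => new { Key = parts[0].Trim(), Value = parts[1].Trim() })
            // Match keys case-insensitively, as the tag and attribute checks do.
            .Where(d => allowedStyleKeys.Any(k => string.Equals(k, d.Key, StringComparison.OrdinalIgnoreCase)))
            .Select(d => d.Key + ": " + d.Value);

        return string.Join("; ", kept);
    }
}
```

Inside the attribute loop, an allowed style attribute could then be rewritten with something like attr.Value = StyleFilter.FilterStyle(attr.Value, allowedStyleKeys). Filtering by key alone does not make the values safe, so a stricter check on the values may still be needed.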