Method to strip HTML tags not in a safe list

Views: 84  Posted: 2016/6/5 18:44:58  Tags: c# asp.net

Question

Is there a method that strips all HTML tags that are not on a list of safe tags? If there isn't, what would be ~~a regex~~ the way to achieve it?

~~I'm looking for something like PHP's strip_tags function.~~

Recommended answer

NullUserException's answer is perfect. I made a small extension method to do it, and I'm posting it here in case anyone else needs it.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

namespace Extenders
{
    public static class StringExtender
    {
        // Recursively walks the children of root. Disallowed nodes are flattened
        // to their text; allowed nodes keep only the allowed attributes.
        internal static void ParseHtmlDocument(XmlDocument doc, XmlNode root, string[] allowedTags, string[] allowedAttributes, string[] allowedStyleKeys)
        {
            if (root == null) root = doc.DocumentElement;

            // Snapshot the children first: ReplaceChild below mutates the live
            // ChildNodes list, which would otherwise end the enumeration early.
            var nodes = root.ChildNodes.Cast<XmlNode>().ToList();

            foreach (XmlNode node in nodes)
            {
                if (!allowedTags.Any(x => string.Equals(x, node.Name, StringComparison.OrdinalIgnoreCase)))
                {
                    // Disallowed node: keep only its text, dropping all nested tags.
                    var safeNode = doc.CreateTextNode(node.InnerText);
                    root.ReplaceChild(safeNode, node);
                    continue;
                }

                if (node.Attributes != null)
                {
                    var attrList = node.Attributes.OfType<XmlAttribute>().ToList();
                    foreach (XmlAttribute attr in attrList)
                    {
                        // Compare case-insensitively on both sides.
                        if (!allowedAttributes.Any(x => string.Equals(x, attr.Name, StringComparison.OrdinalIgnoreCase)))
                        {
                            node.Attributes.Remove(attr);
                        }
                        // TODO: if style is allowed, check the allowed keys: values
                    }
                }

                // Only nodes that stay in the document need to be descended into.
                if (node.HasChildNodes)
                    ParseHtmlDocument(doc, node, allowedTags, allowedAttributes, allowedStyleKeys);
            }
        }

        public static string ParseSafeHtml(this string input, string[] allowedTags, string[] allowedAttributes, string[] allowedStyleKeys)
        {
            var xmlDoc = new XmlDocument();

            // Wrap the input in a single root element. The input must be
            // well-formed XML; otherwise LoadXml throws an XmlException.
            xmlDoc.LoadXml("<span>" + input + "</span>");

            ParseHtmlDocument(xmlDoc, null, allowedTags, allowedAttributes, allowedStyleKeys);

            // InnerXml serializes only the children, so the wrapper <span> is dropped.
            return xmlDoc.DocumentElement.InnerXml;
        }
    }
}
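
The TODO in the attribute loop leaves style filtering open. One possible way to fill it is sketched below. `StyleFilter` and `FilterStyle` are hypothetical names, not part of the original answer, and the sketch assumes that `allowedStyleKeys` holds CSS property names such as `color`. It only filters by key and does not validate values, so something like `url(...)` inside an allowed property would still pass through.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StyleFilter
{
    // Hypothetical helper: keeps only the "key: value" declarations
    // whose key appears in allowedStyleKeys (case-insensitive).
    public static string FilterStyle(string style, string[] allowedStyleKeys)
    {
        var kept = new List<string>();

        // Naive split: a ';' inside a quoted value or url(...) would be split incorrectly.
        foreach (var declaration in style.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var colon = declaration.IndexOf(':');
            if (colon <= 0) continue; // no key before the colon: skip the malformed declaration

            var key = declaration.Substring(0, colon).Trim();
            var value = declaration.Substring(colon + 1).Trim();

            if (allowedStyleKeys.Any(k => string.Equals(k, key, StringComparison.OrdinalIgnoreCase)))
                kept.Add(key + ": " + value);
        }

        return string.Join("; ", kept);
    }
}
```

Inside the attribute loop this could be applied as `attr.Value = StyleFilter.FilterStyle(attr.Value, allowedStyleKeys);` whenever the attribute is an allowed `style`. For example, `StyleFilter.FilterStyle("color: red; position: fixed", new[] { "color" })` returns `color: red`.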

Usage:

var x = "<b>allowed</b><b class='text'>allowed attr</b><b id='5'>not allowed attr</b><i>not all<b>o</b>wed tag</i>".ParseSafeHtml((new string[] { "b", "#text" }), (new string[] { "class" }), (new string[] { }));

It outputs:

<b>allowed</b><b class="text">allowed attr</b><b>not allowed attr</b>not allowed tag

If an element is not allowed, it is replaced by its innerText, which removes the tag itself and all tags nested inside it.
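
To see that flattening step in isolation, here is a minimal sketch that uses only `XmlDocument`. `InnerTextDemo` and `FlattenFirstChild` are hypothetical names introduced for illustration. It assumes the input is well-formed XML with a single root element that has at least one child.

```csharp
using System;
using System.Xml;

public static class InnerTextDemo
{
    // Hypothetical helper: replaces the first child of the root element
    // with a text node holding that child's InnerText.
    public static string FlattenFirstChild(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNode child = doc.DocumentElement.FirstChild;

        // InnerText concatenates the text of the node and all its descendants,
        // so nested tags such as <b> disappear along with the outer tag.
        doc.DocumentElement.ReplaceChild(doc.CreateTextNode(child.InnerText), child);

        return doc.DocumentElement.InnerXml;
    }

    public static void Main()
    {
        Console.WriteLine(FlattenFirstChild("<span><i>not all<b>o</b>wed tag</i></span>"));
        // prints: not allowed tag
    }
}
```

Because the replacement is a text node, any markup characters in the extracted text are escaped again when the document is serialized.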
