HTML Sanitizer for .NET that supports style tags

Question

I'm looking for a good HTML sanitizer to use in an ASP.NET project. The catch is that the sanitizer must support style attributes, which may contain CSS properties, which must also be sanitized. So far I haven't been able to find a good product to use. Before I bite the bullet and write my own sanitizer, I thought I might try to see what people here are using first.

Libraries that I've looked at and rejected:

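  • AntiXSS library (old versions are insecure; new versions strip style tags)
  • AntiSamy .NET (unmaintained, the .NET version lacks necessary features, has outdated dependencies)
  • HTMLAgilityPackSanitizer in AjaxControlToolkit (escapes style tags)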

The ideal would be to have a whitelist-based sanitizer that also validates property values against a list of known values or regexes.

Anybody able to point me in the right direction?

Recommended answer

Look at CsQuery (which I am the primary author of) as a tool for manipulating HTML.

This is a .NET port of jQuery; it gives you complete access to the HTML via the same methods you would use on the client (a DOM and jQuery's API). This makes it pretty easy to roll your own sanitizer.

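Rick Strahl recently wrote a blog post about sanitizing HTML. He shows how to do it with HTML Agility Pack according to his rules, and I posted a comment showing how to achieve the same thing more easily with CsQuery. The basics are just this, given an enumeration of blacklisted tags: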

CQ doc = CQ.Create(html);

// BlackList is assumed to be an enumeration of tag names to remove;
// this creates a grouped selector "iframe,form,script, ..."
string selector = String.Join(",", BlackList);

// CsQuery uses the property indexer as a default method; it's identical
// to the "Select" method and functions like $(...)
doc[selector].Remove();

If you don't want to actually remove content in some tags, e.g. perhaps formatting tags you wish to prohibit, you can use jQuery's unwrap instead. This would have the effect of removing a tag but preserving its children.

doc[selector].UnWrap();

When you're done:

string cleanHtml = doc.Render();

There's more in Rick's post about cleaning up JavaScript event attributes and so on, but basically CsQuery is a toolbox with a familiar and simple way to manipulate HTML. It should be easy enough to create a sanitizer that works the way you want.

CsQuery's DOM model also contains methods to access the styles directly (i.e. in a more convenient way than just manipulating the string), if you need to do something like remove certain named styles. For example, you could remove the "font-weight" style from all elements:

// use the [attribute] selector to target only elements with styles

foreach (IDomObject element in doc["[style]"]) {
    if (element.HasStyle("font-weight")) {
        element.RemoveStyle("font-weight");
    }
}
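
The loop above removes a single named style. The question asks for the opposite policy: keep only whitelisted CSS properties whose values match a known pattern. A minimal sketch of that idea follows, using only the .NET base class library. `StyleWhitelist`, `Sanitize`, and the three rules are hypothetical names and values chosen for illustration; they are not part of CsQuery.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of CsQuery: filters the text of a style
// attribute against a whitelist of property names and value patterns.
public static class StyleWhitelist
{
    // Illustrative rules only; a real list would be chosen per application.
    private static readonly Dictionary<string, Regex> Allowed =
        new Dictionary<string, Regex>(StringComparer.OrdinalIgnoreCase)
        {
            { "color",       new Regex(@"^(#[0-9a-f]{3}|#[0-9a-f]{6}|[a-z]+)$", RegexOptions.IgnoreCase) },
            { "font-weight", new Regex(@"^(normal|bold|[1-9]00)$", RegexOptions.IgnoreCase) },
            { "text-align",  new Regex(@"^(left|right|center|justify)$", RegexOptions.IgnoreCase) },
        };

    // Returns only the declarations whose property is whitelisted and whose
    // value matches that property's pattern; everything else is dropped.
    public static string Sanitize(string style)
    {
        var kept = new List<string>();
        foreach (string declaration in (style ?? "").Split(';'))
        {
            int colon = declaration.IndexOf(':');
            if (colon <= 0) continue; // empty declaration or missing property name

            string name = declaration.Substring(0, colon).Trim();
            string value = declaration.Substring(colon + 1).Trim();

            Regex pattern;
            if (Allowed.TryGetValue(name, out pattern) && pattern.IsMatch(value))
            {
                kept.Add(name.ToLowerInvariant() + ": " + value);
            }
        }
        return string.Join("; ", kept);
    }
}
```

With these rules, `StyleWhitelist.Sanitize("color: red; font-weight: bold; background: url(javascript:alert(1))")` returns `color: red; font-weight: bold`. Splitting on `;` is naive, since a value that contains a semicolon inside a quoted string or `url(...)` is cut into fragments. Those fragments fail validation and are dropped, so the sketch fails closed. The sanitized string would then be written back to each element matched by `doc["[style]"]`.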

The major shortcoming of CsQuery right now is documentation. Its API is designed to match the browser DOM and jQuery as closely as possible (given language differences between jQuery and C#), and the public API is well commented, so it should be easy enough to code against once you get going.

But there are a handful of nonstandard methods (like "HasStyle" and "RemoveStyle") that are unique to CsQuery. Basic usage is covered pretty well in the readme on GitHub, though. It's also on NuGet as CsQuery.
