Strict HTML Validation and Filtering in PHP


Question



I'm looking for best practices for performing strict (whitelist) validation/filtering of user-submitted HTML.

The main purpose is to filter out XSS and similar nasties that may be entered via web forms. The secondary purpose is to limit breakage of HTML content entered by non-technical users, e.g. via a WYSIWYG editor that has an HTML view.

I'm considering using HTML Purifier, or rolling my own by using an HTML DOM parser to go through a process like HTML(dirty)->DOM(dirty)->filter->DOM(clean)->HTML(clean).

Can you describe successes with these or any easier strategies that are also effective? Any pitfalls to watch out for?

Solution

I've tested all the exploits I know of against HTML Purifier, and it did very well. It filters not only HTML but also CSS and URLs.
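For reference, a minimal sketch of a whitelist setup with HTML Purifier might look like the following. It assumes the library was installed with Composer (so `vendor/autoload.php` exists next to the script); the helper name `purify_user_html` and the particular element list are illustrative choices, not recommendations from the library.

```php
<?php
declare(strict_types=1);

// Assumption: HTML Purifier was installed via Composer (ezyang/htmlpurifier).
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

/**
 * Hypothetical helper: purify user-submitted HTML against a small whitelist.
 * Requires the HTMLPurifier classes to be loadable when it is called.
 */
function purify_user_html(string $dirty): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Whitelist of elements and attributes; anything else is dropped.
    $config->set('HTML.Allowed', 'p,br,strong,em,ul,ol,li,a[href]');

    // Only these URL schemes are kept in attributes such as href.
    $config->set('URI.AllowedSchemes', [
        'http'   => true,
        'https'  => true,
        'mailto' => true,
    ]);

    $purifier = new HTMLPurifier($config);

    return $purifier->purify($dirty);
}

// Usage: only runs when the library is actually available.
if (class_exists('HTMLPurifier')) {
    $dirty = '<p onclick="evil()">Hi <a href="javascript:alert(1)">link</a></p>';
    echo purify_user_html($dirty), PHP_EOL;
}
```

Building the config and the purifier once and reusing them across requests is usually cheaper than recreating them per call; the sketch keeps everything in one function only for brevity.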

Once you narrow elements and attributes down to innocent ones, the pitfalls are in attribute content: javascript: pseudo-URLs (IE allows tab characters in the protocol name, so java<TAB>script: still works) and CSS properties that trigger JS.
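To illustrate why a naive `strpos($url, 'javascript:')` test is not enough, here is a rough sketch of a scheme check that first removes the characters a lenient browser would ignore. The helper name `has_allowed_scheme` and its default scheme list are assumptions made for this example, and it expects an attribute value that an HTML parser has already entity-decoded. It covers only the scheme pitfall, not CSS or the other URL tricks, so it is not a substitute for HTML Purifier.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true when $url is relative (no scheme) or uses one of
 * the allowed schemes. Assumes $url was already entity-decoded by a parser.
 */
function has_allowed_scheme(string $url, array $allowed = ['http', 'https', 'mailto']): bool
{
    // Drop ASCII control characters and spaces, which some browsers ignore
    // inside the scheme ("java\tscript:" is treated as "javascript:").
    $compact = preg_replace('/[\x00-\x20\x7F]+/', '', $url);
    if ($compact === null) {
        return false; // regex failure: reject rather than guess
    }

    // No "scheme:" prefix means a relative (or protocol-relative) URL,
    // which this check lets through; the host must be vetted separately.
    if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $compact, $m) !== 1) {
        return true;
    }

    // Schemes are case-insensitive, so compare in lower case.
    return in_array(strtolower($m[1]), $allowed, true);
}

var_dump(has_allowed_scheme("java\tscript:alert(1)")); // bool(false)
var_dump(has_allowed_scheme('https://example.com/'));  // bool(true)
```

The important design choice is allow-listing schemes rather than block-listing `javascript:`, so that `vbscript:`, `data:` and anything not yet thought of are rejected by default.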

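Parsing URLs can be tricky, e.g. these are both valid: http://spoof.com:xxx@evil.com (the host is evil.com; spoof.com:xxx is only the user-info part) and //evil.com (a protocol-relative URL with no scheme at all). Internationalized domain names (IDN) can be written in two ways: Unicode and punycode.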
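As a small demonstration of the two URL examples, the sketch below uses PHP's built-in `parse_url()` to extract the host a browser would actually contact. The helper name `effective_host` is made up for this example, and the IDN normalisation step only runs when the intl extension happens to be loaded; treat it as an illustration of the pitfall rather than a complete URL validator.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return the host part of a URL, or null when the URL
 * has no host (e.g. a plain relative path) or cannot be parsed.
 */
function effective_host(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }

    $host = $parts['host'];

    // With ext-intl available, convert Unicode IDNs to punycode so that both
    // spellings of the same domain compare equal.
    if (function_exists('idn_to_ascii')) {
        $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
        if ($ascii !== false) {
            $host = $ascii;
        }
    }

    return strtolower($host);
}

// "spoof.com:xxx" is user-info, not the host.
echo effective_host('http://spoof.com:xxx@evil.com'), PHP_EOL; // evil.com

// Protocol-relative URL: no scheme, but still an external host.
echo effective_host('//evil.com'), PHP_EOL; // evil.com
```

Comparing the returned host against a list of trusted domains, instead of searching the raw string for a trusted name, avoids being fooled by the user-info trick.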

Go with HTML Purifier – it has most of these worked out. If you just want to fix broken HTML, then use HTML Tidy (it's available as a PHP extension).
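For the "just fix broken HTML" case, a minimal sketch with the tidy extension could look like this. The helper name `repair_html_fragment` and the chosen options are assumptions for the example, and the fallback of returning the input unchanged when the extension is missing is only there to keep the sketch runnable anywhere. Note that Tidy repairs structure only; it does not strip scripts or other XSS vectors.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: repair malformed markup with ext-tidy.
 * Returns the input unchanged when the extension is not loaded.
 */
function repair_html_fragment(string $html): string
{
    if (!function_exists('tidy_repair_string')) {
        return $html; // assumption: silently skipping is acceptable here
    }

    $options = [
        'show-body-only' => true, // emit only the body content, not a full document
        'wrap'           => 0,    // do not hard-wrap long lines
    ];

    $repaired = tidy_repair_string($html, $options, 'utf8');

    return $repaired === false ? $html : $repaired;
}

// Unclosed <p> and <b> elements get closed by Tidy when it is available.
echo repair_html_fragment('<p>unclosed <b>bold text'), PHP_EOL;
```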
