Using HTML Purifier on a site with only plain text input
Problem description
I would appreciate an answer to settle a disagreement between me and some co-workers.
We have a typical PHP / LAMP web application.
The only input we want from users is plain text. We do not invite or want users to enter HTML at any point. Form elements are mostly basic input text tags. There might be a few textareas, checkboxes etc.
There is currently no sanitizing of output to pages. All dynamic content, some of which came from user input, is simply echoed to the page. We obviously need to make it safe.
My solution is to use htmlspecialchars on all output at the time it is echoed on the page.
My co-workers' solution is to add HTML Purifier to the database layer. They want to pass all user entered input through HTML Purifier before it is saved to the database. Apparently they've used it like this on other projects but I think that is a misunderstanding of what HTML Purifier is for.
My understanding is that it only makes sense to use HTML Purifier on a site which allows the user to enter HTML. It takes HTML and makes it safer and cleaner based on a whitelist and other rules.
Who's right and who's wrong?
There's also the whole "escape on input or output" issue but I guess that's a debate for another time and place.
Thanks
As a general rule, escaping should be done for context and for use-case.
If what you want to do is output plain text in an HTML context (and you do), then you need to use escaping functionality that will ensure that you will always output plain text in an HTML context. Given basic PHP, that would indeed be htmlspecialchars($yourString, ENT_QUOTES, 'yourEncoding').
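As a minimal sketch of escaping at output time, the call can be wrapped in a small helper. The helper name e(), the UTF-8 default and the ENT_SUBSTITUTE flag are assumptions made for this example, not something the question's codebase is known to use.

```php
<?php
/**
 * Escape a plain-text string for output in an HTML context.
 * The name e() and the UTF-8 default are assumptions for this sketch.
 */
function e(string $text, string $encoding = 'UTF-8'): string
{
    // ENT_QUOTES escapes both " and ', so the result is also safe inside
    // quoted attribute values. ENT_SUBSTITUTE replaces invalid byte
    // sequences instead of making the whole result an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, $encoding);
}

// User input is treated as plain text: any markup in it is displayed, not run.
$comment = '<script>alert("hi")</script> & \'quotes\'';
echo '<p>' . e($comment) . '</p>', PHP_EOL;
```

Every echo of dynamic content in a template would go through the helper, so the stored data itself stays untouched.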
If what you want to do is output HTML in an HTML context (you don't), then you would want to sanitise the HTML when you output it to prevent it from doing damage - here you would call $purifier->purify($yourString) on output.
If you want to store plain text user input in a database (again, you do) by executing SQL statements, then you should either use prepared statements to prevent SQL injection, or an escaping function specific to your DB, such as mysql_real_escape_string($yourString) (the old mysql extension was removed in PHP 7.0; mysqli_real_escape_string() is the mysqli equivalent).
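A minimal sketch of the prepared-statement route, assuming PDO is available. An in-memory SQLite database is used only to keep the example self-contained (it needs the pdo_sqlite extension); the table and column names are made up for illustration, and a MySQL DSN would be used the same way.

```php
<?php
// In-memory SQLite keeps the sketch self-contained; table and column
// names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT NOT NULL)');

$input = "Robert'); DROP TABLE comments;-- <script>alert(1)</script>";

// The placeholder keeps the value out of the SQL text, so neither SQL
// escaping nor HTML escaping is applied to the stored data.
$insert = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$insert->execute([':body' => $input]);

$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();

// Escape only at the point of HTML output.
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), PHP_EOL;
```

The database holds exactly what the user typed, and the HTML escaping happens once, where the HTML is produced.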
You should not:
- escape for HTML when you are putting data into the database
- sanitise as HTML when you are putting data into the database
- sanitise as HTML when you are outputting data as plain text
Of those, all are outright harmful, albeit to different degrees. Note that the following assumes the database is your only or canonical storage medium for the data (it also assumes you have SQL injection taken care of in some other way - if you don't, that'll be your primary issue):
- if you escape for HTML when you put the data into the database, you rely on the guarantee that you will always be outputting the data into an HTML context; suddenly if you want to just put it into a plaintext file for printing as-is, you need to decode the data before you output it.
- if you sanitise as HTML when you put the data into the database, you are destroying information that your user put there. Is it a messaging system and your user wanted to tell someone else about <script> tags? Your user can't do that - you'll destroy that part of his message!
- Sanitising as HTML when you're outputting data as plain text (without also escaping it) may have confusing, page-breaking results if you don't set your sanitising module to strip all HTML (which you shouldn't, since then you clearly don't want to be outputting HTML).
Did you sanitise for a <div> context, but are putting your data into an inline element? Your user might put a <div> into your inline element, forcing a line break into your page layout (how annoying this is depends on your layout), or use it to influence user perception of metadata (for example to make phishing easier), e.g. like this:
- Name: John Doe
(Site admin)
Did you sanitise for a <span> context? The user could use other tags to influence user perception of metadata, e.g. like this:
- Name: John Doe (this user is an administrator)
Worst-case scenario: Did you sanitise your HTML with a version of HTML Purifier that later turns out to have a bug that does allow a certain kind of malicious HTML to survive? Now you're outputting untrusted data and putting users that view this data on your web page at risk.
Sanitising as HTML and escaping for HTML (in that order!) does not have this problem, but it makes the sanitising step unnecessary, so this combination will just cost you performance. (Presumably that's why your colleagues want to do the sanitising when saving the data, not when displaying it: your use-case, like most, will display the data more often than it is submitted, so sanitising on save would avoid paying the performance hit on every page view.)
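The first two pitfalls can be sketched in a few lines. Note the assumption: strip_tags() is used here only as a stand-in for an HTML sanitiser so the example runs without extra libraries; HTML Purifier's exact output for the same input will differ.

```php
<?php
$input = 'Tom & Jerry say: use <script> tags with care';

// Pitfall 1: escaping for HTML before storing.
$storedEscaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
// A plain-text export of the stored value now shows entities...
$plainTextExport = $storedEscaped;
// ...and a template that (correctly) escapes on output double-encodes it.
$doubleEscaped = htmlspecialchars($storedEscaped, ENT_QUOTES, 'UTF-8');

// Pitfall 2: sanitising as HTML before storing. strip_tags() is only a
// stand-in for a sanitiser here; the tag the user wrote about is lost.
$storedSanitised = strip_tags($input);

echo $plainTextExport, PHP_EOL;
echo $doubleEscaped, PHP_EOL;
echo $storedSanitised, PHP_EOL;
```

In the first case the original text can still be recovered by decoding before every non-HTML use; in the second case the removed part of the message cannot be recovered at all.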
tl;dr
Sanitising as HTML when you're outputting as plain text is not a good idea.
Escape / sanitise for use-case and context.
In your situation, you want to escape plain text for an HTML context (= use htmlspecialchars()).