Why is so much HTML input sanitization necessary?

Question

I have implemented a search engine in C for my HTML website. My entire web is programmed in C.

I understand that HTML input sanitization is necessary because an attacker could enter these two HTML snippets into my search page to trick it into downloading and displaying foreign images/scripts (XSS):

<img src="path-to-attack-site"/>
<script>...xss-code-here...</script>

Wouldn't these attacks be prevented simply by searching for '<' and '>' and stripping them from the search query? Wouldn't that render both snippets useless, since they would no longer be parsed as HTML? I've seen HTML filtering that goes way beyond this, filtering out absolutely all JavaScript commands and HTML markup!

Solution

Input sanitisation is not inherently ‘necessary’.

It is a good idea to remove things like control characters that you never want in your input, and certainly for specific fields you'll want specific type-checking (so that, e.g., a phone number contains digits).
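As a sketch of the kind of field-specific type check meant here, the following C function accepts only a plausible phone number and rejects everything else, including control characters and markup. The name `is_valid_phone`, the allowed separators (spaces and hyphens, plus an optional leading `+`) and the 6 to 15 digit range are assumptions for illustration, not a standard.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type check for a phone-number field: an optional leading '+',
   then only digits, spaces or hyphens, with 6 to 15 digits in total.
   The exact rules are assumptions; adapt them to your own requirements. */
bool is_valid_phone(const char *s)
{
    size_t digits = 0;

    assert(s != NULL);
    if (*s == '+')
        s++;                              /* allow an international prefix */
    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s))
            digits++;
        else if (*s != ' ' && *s != '-')
            return false;                 /* anything else, incl. control chars */
    }
    return digits >= 6 && digits <= 15;
}
```

This is validation of what the field means, not an XSS defence: a value that passes it still gets escaped when it is output.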

But running escaping/stripping functions across all form input for the purpose of defeating cross-site scripting attacks is absolutely the wrong thing to do. It is sadly common, but it is neither necessary nor, in many cases, sufficient to protect against XSS.

HTML-escaping is an output issue that must be tackled at the output stage: usually at the point where you are templating strings into the output HTML page. Escape < to &lt; and & to &amp;, and in attribute values also escape the quote character you're using as the attribute delimiter (for example " to &quot;), and that's it. No HTML injection is then possible.
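A minimal sketch of output-stage escaping in C, assuming the page is built by printing strings. The hypothetical `html_escape` below escapes `&` and `<` as described, and additionally `>`, `"` and `'`, so the result is safe both in element content and in attribute values delimited by either quote character. It is meant to be called at the moment a string is written into the page, not when the form is received.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Entity for characters that are special in HTML text or quoted attribute
   values, or NULL if the character can be copied unchanged. */
static const char *html_entity(char c)
{
    switch (c) {
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";     /* not strictly required, but harmless */
    case '"':  return "&quot;";
    case '\'': return "&#39;";
    default:   return NULL;
    }
}

/* Escape plain text for HTML element content or a quoted attribute value.
   Returns a malloc'd string the caller must free(), or NULL on allocation
   failure. */
char *html_escape(const char *text)
{
    size_t len = 0;
    const char *p;
    char *out, *q;

    assert(text != NULL);
    for (p = text; *p != '\0'; p++) {     /* first pass: measure */
        const char *e = html_entity(*p);
        len += e ? strlen(e) : 1;
    }
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    for (p = text, q = out; *p != '\0'; p++) {   /* second pass: copy */
        const char *e = html_entity(*p);
        if (e != NULL) {
            size_t n = strlen(e);
            memcpy(q, e, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

For example, a search-results template might print the result of `html_escape(query)` inside `<p>You searched for: %s</p>` and then `free()` it; a query of `<b>` then appears on the page as the literal text `<b>` rather than turning the text bold.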

If you try to HTML-escape or filter at the form input stage, you're going to have difficulty whenever you output data that has come from a different source, and you're going to be mangling user input that happens to include <, & and " characters.
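To illustrate the mangling, here is a deliberately minimal, hypothetical escaper `escape_min` that handles only `&` and `<` and writes into a caller-supplied buffer. The point is what happens when it is applied once on input and again on output.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical minimal escaper for '&' and '<' only, writing into a
   caller-supplied buffer. Returns 0 on success, -1 if out is too small. */
int escape_min(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    assert(in != NULL && out != NULL && out_size > 0);
    for (; *in != '\0'; in++) {
        const char *rep = *in == '&' ? "&amp;" : *in == '<' ? "&lt;" : NULL;
        size_t n = rep ? strlen(rep) : 1;
        if (used + n >= out_size)
            return -1;                    /* no room for this piece plus NUL */
        memcpy(out + used, rep ? rep : in, n);
        used += n;
    }
    out[used] = '\0';
    return 0;
}
```

If the form handler stores `escape_min("a < b & c")`, the stored value is `a &lt; b &amp; c`. When the page template escapes that stored value again, as it should for data that may come from any source, the output becomes `a &amp;lt; b &amp;amp; c`, and the browser shows the user the mangled text `a &lt; b &amp; c` instead of what they typed.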

And there are other forms of escaping. If you build an SQL query containing the user's value, you need to do SQL string-literal escaping at that point, which is completely different from HTML escaping. If you want to put a submitted value in a JavaScript string literal, you have to do JSON-style escaping, which is again completely different. If you want to put a value in a URL query-string parameter, you need URL-escaping, not HTML-escaping. The only sensible way to cope with this is to keep your strings as plain text and escape them only at the point where you output them into a specific context such as HTML.
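As one example of a context-specific escape, here is a sketch of percent-encoding a URL query-string parameter value in C. The name `url_encode` is hypothetical; it keeps only the RFC 3986 "unreserved" characters and encodes every other byte, which is conservative but valid. A complete URL containing `&` separators should still be HTML-escaped when it is written into an `href` attribute, because that is a second, separate context.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Percent-encode text for use as a URL query-string parameter value.
   Only RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) are copied;
   every other byte, including spaces and UTF-8 bytes, becomes %XX.
   Returns a malloc'd string the caller must free(), or NULL. */
char *url_encode(const char *text)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *p;
    char *out, *q;

    assert(text != NULL);
    out = malloc(strlen(text) * 3 + 1);   /* worst case: every byte -> %XX */
    if (out == NULL)
        return NULL;
    for (p = (const unsigned char *)text, q = out; *p != '\0'; p++) {
        if ((*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z') ||
            (*p >= '0' && *p <= '9') || strchr("-._~", *p) != NULL) {
            *q++ = (char)*p;
        } else {
            *q++ = '%';
            *q++ = hex[*p >> 4];
            *q++ = hex[*p & 0x0F];
        }
    }
    *q = '\0';
    return out;
}
```

For instance, `url_encode("a&b=c d")` yields `a%26b%3Dc%20d`, so the user's `&` and `=` can no longer be mistaken for parameter separators.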

Wouldn't these attacks be prevented simply by searching for '<' and '>' and stripping them from the search query?

Well yes, if you also stripped ampersands and quotes. But then users wouldn't be able to use those characters in their content. Imagine us trying to have this conversation on SO without being able to use <, & or "! And if you wanted to strip out every character that might be special when used in some context (HTML, JavaScript, CSS...) you'd have to disallow almost all punctuation!
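The point about quotes can be checked with a small C sketch. `strip_angle_brackets` is the filter proposed in the question, and `render_search_box` is a hypothetical template that echoes the query back into a double-quoted `value` attribute, as search pages commonly do. A query containing no `<` or `>` at all can still close the attribute and add an event handler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The naive filter from the question: remove '<' and '>' in place. */
void strip_angle_brackets(char *s)
{
    char *w = s;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s != '<' && *s != '>')
            *w++ = *s;
    }
    *w = '\0';
}

/* Hypothetical template: echo the (unescaped!) query into the search box.
   Returns 0 on success, -1 if the output buffer was too small. */
int render_search_box(const char *query, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "<input name=\"q\" value=\"%s\">", query);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With the query `" autofocus onfocus="alert(1)`, stripping changes nothing, and the rendered markup is `<input name="q" value="" autofocus onfocus="alert(1)">`, so the attacker's script would typically run as soon as the field receives focus. Escaping `"` to `&quot;` at output time closes this hole without forbidding the character.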

< is a valid character, which the user should be permitted to type, and which should come out on the page as a literal less-than sign.

My entire web is programmed in C.

I'm so sorry.
