What is the correct way to detect whether string inputs contain HTML or not?

Views: 150 | Published: 2018/6/15 10:01:32 | Tags: php html input xss sanitization


Problem description

 
When receiving user input on forms I want to detect whether fields like "username" or "address" do not contain markup that has a special meaning in XML (RSS feeds) or (X)HTML (when displayed).

So which of these is the correct way to detect whether the input entered doesn't contain any special characters in HTML and XML context?
if (mb_strpos($data, '<') === FALSE AND mb_strpos($data, '>') === FALSE)
or
if (htmlspecialchars($data, ENT_NOQUOTES, 'UTF-8') === $data)
or
if (preg_match("/[^\p{L}\-.']/u", $text)) // problem: also catches symbols
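
For comparison, the three checks can be wrapped as small predicates and run against the same inputs. This is only a sketch: the function names are hypothetical, and `strpbrk()` stands in for the two `mb_strpos()` calls (an assumption that is safe for UTF-8 input, where `<` and `>` are single ASCII bytes).

```php
<?php
// Hypothetical helper names; each wraps one of the three checks above.

function has_angle_brackets(string $data): bool
{
    // True when the input contains '<' or '>' anywhere.
    return strpbrk($data, '<>') !== false;
}

function changed_by_escaping(string $data): bool
{
    // True when htmlspecialchars() would alter the input, i.e. it contains
    // '<', '>' or '&' (quotes are left alone because of ENT_NOQUOTES).
    return htmlspecialchars($data, ENT_NOQUOTES, 'UTF-8') !== $data;
}

function outside_whitelist(string $text): bool
{
    // True when any character is not a letter, hyphen, dot or apostrophe.
    return preg_match("/[^\p{L}\-.']/u", $text) === 1;
}

foreach (['plain', 'Tom & Jerry', "O'Brien", '<b>bold</b>'] as $sample) {
    printf(
        "%-12s brackets=%d escaped=%d whitelist=%d\n",
        $sample,
        has_angle_brackets($sample),
        changed_by_escaping($sample),
        outside_whitelist($sample)
    );
}
```

Note that the second check also flags a bare `&`, and the third rejects spaces and digits, so the three are not interchangeable.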
Have I missed anything else, like byte sequences or other tricky ways to get markup tags around things like "javascript:"? As far as I'm aware, all XSS and CSRF attacks require < or > around the values to get the browser to execute the code (well, at least from Internet Explorer 6 or later anyway) - is this correct?

I am not looking for something to reduce or filter input. I just want to locate dangerous character sequences when used in XML or HTML context. (strip_tags() is horribly unsafe. As the manual says, it doesn't check for malformed HTML.)

Update

I think I need to clarify that there are a lot of people mistaking this question for a question about basic security via "escaping" or "filtering" dangerous characters. This is not that question, and most of the simple answers given wouldn't solve that problem anyway.

Update 2: Example


User submits input
if (mb_strpos($data, '<') === FALSE AND mb_strpos($data, '>') === FALSE)
I save it


Now that the data is in my application I do two things with it - 1) display in a format like HTML - or 2) display inside a format element for editing.

The first one is safe in XML and HTML context

<h2><?php print $input; ?></h2>
<xml><item><?php print $input; ?></item></xml>

The second form is more dangerous, but it should still be safe:

<input value="<?php print htmlspecialchars($input, ENT_QUOTES, 'UTF-8');?>">
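
The attribute case is where the escaping matters most: a value can pass the `<`/`>` check and still contain a double quote. A minimal sketch, with an input string made up purely for illustration:

```php
<?php
// Made-up input: no '<' or '>' at all, but it contains double quotes.
$input = '" onfocus="alert(1)" autofocus="';

// The angle-bracket check from the question accepts this value.
$passesCheck = strpbrk($input, '<>') === false;

// Printed raw, the quotes would close value="" and start new attributes.
$raw = '<input value="' . $input . '">';

// With ENT_QUOTES every quote becomes an entity, so the value stays inside
// the attribute.
$escaped = '<input value="' . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . '">';

echo $raw, "\n", $escaped, "\n";
```

Under these assumptions the check alone is not what keeps the second form safe; the `htmlspecialchars()` call at output time is.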

Update 3: Working Code

You can download the gist I created and run the code as a text or HTML response to see what I'm talking about. This simple check passes the http://ha.ckers.org XSS Cheat Sheet, and I can't find anything that makes it through. (I'm ignoring Internet Explorer 6 and below).

I started another bounty to award someone that can show a problem with this approach or a weakness in its implementation.

Update 4: Ask a DOM

It's the DOM that we want to protect - so why not just ask it? Timur's answer led to this:
function not_markup($string)
{
    libxml_use_internal_errors(true);
    if ($xml = simplexml_load_string("<root>$string</root>"))
    {
        return $xml->children()->count() === 0;
    }
}

if (not_markup($_POST['title'])) ...
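
A slightly more explicit variant of the same idea, sketched under the assumption that a parse failure should count as "contains markup". The name `not_markup_strict` is hypothetical; it compares the parser result against `false` directly and always returns a boolean.

```php
<?php
// Hypothetical stricter variant of not_markup(): always returns a bool.
function not_markup_strict(string $string): bool
{
    // Collect libxml errors internally and remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string("<root>$string</root>");
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        // Not well-formed once wrapped in <root>: treat as markup.
        return false;
    }

    // Well-formed: plain text leaves <root> without child elements.
    return $xml->children()->count() === 0;
}

var_dump(not_markup_strict('This is a safe text'));
var_dump(not_markup_strict('<b>bold</b>'));
```

One design consequence worth keeping in mind: because the input is parsed as XML, anything that is not well-formed XML text is rejected, not only tags.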

Solution
I don't think you need to implement a huge algorithm to check whether a string has unsafe data - filters and regular expressions do the work. But, if you need a more complex check, maybe this will fit your needs:
<?php
$strings = array();
$strings[] = <<<EOD
    ';alert(String.fromCharCode(88,83,83))//\';alert(String.fromCharCode(88,83,83))//";alert(String.fromCharCode(88,83,83))//\";alert(String.fromCharCode(88,83,83))//--></SCRIPT>">'><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT>
EOD;
$strings[] = <<<EOD
    '';!--"<XSS>=&{()}
EOD;
$strings[] = <<<EOD
    <SCRIPT SRC=http://ha.ckers.org/xss.js></SCRIPT>
EOD;
$strings[] = <<<EOD
    This is a safe text
EOD;
$strings[] = <<<EOD
    <IMG SRC="javascript:alert('XSS');">
EOD;
$strings[] = <<<EOD
    <IMG SRC=javascript:alert('XSS')>
EOD;
$strings[] = <<<EOD
    <IMG SRC=&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#88;&#83;&#83;&#39;&#41;>
EOD;
$strings[] = <<<EOD
    perl -e 'print "<IMG SRC=java\0script:alert(\"XSS\")>";' > out
EOD;
$strings[] = <<<EOD
    <SCRIPT/XSS SRC="http://ha.ckers.org/xss.js"></SCRIPT>
EOD;
$strings[] = <<<EOD
    </TITLE><SCRIPT>alert("XSS");</SCRIPT>
EOD;



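// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
// Baseline: a known-good document whose root has exactly one child element.
$sourceXML = '<root><element>value</element></root>';
$sourceXMLDocument = simplexml_load_string($sourceXML);
$sourceCount = $sourceXMLDocument->children()->count();

foreach( $strings as $string ){
    $unsafe = false;
    // Embed the candidate string as element content and try to parse it.
    $XML = '<root><element>'.$string.'</element></root>';
    $XMLDocument = simplexml_load_string($XML);
    if( $XMLDocument===false ){
        // Not well-formed XML: treat the string as unsafe.
        $unsafe = true;
    }else{

        // A different number of children under <root> means the string
        // closed <element> and opened siblings of its own.
        $count = $XMLDocument->children()->count();
        if( $count!=$sourceCount ){
            $unsafe = true;
        }
    }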

    echo ($unsafe?'Unsafe':'Safe').': <pre>'.htmlspecialchars($string,ENT_QUOTES,'utf-8').'</pre><br />'."\n";
}
?>
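
The loop above compares only the number of children directly under `<root>`, so a well-formed tag nested inside `<element>` leaves that count unchanged. A sketch of a stricter variant follows; the function name `is_plain_text` is hypothetical, and this is one possible tightening, not the answer's own method.

```php
<?php
// Hypothetical stricter check: the wrapper must still be the only child of
// <root>, and the wrapper itself must not have gained child elements.
function is_plain_text(string $string): bool
{
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string('<root><element>' . $string . '</element></root>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc === false) {
        // Not well-formed XML: reject.
        return false;
    }

    return $doc->children()->count() === 1              // no break-out of <element>
        && $doc->element->children()->count() === 0;    // no tags nested inside it
}

foreach (['This is a safe text', '<b>bold</b>', '</element><element>x'] as $sample) {
    echo (is_plain_text($sample) ? 'Safe' : 'Unsafe'), ': ',
        htmlspecialchars($sample, ENT_QUOTES, 'UTF-8'), "\n";
}
```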


                        