PHP: How to keep line-breaks using nl2br() with HTML Purifier?


Question


    Issue: When using HTML Purifier to process user-inputted content, line-breaks are not being translated into <br /> tags.

    Consider the following user-inputted content:

    Lorem ipsum dolor sit amet.
    This is another line.
    
    <pre>
    .my-css-class {
        color: blue;
    }
    </pre>
    
    Lorem ipsum:
    
    <ul>
    <li>Lorem</li>
    <li>Ipsum</li>
    <li>Dolor</li>
    </ul>
    
    Dolor sit amet,
    MyName
    

    When processed with HTML Purifier, the content above is rendered as follows:

    Lorem ipsum dolor sit amet. This is another line.

    .my-css-class {
        color: blue;  
    } 
    

    Lorem ipsum:

    • Lorem
    • Ipsum
    • Dolor
    Dolor sit amet, MyName

    As you can see, "MyName", which the user intended to be on a separate line, is displayed on the same line as the previous text.

    How to fix?

    Using the PHP nl2br() function, of course. However, new issues arise whether we use it before or after purifying the content.

    Here is the result when nl2br() is applied before HTML Purifier:

    Lorem ipsum dolor sit amet.
    This is another line.

    .my-css-class {
    
        color: blue; 
    
    } 
    

    Lorem ipsum:

    • Lorem
    • Ipsum
    • Dolor

    Dolor sit amet,
    MyName

    What happens is that nl2br() inserts <br /> before every line-break, so even the line-breaks inside the <pre> block are processed, as well as the ones after each <li> tag.
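A minimal stand-alone sketch of this behaviour (plain PHP, no HTML Purifier needed; the shortened sample input is invented for illustration): nl2br() inserts a <br /> before every newline, including the purely cosmetic ones inside <pre> and between list items.

```php
<?php
// Plain nl2br() inserts "<br />" before every newline it finds; it has no
// idea that newlines inside <pre> or between <li> items are only layout.
$input = "<pre>\na {\n}\n</pre>\n<ul>\n<li>Lorem</li>\n</ul>";
$output = nl2br($input);

echo $output, "\n";
```

Six newlines go in and six <br /> tags come out, one of them directly after </li>.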

    What I tried

    I tried a custom nl2br() function that replaces line-breaks with <br /> tags and then removes all <br /> tags from <pre> blocks. It works well; however, the issue remains for the <li> items.

    Trying the same approach for <ul> blocks would also remove all <br /> tags from the <li> children, unless we used a more complex regex that removes only the <br /> tags inside <ul> elements but outside <li> elements. But then what about a <ul> nested within an <li> item? To handle all those situations we would need an even more complex regex!
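To illustrate why regex-based cleanup breaks down with nested lists, here is a small sketch (the HTML string is a made-up example): a lazy pattern stops at the first closing </ul>, which belongs to the inner list, not the outer one.

```php
<?php
// A lazy pattern such as <ul>(.*?)</ul> stops at the FIRST closing </ul>,
// so with a nested list it captures only part of the outer list.
$html = '<ul><li>A<ul><li>B</li></ul></li><li>C</li></ul>';

preg_match('/<ul>(.*?)<\/ul>/s', $html, $m);

echo $m[0], "\n"; // <ul><li>A<ul><li>B</li></ul>
```

A greedy (.*) would instead run to the last </ul> in the whole post, which breaks as soon as a post contains two separate lists; neither variant can balance nested tags.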

    • If this is the right approach, could you help me out with the regex?
    • If it's not the right approach, how could I solve this problem? I am also open to alternatives to HTML Purifier.

    Other resources that I've already looked at:

    Solution

    This issue can be solved partially (if not completely) with a custom nl2br() function:

    function nl2br_special($string){
    
        // Step 1: Add <br /> tags for each line-break
        $string = nl2br($string); 
    
        // Step 2: Remove the actual line-breaks
        $string = str_replace("\n", "", $string);
        $string = str_replace("\r", "", $string);
    
        // Step 3: Restore the line-breaks that are inside <pre></pre> tags
        if(preg_match_all('/\<pre\>(.*?)\<\/pre\>/', $string, $match)){
            // $match[0] holds the full matches, $match[1] the captured <pre> contents
            foreach($match[1] as $b){
                $string = str_replace('<pre>'.$b.'</pre>', "<pre>".str_replace("<br />", PHP_EOL, $b)."</pre>", $string);
            }
        }
    
        // Step 4: Remove extra <br /> tags
    
        // Before <pre> tags
        $string = str_replace("<br /><br /><br /><pre>", '<br /><br /><pre>', $string);
        // After </pre> tags
        $string = str_replace("</pre><br /><br />", '</pre><br />', $string);
    
        // Around <ul></ul> tags
        $string = str_replace("<br /><br /><ul>", '<br /><ul>', $string);
        $string = str_replace("</ul><br /><br />", '</ul><br />', $string);
        // Inside <ul> </ul> tags
        $string = str_replace("<ul><br />", '<ul>', $string);
        $string = str_replace("<br /></ul>", '</ul>', $string);
    
        // Around <ol></ol> tags
        $string = str_replace("<br /><br /><ol>", '<br /><ol>', $string);
        $string = str_replace("</ol><br /><br />", '</ol><br />', $string);
        // Inside <ol> </ol> tags
        $string = str_replace("<ol><br />", '<ol>', $string);
        $string = str_replace("<br /></ol>", '</ol>', $string);
    
        // Around <li></li> tags
        $string = str_replace("<br /><li>", '<li>', $string);
        $string = str_replace("</li><br />", '</li>', $string);
    
        return $string;
    }
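For reference, here is a condensed sketch of steps 1 to 3 plus the <li> clean-up from step 4, run on a small made-up input. The function name nl2br_sketch() is hypothetical; it uses preg_replace_callback() instead of the nested loops and restores "\n" rather than PHP_EOL inside <pre>, so treat it as an illustration, not a drop-in replacement for nl2br_special().

```php
<?php
// Hypothetical condensed variant of nl2br_special(): steps 1-3 plus the
// <li> clean-up from step 4, written with preg_replace_callback().
function nl2br_sketch(string $string): string
{
    // Step 1: insert <br /> before every line-break.
    $string = nl2br($string);

    // Step 2: drop the raw line-breaks, leaving only the <br /> tags.
    $string = str_replace(["\r", "\n"], '', $string);

    // Step 3: inside <pre>...</pre>, turn the <br /> tags back into newlines.
    $string = preg_replace_callback('/<pre>(.*?)<\/pre>/s', function (array $m): string {
        return '<pre>' . str_replace('<br />', "\n", $m[1]) . '</pre>';
    }, $string);

    // Part of step 4: no <br /> directly before or after list items.
    $string = str_replace(['<br /><li>', '</li><br />'], ['<li>', '</li>'], $string);

    return $string;
}

$input = "Line one\nLine two\n<pre>\na {\n}\n</pre>\n<ul>\n<li>A</li>\n<li>B</li>\n</ul>";
echo nl2br_sketch($input), "\n";
```

Ordinary lines keep their <br />, the <pre> block gets its real newlines back, and the list comes out as <ul><li>A</li><li>B</li></ul>.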
    

    This must be applied to the content before it is purified by HTML Purifier. Never re-process purified content unless you know what you're doing.

    Note that because single and double line-breaks are already preserved as <br /> tags, you should not enable HTML Purifier's AutoFormat.AutoParagraph feature:

    // Process line-breaks
    $string = nl2br_special($string);
    
    // Initiate HTML Purifier config
    $purifier_config = HTMLPurifier_Config::createDefault();
    $purifier_config->set('HTML.Allowed', 'p,ul,ol,li,strong,b,em,i,u,a[href],code,pre,blockquote,cite,img[src|alt],br,hr,h3,h4');
    //$purifier_config->set('AutoFormat.AutoParagraph', true); // Make sure to NOT use this
    
    // Initiate HTML Purifier
    $purifier = new HTMLPurifier($purifier_config);
    
    // Purify the content!
    $string = $purifier->purify($string);
    

    That's it!


    Furthermore, since allowing basic HTML tags was meant to improve the user experience by not introducing yet another markup syntax, you might also want to let users post code, especially HTML code, without HTML Purifier interpreting or removing it.

    HTML Purifier already lets users post code, but only inside cumbersome CDATA markers:

    <![CDATA[
    Place code here
    ]]>
    

    These are hard to remember and to type. To keep the user experience as simple as possible, I believe it is best to let users embed code with plain <code> (inline code) and <pre> (code block) tags. Here is how to do that:

    function custom_code_tag_callback($code) {
    
        return '<code>'.trim(htmlspecialchars($code[1])).'</code>';
    }
    function custom_pre_tag_callback($code) {
    
        return '<pre><code>'.trim(htmlspecialchars($code[1])).'</code></pre>';
    }
    
    // Don't require HTMLPurifier's CDATA enclosing, instead allow simple <code> or <pre> tags
    $string = preg_replace_callback("/\<code\>(.*?)\<\/code\>/is", 'custom_code_tag_callback', $string);
    $string = preg_replace_callback("/\<pre\>(.*?)\<\/pre\>/is", 'custom_pre_tag_callback', $string);
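A quick stand-alone check of what the two callbacks produce (the sample input is made up; nl2br_special() is skipped here, and the patterns are written with single quotes, which matches the same text): whatever sits between the tags is escaped, so it should reach HTML Purifier as plain text rather than markup.

```php
<?php
// Same callbacks as above: escape whatever the user put between the tags.
function custom_code_tag_callback($code) {
    return '<code>'.trim(htmlspecialchars($code[1])).'</code>';
}
function custom_pre_tag_callback($code) {
    return '<pre><code>'.trim(htmlspecialchars($code[1])).'</code></pre>';
}

$string = "Use <code><b></code> for bold:\n<pre>\n<b>bold</b>\n</pre>";
$string = preg_replace_callback('/<code>(.*?)<\/code>/is', 'custom_code_tag_callback', $string);
$string = preg_replace_callback('/<pre>(.*?)<\/pre>/is', 'custom_pre_tag_callback', $string);

echo $string, "\n";
```

The inline <b> becomes &lt;b&gt;, and the <pre> block is wrapped in <pre><code> with its contents escaped and trimmed.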
    

    Note that, like the nl2br processing, this must be done before the content is purified by HTML Purifier. Also keep in mind that if users put <code> or <pre> tags inside their own posted code, those tags will close the parent <code> or <pre> tag enclosing the code. This cannot be avoided with this approach, and it also applies to the original CDATA markers or to any markup, even the one used on Stack Overflow (for example, a ` character in a code sample closes the inline code span).
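A minimal sketch of that limitation (made-up input, using the same lazy pattern as the inline-code callback): when the posted code itself contains </code>, the match ends at that first </code>, and the rest of the user's text is left outside the escaped span.

```php
<?php
// A user writes a literal "<code>x</code>" inside their own <code> span.
$string = '<code>use <code>x</code> here</code>';

$string = preg_replace_callback('/<code>(.*?)<\/code>/is', function ($m) {
    return '<code>' . trim(htmlspecialchars($m[1])) . '</code>';
}, $string);

echo $string, "\n";
```

Only "use <code>x" is escaped; " here</code>" stays as raw markup with a stray closing tag, which HTML Purifier would then have to clean up.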

    Finally, for a great user experience there are other things we might want to automate, such as making links clickable. Luckily, HTML Purifier's AutoFormat.Linkify feature does exactly that.

    Here is the final code, combining everything above:

    // === Declare functions ===
    
    function nl2br_special($string){
    
        // Step 1: Add <br /> tags for each line-break
        $string = nl2br($string); 
    
        // Step 2: Remove the actual line-breaks
        $string = str_replace("\n", "", $string);
        $string = str_replace("\r", "", $string);
    
        // Step 3: Restore the line-breaks that are inside <pre></pre> tags
        if(preg_match_all('/\<pre\>(.*?)\<\/pre\>/', $string, $match)){
            // $match[0] holds the full matches, $match[1] the captured <pre> contents
            foreach($match[1] as $b){
                $string = str_replace('<pre>'.$b.'</pre>', "<pre>".str_replace("<br />", PHP_EOL, $b)."</pre>", $string);
            }
        }
    
        // Step 4: Remove extra <br /> tags
    
        // Before <pre> tags
        $string = str_replace("<br /><br /><br /><pre>", '<br /><br /><pre>', $string);
        // After </pre> tags
        $string = str_replace("</pre><br /><br />", '</pre><br />', $string);
    
        // Around <ul></ul> tags
        $string = str_replace("<br /><br /><ul>", '<br /><ul>', $string);
        $string = str_replace("</ul><br /><br />", '</ul><br />', $string);
        // Inside <ul> </ul> tags
        $string = str_replace("<ul><br />", '<ul>', $string);
        $string = str_replace("<br /></ul>", '</ul>', $string);
    
        // Around <ol></ol> tags
        $string = str_replace("<br /><br /><ol>", '<br /><ol>', $string);
        $string = str_replace("</ol><br /><br />", '</ol><br />', $string);
        // Inside <ol> </ol> tags
        $string = str_replace("<ol><br />", '<ol>', $string);
        $string = str_replace("<br /></ol>", '</ol>', $string);
    
        // Around <li></li> tags
        $string = str_replace("<br /><li>", '<li>', $string);
        $string = str_replace("</li><br />", '</li>', $string);
    
        return $string;
    }
    
    
    function custom_code_tag_callback($code) {
    
        return '<code>'.trim(htmlspecialchars($code[1])).'</code>';
    }
    
    function custom_pre_tag_callback($code) {
    
        return '<pre><code>'.trim(htmlspecialchars($code[1])).'</code></pre>';
    }
    
    
    
    // === Process user's input ===
    
    // Process line-breaks
    $string = nl2br_special($string);
    
    // Allow simple <code> or <pre> tags for posting code
    $string = preg_replace_callback("/\<code\>(.*?)\<\/code\>/is", 'custom_code_tag_callback', $string);
    $string = preg_replace_callback("/\<pre\>(.*?)\<\/pre\>/is", 'custom_pre_tag_callback', $string);
    
    
    // Initiate HTML Purifier config
    $purifier_config = HTMLPurifier_Config::createDefault();
    $purifier_config->set('HTML.Allowed', 'p,ul,ol,li,strong,b,em,i,u,a[href],code,pre,blockquote,cite,img[src|alt],br,hr,h3,h4');
    $purifier_config->set('AutoFormat.Linkify', true); // Make links clickable
    //$purifier_config->set('HTML.TargetBlank', true); // Uncomment if you want links to open new tabs
    //$purifier_config->set('AutoFormat.AutoParagraph', true); // Leave this commented as it conflicts with nl2br
    
    
    // Initiate HTML Purifier
    $purifier = new HTMLPurifier($purifier_config);
    
    // Purify the content!
    $string = $purifier->purify($string);
    

    Cheers!

