strip_tags disallow some tags


Problem description


Based on the strip_tags documentation, the second parameter takes the allowable tags. However, in my case I want to do the reverse. Say I'll accept the tags that strip_tags normally (by default) accepts, but strip only the <script> tag. Is there any possible way to do this?

I'm not asking somebody to code it for me; rather, input on possible ways to achieve this (if it is possible) would be greatly appreciated.

Solution

EDIT

To use the HTML Purifier HTML.ForbiddenElements config directive, it seems you would do something like:

require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.ForbiddenElements', array('script','style','applet'));
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);

http://htmlpurifier.org/docs

HTML.ForbiddenElements should be set to an array. What I don't know is what form the array members should take:

array('script','style','applet')

Or:

array('<script>','<style>','<applet>')

Or... Something else?

I think it's the first form, without delimiters; HTML.AllowedElements uses a form of configuration string somewhat similar to TinyMCE's valid_elements syntax:

tinyMCE.init({
    ...
    valid_elements : "a[href|target=_blank],strong/b,div[align],br",
    ...
});

So my guess is that it's just the element name, and no attributes should be provided (since you're banning the element... although there is an HTML.ForbiddenAttributes directive, too). But that's a guess.

I'll add this note from the HTML.ForbiddenAttributes docs, as well:

Warning: This directive complements %HTML.ForbiddenElements, accordingly, check out that directive for a discussion of why you should think twice before using this directive.

Blacklisting is just not as "robust" as whitelisting, but you may have your reasons. Just beware and be careful.
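
To make that warning concrete, here is a minimal sketch (my own illustration, not taken from the HTML Purifier docs) of what a blacklist mindset can miss when it is applied through strip_tags(). The sample markup and the allowed-tag string are assumptions chosen for the demo; the only point is that strip_tags() leaves the attributes of allowed tags untouched.

```php
<?php
// Blacklist-style thinking: only <script> is treated as dangerous,
// so other tags (here just <p> and <img>) are allowed through.
$allowed = '<p><img>';

// Assumed sample input: no <script> tag at all, but an event-handler attribute.
$html = '<p>Hi</p><img src="x" onerror="alert(1)">';

// strip_tags() filters by tag name only; attributes on allowed tags are kept.
$result = strip_tags($html, $allowed);

echo $result, "\n"; // <p>Hi</p><img src="x" onerror="alert(1)">
```

The input comes back unchanged, onerror handler included, which is the kind of gap a whitelist-based sanitizer is meant to close.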

Without testing, I'm not sure what to tell you. I'll keep looking for an answer, but I will likely go to bed first. It is very late. :)


Although I think you really should use HTML Purifier and its HTML.ForbiddenElements configuration directive, a reasonable alternative, if you really, really want to use strip_tags(), is to derive a whitelist from the blacklist. In other words, remove what you don't want and then use what's left.

For instance:

function blacklistElements($blacklisted = '', &$errors = array()) {
    if ((string)$blacklisted == '') {
        $errors[] = 'Empty string.';
        return array();
    }

    $html5 = array(
        "<menu>","<command>","<summary>","<details>","<meter>","<progress>",
        "<output>","<keygen>","<textarea>","<option>","<optgroup>","<datalist>",
        "<select>","<button>","<input>","<label>","<legend>","<fieldset>","<form>",
        "<th>","<td>","<tr>","<tfoot>","<thead>","<tbody>","<col>","<colgroup>",
        "<caption>","<table>","<math>","<svg>","<area>","<map>","<canvas>","<track>",
        "<source>","<audio>","<video>","<param>","<object>","<embed>","<iframe>",
        "<img>","<del>","<ins>","<wbr>","<br>","<span>","<bdo>","<bdi>","<rp>","<rt>",
        "<ruby>","<mark>","<u>","<b>","<i>","<sup>","<sub>","<kbd>","<samp>","<var>",
        "<code>","<time>","<data>","<abbr>","<dfn>","<q>","<cite>","<s>","<small>",
        "<strong>","<em>","<a>","<div>","<figcaption>","<figure>","<dd>","<dt>",
        "<dl>","<li>","<ul>","<ol>","<blockquote>","<pre>","<hr>","<p>","<address>",
        "<footer>","<header>","<hgroup>","<aside>","<article>","<nav>","<section>",
        "<body>","<noscript>","<script>","<style>","<meta>","<link>","<base>",
        "<title>","<head>","<html>"
    );

    $list = trim(strtolower($blacklisted));
    $list = preg_replace('/[^a-z ]/i', '', $list);
    $list = '<' . str_replace(' ', '> <', $list) . '>';
    $list = array_map('trim', explode(' ', $list));

    return array_diff($html5, $list);
}

Then run it:

$blacklisted = '<html> <bogus> <EM> em li ol';
$errors = array();
$whitelist = blacklistElements($blacklisted, $errors);

if (count($errors)) {
    echo "There were errors.\n";
    print_r($errors);
    echo "\n";
} else {
    // Do strip_tags() ...
}

http://codepad.org/LV8ckRjd

So if you pass in what you don't want to allow, it will give you back the HTML5 element list in an array form that you can then feed into strip_tags() after joining it into a string:

$stripped = strip_tags($html, implode('', $whitelist));
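
For a quick end-to-end check, the same idea can be sketched in a shortened form. The helper name allowedTagsExcept() and the six-tag list are hypothetical and only for illustration; they are not the function or the full HTML5 list from above. Note what happens to the contents of the removed element.

```php
<?php
// Hypothetical helper (not from the original answer): build the allowed-tags
// string for strip_tags() by subtracting a blacklist from a known tag list.
function allowedTagsExcept(array $knownTags, array $blacklist): string
{
    // Normalise entries such as "<SCRIPT>" or "style" to lowercase bare names.
    $blocked = array_map(
        function ($tag) { return strtolower(trim($tag, "<>/ \t\n")); },
        $blacklist
    );
    $allowed = array_diff($knownTags, $blocked);

    // strip_tags() expects a string like "<p><b><i>" with no whitespace.
    return $allowed ? '<' . implode('><', $allowed) . '>' : '';
}

// Shortened tag list, for illustration only.
$known   = array('p', 'b', 'i', 'em', 'script', 'style');
$allowed = allowedTagsExcept($known, array('<SCRIPT>', 'style'));

$html     = '<p>Hello <b>world</b></p><script>alert(1)</script>';
$stripped = strip_tags($html, $allowed);

echo $allowed, "\n";  // <p><b><i><em>
echo $stripped, "\n"; // <p>Hello <b>world</b></p>alert(1)
```

The <script> tags are gone, but their text content (alert(1)) stays in the output, because strip_tags() removes tags and not what sits between them. If the script body has to disappear as well, that needs a separate step or a real sanitizer such as HTML Purifier.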

Caveat Emptor

Now, I've kind of hacked this together and I know there are some issues I haven't thought through yet. For instance, from the strip_tags() man page for the $allowable_tags argument:

Note:

This parameter should not contain whitespace. strip_tags() sees a tag as a case-insensitive string between < and the first whitespace or >. It means that strip_tags("<br/>", "<br>") returns an empty string.

It's late and for some reason I can't quite figure out what this means for this approach, so I'll have to think about that tomorrow. I also compiled the HTML element list in the function's $html5 array from this MDN documentation page. Sharp-eyed readers might notice that all of the tags are in this form:

<tagName>

I'm not sure how this will affect the outcome, or whether I need to take into account variations such as the short tag <tagName/> and some of the, ahem, odder variations. And, of course, there are more tags out there.
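
One way to probe part of that question is a small experiment like the sketch below. The input string is an assumption made up for the test. It covers only case, attributes and closing tags; the short-tag form <tagName/> is deliberately left out, since the manual's notes on it have differed between PHP versions and it is better checked on the PHP version actually in use.

```php
<?php
// Only the plain, lowercase form "<b>" is listed as allowed.
$allowed = '<b>';

// Assumed sample input: an uppercase tag with an attribute, plus a tag
// that is not in the allowed list.
$input = '<B class="x">bold</B> <i>italic</i>';

// The tag name is matched case-insensitively and attributes are ignored for
// matching, so "<b>" covers <B class="x"> and the closing </B> as well.
$out = strip_tags($input, $allowed);

echo $out, "\n"; // <B class="x">bold</B> italic
```

So for ordinary opening and closing tags, listing each element once in the plain <tagName> form appears to be enough.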

So it's probably not production ready. But you get the idea.
