How to use HTML Purifier to allow an entire document to be passed, including html, head, title, body

Question

Given the document below, how do I get HTML Purifier to let the entire contents through? I want to allow a complete HTML document, but the html, head, style, title, body and meta tags get stripped out.

I even tried $config->set('Core.ConvertDocumentToFragment', false), but that didn't work.

Any help on where to start would be greatly appreciated.

I tried the example from "HTML Purifier - Change default allowed HTML tags configuration", but it doesn't work: I keep getting exceptions saying the tags are not allowed. Note: I did add all of the tags above to HTML.Allowed, but nothing seems to work.

<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1" />
    <title>Hello World - Email Template</title>
    <style type="text/css">
    @import url(https://fonts.googleapis.com/css?family=Open+Sans:400,600);
    body{-webkit-text-size-adjust: none;-ms-text-size-adjust: none;margin: 0;padding: 0;}
    </style>
    <body>
    <h1>Hi there</h1>
    </body>
    </html>

Solution

HTML Purifier by default only knows tags that are valid within a <body> context, because that's its intended use-case. Basically, it doesn't actually know what a <meta>, <html>, <head> or <title> tag is - and that's a big deal, because most of its security relies on understanding the semantic underpinnings of the HTML!

There are some older Stack Overflow questions on this topic:

...but they don't currently have very useful answers, so after some contemplation, I think your question still has merit and am going to answer here.

This has been discussed a few times on the HTML Purifier forums (e.g. in "Allow HTML, HEAD, STYLE and BODY tags"). In a nutshell, you can't do this without a significant amount of work, and unfortunately I'm not aware of any snippet of code that solves this problem with a simple copy and paste.

So you're going to have to dig into the guts of HTML Purifier.

You can teach HTML Purifier most tags and associated behaviour using the instructions on the Customize! documentation page. The part most interesting for you would be near the bottom, an example where <form> is taught to HTML Purifier. Quoting from there for some posterity:

    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
    $config->set('HTML.DefinitionRev', 1);
    $config->set('Cache.DefinitionImpl', null); // remove this later!
    $def = $config->getHTMLDefinition(true);
    $def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
      array('_blank','_self','_target','_top')
    ));
    $form = $def->addElement(
      'form',   // name
      'Block',  // content set
      'Flow',   // allowed children
      'Common', // attribute collection
      array(    // attributes
        'action*' => 'URI',
        'method' => 'Enum#get|post',
        'name' => 'ID'
      )
    );
    $form->excludes = array('form' => true);

Each of the parameters corresponds to one of the questions we asked. Notice that we added an asterisk to the end of the action attribute to indicate that it is required. If someone specifies a form without that attribute, the tag will be axed. Also, the extra line at the end is a special extra declaration that prevents forms from being nested within each other.

You would have to do similar things with all tags outside of the <body> tag that you want to support (all the way up to <html>).

Note: Even if you add all these tags to HTML Purifier, the setting Core.ConvertDocumentToFragment that you discovered needs to be set to false (as you have done).

Alternative

If this looks like too much work, and you have other ways to sanitise the header section and body attributes of your document, you can also cut your document into pieces, sanitise the pieces separately, then carefully stick them back together.
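
The cut-and-reassemble idea could look roughly like the sketch below. It is not taken from the HTML Purifier documentation. The function name rebuildDocument and the $sanitiseBody parameter are hypothetical, and it assumes PHP's bundled DOM extension is available. It also makes a deliberately conservative choice for the header: only the title text is kept (escaped), and everything else in the head, along with any attributes on the body tag, is dropped. If you need to keep style or meta content, you would have to add your own sanitising for those pieces.

```php
<?php
// Sketch only: split a full document, sanitise the body, and reassemble.
// rebuildDocument and $sanitiseBody are illustrative names, not library API.

function rebuildDocument(string $html, callable $sanitiseBody): string
{
    $dom = new DOMDocument();

    // Real-world markup is often sloppy; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog hints UTF-8 to libxml, which otherwise guesses the charset.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Header piece: keep only the title's text content.
    $titleNode = $dom->getElementsByTagName('title')->item(0);
    $title = $titleNode ? $titleNode->textContent : '';

    // Body piece: serialise the children of <body> back into an HTML fragment.
    // Attributes on the <body> tag itself are intentionally discarded.
    $bodyNode = $dom->getElementsByTagName('body')->item(0);
    $fragment = '';
    if ($bodyNode) {
        foreach ($bodyNode->childNodes as $child) {
            $fragment .= $dom->saveHTML($child);
        }
    }

    // Reassemble around a fixed, trusted skeleton.
    return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n<title>"
        . htmlspecialchars($title, ENT_QUOTES, 'UTF-8')
        . "</title>\n</head>\n<body>\n"
        . $sanitiseBody($fragment)
        . "\n</body>\n</html>\n";
}
```

With HTML Purifier loaded, the body sanitiser can be the purifier itself, for example by creating $purifier = new HTMLPurifier(HTMLPurifier_Config::createDefault()) and calling rebuildDocument($html, array($purifier, 'purify')). Because the skeleton is generated by your own code, HTML Purifier only ever sees the body fragment it was designed for.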

(Or, of course, just use that other sanitising method for the entire document.)
