PHP DOMDocument - get HTML source of BODY

Question

I'm using PHP's DOMDocument to parse and normalize user-submitted HTML using the loadHTML method to parse the content then getting a well-formed result via saveHTML:

$dom = new DOMDocument();
$dom->loadHTML('<div><p>Hello World');
$well_formed = $dom->saveHTML();
echo $well_formed;

This does a beautiful job of parsing the fragment and adding the appropriate closing tags. The problem is that I'm also getting a bunch of tags I don't want such as <!DOCTYPE>, <html>, <head> and <body>. I understand that every well-formed HTML document needs these tags, but the HTML fragment I'm normalizing is going to be inserted into an existing valid document.

Recommended answer

In your case, you do not want to work with an HTML document but with an HTML fragment (a portion of HTML code), which means DOMDocument is not quite what you need.

Instead, I would rather use something like HTMLPurifier (quoting):


HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.

If you try your portion of code:

<div><p>Hello World

Using the demo page of HTMLPurifier, you get this clean HTML as output:

<div><p>Hello World</p></div>

Much better, isn't it? ;-)

(Note that HTMLPurifier supports a wide range of options, and taking a look at its documentation might not hurt.)
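
If pulling in a separate library is not an option, the wrapper tags from the question can also be avoided with DOMDocument alone, by serializing only the children of the generated <body> element. What follows is a minimal sketch of that idea, not part of the original answer: the function name normalizeFragment is hypothetical, and it assumes PHP 5.3.6 or later, where saveHTML() accepts a node argument. Unlike HTMLPurifier, it does no XSS filtering, so it is only a tidy-up step for markup that is sanitized elsewhere.

```php
<?php
/**
 * Hypothetical helper (not from the original post): normalizes an HTML
 * fragment with DOMDocument and returns only the markup found inside <body>.
 * It performs no sanitization; it only closes and nests tags.
 */
function normalizeFragment(string $fragment): string
{
    // loadHTML() rejects empty input, so handle that case up front.
    if (trim($fragment) === '') {
        return '';
    }

    $dom = new DOMDocument();

    // Collect libxml parse errors internally instead of raising PHP warnings
    // for every piece of malformed user markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The parser wraps the fragment in <html><body>...</body></html>.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child of <body> individually, which leaves out the
    // doctype, <html>, <head> and <body> wrappers.
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }

    return $html;
}

echo normalizeFragment('<div><p>Hello World');
```

The returned string contains the closed <div> and <p> elements without the document-level wrappers, so it can be inserted into an existing page. One caveat under this approach: loadHTML() assumes ISO-8859-1 unless the input declares another charset, so UTF-8 fragments may need an explicit encoding hint.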
