DOM parser that allows HTML5-style </ in <script> tag
Question
Update: html5lib (bottom of question) seems to get close; I just need to improve my understanding of how it's used.
I am trying to find an HTML5-compatible DOM parser for PHP 5.3. In particular, I need to access the following HTML-like CDATA within a script tag:
<script type="text/x-jquery-tmpl" id="foo">
<table><tr><td>${name}</td></tr></table>
</script>
Most parsers will end parsing prematurely, because HTML 4.01 ends script tag parsing when it finds ETAGO (</) inside a <script> tag. However, HTML5 allows for </ before </script>. All of the parsers I have tried so far have either failed, or are so poorly documented that I haven't figured out whether they work or not.
My requirements:
- Real parser, not regex hacks.
- Ability to load full pages or HTML fragments.
- Ability to pull script contents back out, selecting by the tag's id attribute.
Input:
<script id="foo"><td>bar</td></script>
Example of failing output (no closing </td>):
<script id="foo"><td>bar</script>
Some parsers and their results:
Source:
<?php
header('Content-type: text/plain');
$d = new DOMDocument;
$d->loadHTML('<script id="foo"><td>bar</td></script>');
echo $d->saveHTML();
Output:
Warning: DOMDocument::loadHTML(): Unexpected end tag : td in Entity, line: 1 in /home/adam/public_html/2010/10/26/dom.php on line 5
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><script id="foo"><td>bar</script></head></html>
Source:
<?php
header('Content-type: text/plain');
require_once 'FluentDOM/src/FluentDOM.php';
$html = "<html><head></head><body><script id='foo'><td></td></script></body></html>";
echo FluentDOM($html, 'text/html');
Output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head></head><body><script id="foo"><td></script></body></html>
Source:
<?php
header('Content-type: text/plain');
require_once 'phpQuery.php';
phpQuery::newDocumentHTML(<<<EOF
<script type="text/x-jquery-tmpl" id="foo">
<td>test</td>
</script>
EOF
);
echo (string)pq('#foo');
Output:
<script type="text/x-jquery-tmpl" id="foo">
<td>test
</script>
Possibly promising. Can I get at the contents of the script#foo tag?
Source:
<?php
header('Content-type: text/plain');
include 'HTML5/Parser.php';
$html = "<!DOCTYPE html><html><head></head><body><script id='foo'><td></td></script></body></html>";
$d = HTML5_Parser::parse($html);
echo $d->saveHTML();
Output:
<html><head></head><body><script id="foo"><td></td></script></body></html>
Recommended answer
I had the same problem, and apparently you can hack your way through this by loading the document as XML and saving it as HTML :)
$d = new DOMDocument;
$d->loadXML('<script id="foo"><td>bar</td></script>');
echo $d->saveHTML();
But of course the markup must be error-free (that is, well-formed XML) for loadXML to work.
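The question also asks for the script contents, selected by id. That part is not covered above, so here is a minimal sketch of one way it might be done on top of the loadXML approach. The helper name scriptContentsById is made up for illustration, and the sketch assumes the markup is well-formed XML and that the id comes from trusted input.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the
// serialized children of the <script> element whose id attribute is $id,
// or null when no such element exists.
function scriptContentsById($markup, $id)
{
    $d = new DOMDocument;
    // loadXML requires well-formed markup, as noted above.
    $d->loadXML($markup);

    // getElementById() only matches attributes declared as IDs (for example
    // via a DTD), so a plain id attribute is looked up with XPath instead.
    $xpath = new DOMXPath($d);
    // $id is interpolated directly: pass only trusted ids without quotes.
    $nodes = $xpath->query('//script[@id="' . $id . '"]');
    if ($nodes->length === 0) {
        return null;
    }

    $contents = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        // saveXML($node) serializes a single node, including its own tags.
        $contents .= $d->saveXML($child);
    }
    return $contents;
}

echo scriptContentsById('<script id="foo"><td>bar</td></script>', 'foo');
```

Because the children are serialized with saveXML, empty elements come back in self-closing form (for example <td/>), so the result can differ slightly from the original template text.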