DOM parser that allows HTML5-style </ in <script> tag


Question

Update: html5lib (bottom of question) seems to get close; I just need to improve my understanding of how it is used.

I am trying to find an HTML5-compatible DOM parser for PHP 5.3. In particular, I need to access the following HTML-like CDATA within a script tag:

<script type="text/x-jquery-tmpl" id="foo">
    <table><tr><td>${name}</td></tr></table>
</script>

Most parsers end parsing prematurely because HTML 4.01 ends script content at the first ETAGO (</) it finds inside a <script> tag. HTML5, however, allows </ before </script>. Every parser I have tried so far has either failed or is so poorly documented that I cannot tell whether it works.

My requirements:

  1. Real parser, not regex hacks.
  2. Ability to load full pages or HTML fragments.
  3. Ability to pull script contents back out, selecting by the tag's id attribute.

Input:

<script id="foo"><td>bar</td></script>

Example of failing output (no closing </td>):

<script id="foo"><td>bar</script>

Some parsers and their results:

DOMDocument (fails)

Source:

<?php

header('Content-type: text/plain');
$d = new DOMDocument;
$d->loadHTML('<script id="foo"><td>bar</td></script>');
echo $d->saveHTML();

Output:

Warning: DOMDocument::loadHTML(): Unexpected end tag : td in Entity, line: 1 in /home/adam/public_html/2010/10/26/dom.php on line 5
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><script id="foo"><td>bar</script></head></html>


FluentDOM (fails)

Source:

<?php

header('Content-type: text/plain');
require_once 'FluentDOM/src/FluentDOM.php';
$html = "<html><head></head><body><script id='foo'><td></td></script></body></html>";
echo FluentDOM($html, 'text/html');

Output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head></head><body><script id="foo"><td></script></body></html>


phpQuery (fails)

Source:

<?php

header('Content-type: text/plain');

require_once 'phpQuery.php';

phpQuery::newDocumentHTML(<<<EOF
<script type="text/x-jquery-tmpl" id="foo">
<td>test</td>
</script>
EOF
);

echo (string)pq('#foo');

Output:

<script type="text/x-jquery-tmpl" id="foo">
<td>test
</script>


html5lib (passes)

Possibly promising. Can I get at the contents of the script#foo tag?

Source:

<?php

header('Content-type: text/plain');

include 'HTML5/Parser.php';

$html = "<!DOCTYPE html><html><head></head><body><script id='foo'><td></td></script></body></html>";
$d = HTML5_Parser::parse($html);

echo $d->saveHTML();

Output:

<html><head></head><body><script id="foo"><td></td></script></body></html>
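One way to pull the contents back out, sketched under assumptions: the saveHTML() call above suggests that HTML5_Parser::parse() returns a standard DOMDocument, in which case the ordinary PHP DOM API should apply. The helper name scriptTextById is hypothetical, and the document is built by hand here as a stand-in so the snippet runs without html5lib installed.

```php
<?php

// Hypothetical helper: return the raw text of the <script> element with
// the given id, or null if no such element exists.
function scriptTextById(DOMDocument $doc, $id)
{
    foreach ($doc->getElementsByTagName('script') as $script) {
        if ($script->getAttribute('id') === $id) {
            // An HTML5 parser stores script contents as plain text,
            // so textContent returns the markup unmodified.
            return $script->textContent;
        }
    }
    return null;
}

// Stand-in (assumption) for the DOMDocument an HTML5 parser would produce.
$doc = new DOMDocument;
$script = $doc->createElement('script');
$script->setAttribute('id', 'foo');
$script->appendChild($doc->createTextNode('<td></td>'));
$doc->appendChild($script);

echo scriptTextById($doc, 'foo'), "\n";
```

With a real html5lib result, the same helper would be called on the object returned by the parser instead of the hand-built $doc.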

Recommended answer

I had the same problem, and apparently you can hack your way through this by loading the document as XML and saving it as HTML :)

$d = new DOMDocument;
$d->loadXML('<script id="foo"><td>bar</td></script>');
echo $d->saveHTML();

Of course, the markup must be well-formed XML for loadXML to work.
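To cover requirement 3 with this workaround, something along the following lines might work. It is only a sketch: it assumes well-formed input, and the helper name innerXmlOfScript is made up for illustration. Because the document is loaded as XML, the script contents become child element nodes rather than text, so they are serialized back with saveXML().

```php
<?php

// Hypothetical helper: load well-formed markup as XML and return the
// serialized children of the <script> element with the given id.
function innerXmlOfScript($markup, $id)
{
    $doc = new DOMDocument;
    if (!$doc->loadXML($markup)) {
        return null; // markup was not well-formed XML
    }
    foreach ($doc->getElementsByTagName('script') as $script) {
        if ($script->getAttribute('id') !== $id) {
            continue;
        }
        $inner = '';
        foreach ($script->childNodes as $child) {
            // saveXML($node) serializes only that node.
            $inner .= $doc->saveXML($child);
        }
        return $inner;
    }
    return null;
}

// Single quotes keep PHP from interpolating ${name}.
echo innerXmlOfScript('<script id="foo"><td>${name}</td></script>', 'foo'), "\n";
```

XML allows only one root element, so a fragment with several top-level tags would need to be wrapped in a container element before calling loadXML.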
