DOM parser that allows HTML5-style </ in <script> tag

Views: 99  Published: 2017/6/24 21:03:46  Tags: php dom html5


Problem description

Update: html5lib (bottom of question) seems to get close; I just need to improve my understanding of how it's used.

I am attempting to find an HTML5-compatible DOM parser for PHP 5.3. In particular, I need to access the following HTML-like CDATA within a script tag:

<script type="text/x-jquery-tmpl" id="foo">
    <table><tr><td>${name}</td></tr></table>
</script>

Most parsers end the script element prematurely because HTML 4.01 terminates script content at the first ETAGO (</) it finds inside a <script> tag. HTML5, however, allows </ to appear before the closing </script>. Every parser I have tried so far has either failed or is so poorly documented that I can't tell whether it works.
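The difference between the two termination rules can be illustrated with a small sketch. It is only an illustration of the rules as described above, not how any of the libraries below is implemented, and the HTML5 rule is simplified (it ignores the escaped-comment states the real tokenizer handles). The variable names are made up for the example.

```php
<?php
// Text that follows the opening <script id="foo"> tag.
$body = '<td>bar</td></script>';

// HTML 4.01 rule: the first "</" followed by a letter ends the script content.
preg_match('~</[a-zA-Z]~', $body, $m, PREG_OFFSET_CAPTURE);
$html4 = substr($body, 0, $m[0][1]);

// HTML5 rule (simplified): only "</script" followed by whitespace, "/" or ">"
// ends the script content.
preg_match('~</script[\s/>]~i', $body, $m, PREG_OFFSET_CAPTURE);
$html5 = substr($body, 0, $m[0][1]);

echo $html4, "\n"; // <td>bar
echo $html5, "\n"; // <td>bar</td>
```

Under the HTML 4.01 rule the content stops before </td>, which is why the closing tag goes missing in the failing outputs.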

My requirements:

Real parser, not regex hacks.
Ability to load full pages or HTML fragments.
Ability to pull script contents back out, selecting by the tag's id attribute.

Input:

<script id="foo"><td>bar</td></script>

Example of failing output (no closing </td>):

<script id="foo"><td>bar</script>

Some parsers and their results:

DOMDocument (fails)

Source:

<?php

header('Content-type: text/plain');
$d = new DOMDocument;
$d->loadHTML('<script id="foo"><td>bar</td></script>');
echo $d->saveHTML();

Output:

Warning: DOMDocument::loadHTML(): Unexpected end tag : td in Entity, line: 1 in /home/adam/public_html/2010/10/26/dom.php on line 5
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><script id="foo"><td>bar</script></head></html>

FluentDOM (fails)

Source:

<?php

header('Content-type: text/plain');
require_once 'FluentDOM/src/FluentDOM.php';
$html = "<html><head></head><body><script id='foo'><td></td></script></body></html>";
echo FluentDOM($html, 'text/html');

Output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head></head><body><script id="foo"><td></script></body></html>

phpQuery (fails)

Source:

<?php

header('Content-type: text/plain');

require_once 'phpQuery.php';

phpQuery::newDocumentHTML(<<<EOF
<script type="text/x-jquery-tmpl" id="foo">
<td>test</td>
</script>
EOF
);

echo (string)pq('#foo');

Output:

<script type="text/x-jquery-tmpl" id="foo">
<td>test
</script>

html5lib (passes)

Possibly promising. Can I get at the contents of the script#foo tag?

Source:

<?php

header('Content-type: text/plain');

include 'HTML5/Parser.php';

$html = "<!DOCTYPE html><html><head></head><body><script id='foo'><td></td></script></body></html>";
$d = HTML5_Parser::parse($html);

echo $d->saveHTML();

Output:

<html><head></head><body><script id="foo"><td></td></script></body></html>
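One possible way to pull the script contents back out by id, assuming HTML5_Parser::parse returns an ordinary DOMDocument (as the saveHTML() call above suggests) and that the script body is stored as a text node. The helper name scriptContentsById is hypothetical and not part of html5lib. To keep the sketch runnable without the library, a hand-built document stands in for the parser result.

```php
<?php
// Hypothetical helper (not part of html5lib): returns the text inside
// <script id="..."> or null when no such element exists.
// Assumes $id contains no double quotes.
function scriptContentsById(DOMDocument $d, $id)
{
    $xpath = new DOMXPath($d);
    // local-name() matches whether or not the parser put elements in a namespace;
    // XPath also avoids getElementById, which needs ids to be registered.
    $nodes = $xpath->query('//*[local-name()="script" and @id="' . $id . '"]');
    return $nodes->length > 0 ? $nodes->item(0)->textContent : null;
}

// Stand-in for the parser result: a script element whose body is one text node.
$d = new DOMDocument;
$script = $d->createElement('script');
$script->setAttribute('id', 'foo');
$script->appendChild($d->createTextNode('<td></td>'));
$d->appendChild($script);

echo scriptContentsById($d, 'foo'); // <td></td>
```

With the real library, the document returned by HTML5_Parser::parse($html) would be passed in place of the hand-built $d, provided the assumptions above hold.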

Solution

I had the same problem, and apparently you can hack your way through it by loading the document as XML and saving it as HTML :)

$d = new DOMDocument;
$d->loadXML('<script id="foo"><td>bar</td></script>');
echo $d->saveHTML();

But of course the markup must be well-formed XML for loadXML to work.
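Building on the loadXML approach, a minimal sketch of selecting the script by its id and recovering the inner markup. Because loadXML parses the template into real child elements rather than raw text, the sketch serializes each child node; the variable names are illustrative.

```php
<?php
// Well-formed markup, as loadXML requires.
$markup = '<script id="foo"><td>bar</td></script>';

$d = new DOMDocument;
$d->loadXML($markup);

// Select the script element by its id attribute.
$xpath  = new DOMXPath($d);
$script = $xpath->query('//script[@id="foo"]')->item(0);

// Serialize every child node to rebuild the inner markup.
$inner = '';
foreach ($script->childNodes as $child) {
    $inner .= $d->saveXML($child);
}

echo $inner; // <td>bar</td>
```

Note that a template containing unescaped characters such as & or < outside of tags would make loadXML fail, so this only suits templates that are valid XML.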
