PHP "pretty print" HTML (not Tidy)


Problem description



I'm using the DOM extension in PHP to build some HTML documents, and I want the output to be formatted nicely (with new lines and indentation) so that it's readable. However, from the many tests I've done:

  1. "formatOutput = true" doesn't work at all with saveHTML(), only with saveXML().
  2. Even if I use saveXML(), it still only works on elements created via the DOM, not on elements loaded with loadHTML(), even with "preserveWhiteSpace = false".

If anyone knows differently I'd really like to know how they got it to work.

So, I have a DOM document, and I'm using saveHTML() to output the HTML. As it's coming from the DOM, I know it is valid, so there's no need to "Tidy" or validate it in any way.

I'm simply looking for a way to get nicely formatted output from the DOM extension.

NB. As you may have guessed, I don't want to use the Tidy extension because a) it does a lot more than I need it to (the markup is already valid) and b) it actually makes changes to the HTML content (such as the HTML 5 doctype and some elements).

Follow Up:

OK, with the help of the answer below, I've worked out why the DOM extension wasn't working. Although the given example works, it still wasn't working with my code. With the help of this comment, I found that if you have any text nodes where isWhitespaceInElementContent() is true, no formatting will be applied beyond that point. This happens regardless of whether preserveWhiteSpace is false. The solution is to remove all of these nodes (although I'm not sure whether this may have adverse effects on the actual content).
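
A minimal sketch of removing those nodes, assuming the document was loaded with loadHTML(). The helper name stripWhitespaceNodes() and the sample markup are hypothetical and not from the original post; only DOMXPath and DOMText::isWhitespaceInElementContent() are standard DOM extension calls.

```php
<?php
// Hypothetical helper (not from the original post): removes every text node
// that consists only of whitespace, so formatOutput can re-indent the tree.
function stripWhitespaceNodes(DOMDocument $dom): void
{
    $xpath = new DOMXPath($dom);
    // Copy the matches into an array before removing anything from the tree.
    $nodes = iterator_to_array($xpath->query('//text()'));
    foreach ($nodes as $node) {
        if ($node instanceof DOMText && $node->isWhitespaceInElementContent()) {
            $node->parentNode->removeChild($node);
        }
    }
}

// Sample markup (an assumption for illustration) with stray whitespace.
$dom = new DOMDocument();
$dom->loadHTML('<html><head><title>foo bar</title></head> <body> <h1>bar foo</h1> <p>apples</p> </body></html>');

stripWhitespaceNodes($dom);
$dom->formatOutput = true;

// Serialise as XML, since formatOutput has no effect on saveHTML() here.
echo $dom->saveXML($dom->documentElement);
```

One caveat on the "adverse effects" point: a whitespace-only node between inline elements (for example between two links) is rendered as a space by browsers, and whitespace inside pre elements is significant, so stripping every such node can change how the page looks.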

Solution

You're right, there seems to be no indentation for HTML (others are also confused). XML works, even with loaded code.

<?php
function tidyHTML($buffer) {
    // load our document into a DOM object
    $dom = new DOMDocument();
    // we want nice output
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($buffer);
    $dom->formatOutput = true;
    return($dom->saveHTML());
}

// start output buffering, using our nice
// callback function to format the output.
ob_start("tidyHTML");

?>
<html>
    <head>
    <title>foo bar</title><meta name="bar" value="foo"><body><h1>bar foo</h1><p>It's like comparing apples to oranges.</p></body></html>
<?php
// this will be called implicitly, but we'll
// call it manually to illustrate the point.
ob_end_flush();
?>

result:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>foo bar</title>
<meta name="bar" value="foo">
</head>
<body>
<h1>bar foo</h1>
<p>It's like comparing apples to oranges.</p>
</body>
</html>

The same with saveXML() ...

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <head>
    <title>foo bar</title>
    <meta name="bar" value="foo"/>
  </head>
  <body>
    <h1>bar foo</h1>
    <p>It's like comparing apples to oranges.</p>
  </body>
</html>

You probably forgot to set preserveWhiteSpace = false before calling loadHTML()?

Disclaimer: I stole most of the demo code from Tyson Clugg / the PHP manual comments. Lazy me.


UPDATE: I now remember that some years ago I tried the same thing and ran into the same problem. I fixed it with a dirty workaround (it wasn't performance critical): I just converted back and forth between SimpleXML and DOM until the problem vanished. I suppose the conversion got rid of those nodes. Maybe: load with DOM, import with simplexml_import_dom, output the string, parse it with DOM again, and then pretty-print it. As far as I remember this worked (but it was really slow).
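
The round trip described above might look roughly like this. It is a reconstruction from the description rather than the original workaround, and the sample markup and variable names are assumptions; it requires both the DOM and SimpleXML extensions.

```php
<?php
// Sample markup (an assumption for illustration).
$html = '<html><head><title>foo bar</title></head><body> <h1>bar foo</h1> <p>apples</p> </body></html>';

// 1. Parse the HTML with DOM.
$dom = new DOMDocument();
$dom->loadHTML($html);

// 2. Hand the tree to SimpleXML and serialise it to an XML string.
$xmlString = simplexml_import_dom($dom)->asXML();

// 3. Re-parse that string as XML with preserveWhiteSpace disabled,
//    then let formatOutput indent the result.
$pretty = new DOMDocument();
$pretty->preserveWhiteSpace = false;
$pretty->loadXML($xmlString);
$pretty->formatOutput = true;

$out = $pretty->saveXML();
echo $out;
```

The document is serialised and parsed twice here, which fits the remark that the workaround was slow. The result is XML rather than HTML serialisation, so empty elements come out self-closed (as in the saveXML() output shown earlier).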
