Why doesn't PHP DOM include slash on self closing tags?


Problem description

I have been using PHP's DOM to load an HTML template, modify it and output it. Recently I discovered that self-closing (empty) tags don't include a closing slash in the output, even though the template file did.

e.g.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
</body>
</html>

becomes:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
</body>
</html>

Is this a bug or a setting, or a doctype issue?

Solution

DOMDocument->saveHTML() takes your XML DOM infoset and writes it out as old-school HTML, not XML. You should not use saveHTML() together with an XHTML doctype, as its output won't be well-formed XML.

If you use saveXML() instead, you'll get proper XHTML. It's fine to serve this XML output to standards-compliant browsers if you give it a Content-Type: application/xhtml+xml header. But unfortunately IE6-8 won't be able to read that, as they can still only handle old-school HTML, under the text/html media type.
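As a rough illustration of the difference between the two serialisers, here is a minimal sketch. It assumes a tiny document with no doctype and no namespace, loaded with loadXML(), which is a simplification of the template in the question; the variable names are only illustrative.

```php
<?php
// Load a small, well-formed document as XML (simplified: no doctype, no xmlns).
$doc = new DOMDocument();
$doc->loadXML(
    '<html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
    . '</head><body><br/></body></html>'
);

// HTML serialiser: empty elements are written without a closing slash.
$html = $doc->saveHTML();

// XML serialiser: empty elements are written with the self-closing syntax.
$xml = $doc->saveXML($doc->documentElement);

// If the XML form is served, it would need the matching media type, e.g.:
// header('Content-Type: application/xhtml+xml; charset=utf-8');
```

Printing `$html` and `$xml` side by side should show the slash only in the saveXML() result.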

The usual compromise solution is to serve text/html and use ‘HTML-compatible XHTML’ as outlined in Appendix C of the XHTML 1.0 spec. But sadly there is no PHP DOMDocument->saveXHTML() method to generate the correct output for this.

There are some things you can do to persuade saveXML() to produce HTML-compatible output for some common cases. The main one is that you have to ensure that only elements defined by HTML4 as having an EMPTY content model (<img>, <br> etc) actually do have empty content, causing the self-closing syntax (<img/>) to be used. Other elements must not use the self-closing syntax, so if they're empty you should put a space in their text content to stop them being so:

<script src="x.js"/>           <-- no good, confuses HTML parser and breaks page
<script src="x.js"> </script>  <-- fine
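One way to apply this before calling saveXML() is sketched below. The helper name padEmptyElements and the constant HTML4_EMPTY_ELEMENTS are hypothetical, not part of PHP's DOM API; the list is the set of elements HTML 4.01 declares EMPTY, and the sketch assumes elements are not namespace-prefixed.

```php
<?php
// Elements that HTML 4.01 defines with an EMPTY content model.
// Only these may safely be written with the self-closing syntax.
const HTML4_EMPTY_ELEMENTS = [
    'area', 'base', 'basefont', 'br', 'col', 'frame',
    'hr', 'img', 'input', 'isindex', 'link', 'meta', 'param',
];

// Hypothetical helper: put a single space inside every childless element
// that is not in the EMPTY list, so saveXML() writes <x> </x> instead of <x/>.
function padEmptyElements(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    // Select elements that have no child nodes of any kind.
    foreach ($xpath->query('//*[not(node())]') as $el) {
        if (!in_array(strtolower($el->localName), HTML4_EMPTY_ELEMENTS, true)) {
            $el->appendChild($doc->createTextNode(' '));
        }
    }
}

$doc = new DOMDocument();
$doc->loadXML('<body><script src="x.js"/><br/></body>');
padEmptyElements($doc);
$out = $doc->saveXML($doc->documentElement);
```

Here `$out` keeps `<br/>` self-closed while the script element gets an explicit end tag.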

The other one to look out for is handling of the inline <script> and <style> elements, which are normal elements in XHTML but special CDATA-content elements in HTML. Some /*<![CDATA[*/.../*]]>*/ wrapping is required to make any < or & characters inside them behave mostly-consistently, though note you still have to avoid the ]]> and </ sequences.
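The wrapping can be built in the DOM itself, so that saveXML() emits it. The following is a sketch only: setInlineCode is a hypothetical helper, and it simply rejects code containing the two problem sequences rather than trying to rewrite them.

```php
<?php
// Hypothetical helper: replace the content of an inline <script> or <style>
// element with $code wrapped as /*<![CDATA[*/ ... /*]]>*/ when serialised as XML.
function setInlineCode(DOMElement $el, string $code): void
{
    // These sequences cannot be made safe by the wrapping, so refuse them.
    if (strpos($code, ']]>') !== false || strpos($code, '</') !== false) {
        throw new InvalidArgumentException('code must not contain "]]>" or "</"');
    }

    $doc = $el->ownerDocument;
    while ($el->firstChild) {
        $el->removeChild($el->firstChild);
    }

    // Text "/*" + CDATA section "*/ code /*" + text "*/" serialises to
    // /*<![CDATA[*/ code /*]]>*/ and the comment syntax is valid in JS and CSS.
    $el->appendChild($doc->createTextNode('/*'));
    $el->appendChild($doc->createCDATASection("*/\n" . $code . "\n/*"));
    $el->appendChild($doc->createTextNode('*/'));
}

$doc = new DOMDocument();
$doc->loadXML('<body><script type="text/javascript"/></body>');
$script = $doc->getElementsByTagName('script')->item(0);
setInlineCode($script, 'if (a < b && c) { run(); }');
$out = $doc->saveXML($doc->documentElement);
```

Because the code sits in a CDATA section, the `<` and `&` characters are written out unescaped, which is what an HTML parser expects inside a script element.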

If you want to really do it properly you would have to write your own HTML-compatible-XHTML serialiser. Long-term that would probably be a better option. But for small simple cases, hacking your input so that it doesn't contain anything that would come out the other end of an XML serialiser as incompatible with HTML is probably the quick solution.

That or just suck it up and live with old-school non-XML HTML, obviously.
