DOMDocument::loadXML vs. HTML Entities

Question

I'm currently having trouble reading in XHTML, because the XML parser doesn't recognise HTML character entities. For example:

<?php
$text = <<<EOF
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Entities are Causing Me Problems</title>
  </head>
  <body>
    <p>Copyright &copy; 2010 Some Bloke</p>
  </body>
</html>
EOF;

$imp = new DOMImplementation ();
$html5 = $imp->createDocumentType ('html', '', '');
$doc = $imp->createDocument ('http://www.w3.org/1999/xhtml', 'html', $html5);

$doc->loadXML ($text);

header ('Content-Type: application/xhtml+xml; charset=utf-8');
echo $doc->saveXML ();

Result:

Warning: DOMDocument::loadXML() [domdocument.loadxml]: Entity 'copy' not defined in Entity, line: 8 in testing.php on line 19

How can I fix this while allowing myself to serve pages as XHTML5?

Answer

XHTML5 does not have a DTD, so you cannot use the old-school HTML named entities in it: there is no document type definition to tell the parser what the named entities for this language are. The only exceptions are the entities predefined by XML itself: &lt;, &amp;, &quot;, &gt; and &apos; (though you generally don't want to use &apos;, because HTML 4 never defined it and older HTML parsers may not recognise it).
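To see this with PHP's DOM extension, the sketch below loads a few small fragments that have no DTD and reports whether libxml recorded any errors. parsesCleanly() is a hypothetical helper written only for this illustration; it calls libxml_use_internal_errors(true) so errors are collected quietly instead of being printed as warnings. The predefined XML entities and a numeric character reference should load without complaint, while &copy; produces the same "Entity 'copy' not defined" error as in the question.

```php
<?php
// Hypothetical helper for this illustration: returns true if $xml loads
// into a DOMDocument without libxml recording any errors.
function parsesCleanly(string $xml): bool
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $loaded = $doc->loadXML($xml);
    $errors = libxml_get_errors();

    // Restore the caller's error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $loaded !== false && count($errors) === 0;
}

var_dump(parsesCleanly('<p>&lt; &amp; &gt; &quot; &apos;</p>')); // predefined XML entities: bool(true)
var_dump(parsesCleanly('<p>&#169;</p>'));                        // numeric character reference: bool(true)
var_dump(parsesCleanly('<p>&copy;</p>'));                        // HTML named entity, never declared: bool(false)
```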

Instead, use a numeric character reference (&#169;) or, better, just a plain unencoded © character (in UTF-8; remember to include a <meta charset="utf-8"> element so that non-XML parsers know the character set).
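Applied to the question's script, that might look like the following sketch (an illustration under stated assumptions, not a tested drop-in). The copyright sign is typed directly as a UTF-8 character, which assumes the PHP file itself is saved as UTF-8; a <meta charset="utf-8" /> element is added for non-XML parsers; and the text is loaded into a plain new DOMDocument(), since loadXML() replaces whatever the document already held, so the DOMImplementation setup isn't needed. The second paragraph only demonstrates that a numeric reference and a predefined XML entity also load cleanly.

```php
<?php
// The question's document without any HTML named entities: © is a plain
// UTF-8 character (assumes this file is saved as UTF-8), and the <meta>
// element declares the encoding for non-XML (text/html) parsers.
$text = <<<EOF
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta charset="utf-8" />
    <title>Entities are Causing Me Problems</title>
  </head>
  <body>
    <p>Copyright © 2010 Some Bloke</p>
    <p>Also fine: &#169; (numeric reference) and &amp; (predefined XML entity)</p>
  </body>
</html>
EOF;

// loadXML() replaces the document's contents, so a fresh DOMDocument is enough.
$doc = new DOMDocument();
$doc->loadXML($text);

// Note "charset=utf-8" with an equals sign, not a colon.
header('Content-Type: application/xhtml+xml; charset=utf-8');
echo $doc->saveXML();
```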
