What does LIBXML_NOENT do (and why isn't it called LIBXML_ENT)?

Problem description

In PHP, one can pass optional arguments to various XML parsers, one of them being LIBXML_NOENT. The documentation has this to say about it:

LIBXML_NOENT (integer)
Substitute entities

"Substitute entities" isn't very informative (which entities? when are they substituted?). But I think it's fair to assume that NOENT is short for NO_ENTITIES or NO_EXTERNAL_ENTITIES, so it seems reasonable to expect that this flag disables the parsing of (external) entities.

But that is not the case:

$xml = '<!DOCTYPE root [<!ENTITY c PUBLIC "bar" "/etc/passwd">]>
<test>&c;</test>';
$dom = new DOMDocument();
$dom->loadXML($xml, LIBXML_NOENT);
echo $dom->textContent;

The result is that the content of /etc/passwd is echoed. Without the LIBXML_NOENT argument this is not the case.

For non-external entities, the flag doesn't seem to have any effect. Example:

$xml = '<!DOCTYPE root [<!ENTITY c "TEST">]>
<test>&c;</test>';
$dom = new DOMDocument();
$dom->loadXML($xml);
echo $dom->textContent;

The result of this code is "TEST", with and without LIBXML_NOENT.

The flag doesn't seem to have any effect on pre-defined entities such as &lt;.

So my questions are:

  • What exactly does the LIBXML_NOENT flag do?
  • Why is it called LIBXML_NOENT? What is it short for, and wouldn't LIBXML_ENT or LIBXML_PARSE_EXTERNAL_ENTITIES be a better fit?
  • Is there a flag that actually prevents the parsing of all entities?

Solution

Q: What exactly does the LIBXML_NOENT flag do?

The flag enables the substitution of XML entity references with their replacement text, whether the entities are internal or external.

Q: Why is it called LIBXML_NOENT? What is it short for, and wouldn't LIBXML_ENT or LIBXML_PARSE_EXTERNAL_ENTITIES be a better fit?

The name is indeed misleading. I think that NOENT simply means that the node tree of the parsed document won't contain any entity nodes, so the parser will substitute entities. Without NOENT, the parser creates DOMEntityReference nodes for entity references.
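To make the node-tree difference concrete, here is a minimal sketch that inspects the first child of the root element with and without the flag. The helper name firstChildClass is made up for this illustration, and the comments describe what I would expect from the explanation above on a current PHP build with ext-dom, not output copied from an actual run.

```php
<?php
// Internal entity only, so nothing is read from disk.
$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]><test>&c;</test>';

// Hypothetical helper: parse $xml with the given libxml options and
// return the class name of the root element's first child node.
function firstChildClass(string $xml, int $options): string
{
    $dom = new DOMDocument();
    $dom->loadXML($xml, $options);
    return get_class($dom->documentElement->firstChild);
}

// Without the flag the reference should stay in the tree as its own node.
echo firstChildClass($xml, 0), "\n";

// With the flag the parser substitutes the entity, leaving a plain text node.
echo firstChildClass($xml, LIBXML_NOENT), "\n";
```

Read this way, NOENT describes the resulting tree ("no entity nodes") rather than what the parser refuses to do.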

Q: Is there a flag that actually prevents the parsing of all entities?

LIBXML_NOENT enables the substitution of all entity references. If you don't want entities to be expanded, simply omit the flag. For example,

$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]>
<test>&c;</test>';
$dom = new DOMDocument();
$dom->loadXML($xml);
echo $dom->saveXML();

prints

<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY c "TEST">
]>
<test>&c;</test>

It seems that textContent replaces entities on its own, which might be a peculiarity of the PHP bindings. Without LIBXML_NOENT, this leads to different behavior for internal and external entities, because external entities won't be loaded.
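The difference between serialization and textContent can be checked side by side. The following is a sketch under the same assumptions as the examples above (internal entity only; parseWith is a hypothetical helper introduced here, not part of the DOM API).

```php
<?php
$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]><test>&c;</test>';

// Hypothetical helper: parse $xml with the given libxml options.
function parseWith(string $xml, int $options): DOMDocument
{
    $dom = new DOMDocument();
    $dom->loadXML($xml, $options);
    return $dom;
}

$plain    = parseWith($xml, 0);             // entity reference kept in the tree
$expanded = parseWith($xml, LIBXML_NOENT);  // entity substituted while parsing

// Serializing only the root element shows whether the reference survived.
echo $plain->saveXML($plain->documentElement), "\n";
echo $expanded->saveXML($expanded->documentElement), "\n";

// textContent resolves the internal entity in both cases.
echo $plain->documentElement->textContent, "\n";
echo $expanded->documentElement->textContent, "\n";
```

Comparing saveXML() output is therefore a more reliable way to see what the flag changed than reading textContent.') { throw new Exception('reference should be preserved without LIBXML_NOENT'); }
if ($expanded->saveXML($expanded->documentElement) !== '<test>TEST</test>') { throw new Exception('entity should be substituted with LIBXML_NOENT'); }
if ($plain->documentElement->textContent !== 'TEST') { throw new Exception('textContent should resolve internal entity'); }
if ($expanded->documentElement->textContent !== 'TEST') { throw new Exception('textContent should be TEST with LIBXML_NOENT'); }
</test>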
