Check for malicious XML before allowing DTD loading?
Question
Since libxml 2.9, loading external entities has been disabled by default when parsing XML, to prevent XXE attacks.
As a result, to load a DTD file when parsing XML with PHP's DOMDocument, the LIBXML_DTDLOAD flag must be specified.
What would be a good way to verify that only the expected DTD will be loaded before enabling LIBXML_DTDLOAD?
One approach I can think of (shown in the example code below) is to keep entity loading disabled, parse the XML file once, check that the DOCTYPE declaration is as expected, and then parse the XML again with entity loading enabled. Would that be sufficient?
<?php
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<article/>
XML;
// entity loading disabled
libxml_disable_entity_loader();
$doc = new DOMDocument;
$doc->loadXML($xml, LIBXML_DTDLOAD); // PHP Warning: DOMDocument::load(): I/O warning : failed to load external entity
print $doc->doctype->systemId; // http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd
// entity loading enabled
libxml_disable_entity_loader(false);
$doc = new DOMDocument;
$doc->loadXML($xml, LIBXML_DTDLOAD);
print $doc->doctype->systemId; // http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd
Recommended answer
"What would be a good way to verify that only the expected DTD will be loaded, before enabling LIBXML_DTDLOAD?"
If you want to filter (whitelist) the expected DTDs, you can disable all others by returning NULL from your own callable, registered as the external entity loader via libxml_set_external_entity_loader.
That is, you use the LIBXML_DTDLOAD flag and, inside your callable, resolve to a resource handle if the DTD is whitelisted; otherwise you return NULL. The demonstration below dumps the arguments the loader receives and rejects every DTD, then restores the default loader for comparison:
<?php
/**
* @link http://stackoverflow.com/q/24526493/367456
*/
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<article/>
XML;
/* own entity loader */
libxml_set_external_entity_loader(function() {
var_dump(func_get_args()); // just for demonstrating purposes
return NULL;
});
$doc = new DOMDocument;
$doc->loadXML($xml, LIBXML_DTDLOAD);
echo "----\n";
/* restore default entity loader */
libxml_set_external_entity_loader(NULL);
$doc = new DOMDocument;
$doc->loadXML($xml, LIBXML_DTDLOAD);
Example output:
array(3) {
[0]=>
string(66) "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN"
[1]=>
string(66) "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd"
[2]=>
array(4) {
["directory"]=>
string(1) "/"
["intSubName"]=>
string(7) "article"
["extSubURI"]=>
string(66) "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd"
["extSubSystem"]=>
string(66) "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN"
}
}
Warning: DOMDocument::loadXML(): Failed to load external entity "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" in Entity, line: 2 in /in/jemmH on line 18
----
Warning: DOMDocument::loadXML(): php_network_getaddresses: getaddrinfo failed: Name or service not known in /in/jemmH on line 25
Warning: DOMDocument::loadXML(http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd): failed to open stream: php_network_getaddresses: getaddrinfo failed: Name or service not known in /in/jemmH on line 25
Notice: DOMDocument::loadXML(): failed to load external entity "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd" in Entity, line: 2 in /in/jemmH on line 25
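The demonstration above rejects every DTD. A loader that actually whitelists could look roughly like the sketch below. It assumes that the first callback argument is the public identifier (as the dump above shows) and that a local copy of the DTD exists. The function name makeWhitelistLoader, the $allowedDtds map and the local path are hypothetical names introduced for illustration only.

```php
<?php
// Hypothetical whitelist: public identifier => local copy of the DTD.
// The local path is an assumption; point it at a file you control.
$allowedDtds = [
    '-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN'
        => __DIR__ . '/dtd/JATS-journalpublishing1.dtd',
];

/**
 * Build an entity loader that resolves only white-listed public identifiers.
 * Anything else (unknown public ID, missing public ID, unreadable file)
 * yields NULL, which tells libxml not to load the entity.
 */
function makeWhitelistLoader(array $allowed): callable
{
    return function ($publicId, $systemId, $context) use ($allowed) {
        if (is_string($publicId)
            && isset($allowed[$publicId])
            && is_readable($allowed[$publicId])
        ) {
            // Serve the trusted local copy; the system ID from the document is ignored.
            return fopen($allowed[$publicId], 'r');
        }
        return null;
    };
}

// Install the loader; subsequent parses with LIBXML_DTDLOAD go through it.
libxml_set_external_entity_loader(makeWhitelistLoader($allowedDtds));
```

With this loader installed, $doc->loadXML($xml, LIBXML_DTDLOAD) would read the DTD only from the mapped local file and never from the URL given in the document. Note that the loader is consulted for every external entity, so a DTD that pulls in further external modules would presumably need those listed in the map as well.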