Why does XPath remove HTML special characters?
Question
Why does this
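<?php
$html = '<a href="/browse/product.do?cid=1&amp;vid=1&amp;pid=1" class="productItemName">what is going on here</a>';
$dom = new DOMDocument();
$dom->loadHTML($html);            // parse as HTML: entities such as &amp; are decoded
$xpath = new DOMXPath($dom);
$selectors = [];
$selectors['link'] = '//a/@href';
$links_nodeList = $xpath->query($selectors['link']);
$links = [];                      // initialise so print_r() works even with no matches
foreach ($links_nodeList as $link) {
    // nodeValue of an attribute node is its decoded text value
    $links[] = $link->nodeValue;
}
echo("<p>links</p>");
echo("<pre>");
print_r($links);
echo("</pre>");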
output
links
Array
(
[0] => /browse/product.do?cid=1&vid=1&pid=1
)
instead of
links
Array
(
[0] => /browse/product.do?cid=1&amp;vid=1&amp;pid=1
)
?
Recommended answer
The answer is simple:
"&amp;" is a special way to represent the character "&" in an XML or HTML document.
The two represent the same character.
When the escaped form of the ampersand is output as text (not as XML), showing it as "&" is correct. Here, nodeValue returns the attribute's text value and print_r() prints it as plain text, so "&" is what you see.
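A minimal sketch of this, assuming PHP 8 with the DOM extension enabled. The attribute text comes back with a literal "&"; re-escaping it with htmlspecialchars() (an added step, not something the question's code does) recovers the "&amp;" form you would need when writing the value back into HTML.

```php
<?php
// Same kind of href as in the question, with the ampersands escaped in the markup.
$html = '<a href="/browse/product.do?cid=1&amp;vid=1&amp;pid=1">x</a>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$href = $xpath->query('//a/@href')->item(0);   // a DOMAttr node

// As text, the attribute value contains the character "&" itself.
$text = $href->nodeValue;

// Escaping the text again for HTML output turns "&" back into "&amp;".
$escaped = htmlspecialchars($text);

echo $text, "\n";     // /browse/product.do?cid=1&vid=1&pid=1
echo $escaped, "\n";  // /browse/product.do?cid=1&amp;vid=1&amp;pid=1
```

In other words, the parsed value is unescaped by design; escaping belongs at the point where the text is written back out as markup.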
As further elaborated by @LarsH in his comment:
When you call loadHTML($html), you are parsing the string as HTML, which means that character entities (like &amp;) are interpreted into the characters they represent (like &). If you want a string that will be interpreted as &amp;, you need to escape the ampersand, e.g. &amp;amp;.
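The double-escaping point can be checked with a short sketch, again assuming PHP 8 with the DOM extension. firstHref() is a hypothetical helper introduced only for this illustration; it parses a snippet the same way the question does and returns the first href as text.

```php
<?php
// Hypothetical helper: parse an HTML snippet and return the first href as text.
function firstHref(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    return $xpath->query('//a/@href')->item(0)->nodeValue;
}

// "&amp;" in the source is decoded to the single character "&".
$once = firstHref('<a href="?cid=1&amp;vid=1">x</a>');

// Escaping the ampersand once more ("&amp;amp;") leaves the text "&amp;".
$twice = firstHref('<a href="?cid=1&amp;amp;vid=1">x</a>');

echo $once, "\n";   // ?cid=1&vid=1
echo $twice, "\n";  // ?cid=1&amp;vid=1
```

The parser decodes exactly one level of entities: "&amp;amp;" becomes "&" followed by the literal characters "amp;".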