Why does XPath remove HTML special characters?


Problem description

Why does this code

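$html = '<a href="/browse/product.do?cid=1&amp;vid=1&amp;pid=1" class="productItemName">what is going on here</a>';

// Parse the markup as HTML; character entities are decoded at this point.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select the href attribute of every <a> element.
$selectors = ['link' => '//a/@href'];
$links_nodeList = $xpath->query($selectors['link']);

// Collect the attribute values (already decoded text, not markup).
$links = [];
foreach ($links_nodeList as $link) {
    $links[] = $link->nodeValue;
}

echo("<p>links</p>");
echo("<pre>");
print_r($links);
echo("</pre>");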

output

links

Array
(
    [0] => /browse/product.do?cid=1&vid=1&pid=1
)

instead of

links

Array
(
    [0] => /browse/product.do?cid=1&amp;vid=1&amp;pid=1
)

Recommended answer

The answer is simple:

&amp; is a special way to represent the character "&" in an XML document.

The two forms represent the same character.

When the escaped form of the ampersand is output as text (not as XML), showing it as "&" is correct.

As further elaborated by @LarsH in his comment:


When you say loadhtml($html), you are parsing the string as HTML, which means that character entities (like &amp;) are interpreted as the characters they represent (like &). If you want a string that will be interpreted as &amp;, you need to escape the ampersand, e.g. &amp;amp;
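
The behaviour described in the comment can be checked with a short sketch. It is only an illustration, not part of the original answer: the shortened URL and the variable names ($href, $escaped, $doubleHtml, $href2) are made up for the example, and re-escaping with htmlspecialchars() is one possible way to get the entity form back when the value has to be written into markup again.

```php
<?php
// Attribute value as it appears in the HTML source (escaped once).
$html = '<a href="/browse/product.do?cid=1&amp;vid=1">link</a>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// nodeValue is the parsed text of the attribute, so "&amp;" has become "&".
$href = $xpath->query('//a/@href')->item(0)->nodeValue;

// Re-escape the value if it is going to be placed back into HTML or XML.
$escaped = htmlspecialchars($href, ENT_QUOTES, 'UTF-8');

// Escaping the ampersand twice in the source yields a literal "&amp;"
// in the parsed value, as the comment suggests.
$doubleHtml = '<a href="/browse/product.do?cid=1&amp;amp;vid=1">link</a>';
$dom2 = new DOMDocument();
$dom2->loadHTML($doubleHtml);
$href2 = (new DOMXPath($dom2))->query('//a/@href')->item(0)->nodeValue;

var_dump($href, $escaped, $href2);
```

In most cases the decoded value in $href is the one you want, since it is the actual URL; the escaped form only matters when the string is embedded in markup again.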
