PHP DOMDocument: how to get an element?

Question

I am trying to read a website's content and extract elements such as images and links. The problem is that I want the elements themselves, not just their content: I want to get the entire element.

How can I do this?

<?php

    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, "http://www.link.com");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    $output = curl_exec($ch);

    $dom = new DOMDocument;
    @$dom->loadHTML($output);

    $items = $dom->getElementsByTagName('a');

    for($i = 0; $i < $items->length; $i++) {
        echo $items->item($i)->nodeValue . "<br />";
    }

    curl_close($ch);
?>

Solution

You appear to be asking for the serialized HTML of a DOMElement. That is, you want a string containing <a href="http://example.org">link text</a>? (Please make your question clearer.)

$url = 'http://example.com';
$dom = new DOMDocument();
$dom->loadHTMLFile($url);

$anchors = $dom->getElementsByTagName('a');

foreach ($anchors as $a) {
    // Best solution, but only works with PHP >= 5.3.6
    $htmlstring = $dom->saveHTML($a);

    // Otherwise (PHP < 5.3.6) serialize to XML and then fix the self-closing elements.
    // Use only one of the two approaches; this assignment overwrites the one above.
    $htmlstring = saveHTMLFragment($a);
    echo $htmlstring, "\n";
}


function saveHTMLFragment(DOMElement $e) {
    $selfclosingelements = array('></area>', '></base>', '></basefont>',
        '></br>', '></col>', '></frame>', '></hr>', '></img>', '></input>',
        '></isindex>', '></link>', '></meta>', '></param>', '></source>',
    );
    // This is not 100% reliable because it may output namespace declarations.
    // But otherwise it is extra-paranoid to work down to at least PHP 5.1
    $html = $e->ownerDocument->saveXML($e, LIBXML_NOEMPTYTAG);
    // in case any empty elements are expanded, collapse them again:
    $html = str_ireplace($selfclosingelements, '>', $html);
    return $html;
}

However, note that what you are doing is dangerous because it could potentially mix encodings. It is better to have your output as another DOMDocument and use importNode() to copy the nodes you want. Alternatively, use an XSL stylesheet.
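
The importNode() suggestion could look roughly like the sketch below. This is only an illustration under stated assumptions: the inline $sourceHtml string stands in for the fetched page, and the <div> container and the variable names are hypothetical choices, not part of the original answer. It copies every <a> and <img> element into a second DOMDocument created with an explicit encoding, then serializes that document's container.

```php
<?php
// Hypothetical input; in practice this would be the HTML fetched from the site.
$sourceHtml = '<html><body><p>Intro <a href="http://example.org">link text</a></p>'
            . '<img src="logo.png" alt="Logo"></body></html>';

$source = new DOMDocument();
// Suppress parser warnings, which real-world markup often triggers.
@$source->loadHTML($sourceHtml);

// Separate output document with an explicit encoding.
$output = new DOMDocument('1.0', 'UTF-8');
// Assumption: a <div> is used as a container for the copied elements.
$container = $output->createElement('div');
$output->appendChild($container);

foreach (array('a', 'img') as $tagName) {
    foreach ($source->getElementsByTagName($tagName) as $node) {
        // importNode() with true makes a deep copy that belongs to $output.
        $container->appendChild($output->importNode($node, true));
    }
}

// Serializing a single node with saveHTML() requires PHP >= 5.3.6.
$result = $output->saveHTML($container);
echo $result, "\n";
```

Because the copied nodes belong to $output, they are serialized by a single document with one declared encoding rather than being concatenated as strings from different sources.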
