Extract links from HTML

Question

    <?php
    $cont = '<div class="video-image">
        <a href="video/TI - No+Matter+What/_CHheEQe9M8/" title="23">
            <img src="http://i.ytimg.com/vi/_CHheEQe9M8/3.jpg" alt="TI" width="130" height="78"/>
        </a>
        <span class="video-title"><a href="video/TI - No+Matter+What/_CHheEQe9M8/" title="sdg">No Matter What</a></span>
        <span class="video-artist"><a href="video/TI - No+Matter+What/_CHheEQe9M8/" title="ss" class="ellipsis">TI</a></span>
    </div>';

    if (preg_match_all('#<a href="([^>]*)"#iU', $cont, $arr))
    {
        foreach ($arr[1] as $value)
        {
            var_dump($value);
            $cont = preg_replace('#' . preg_quote($value, '#') . '#iU', 'http://site.com/' . $value, $cont);
        }
    }

    echo $cont;

Returned: http://site.com/http://site.com/http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/

Why? I want: http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/ How can I do this? Sorry for my bad English.

EDIT

    $dom = new DOMDocument;
    $dom->loadHTML($cont);
    foreach( $dom->getElementsByTagName('a') as $node )
    {
        $cont = preg_replace('#' . preg_quote($node->getAttribute('href'), '#') . '#', "http://site.com/" . $node->getAttribute('href'), $cont);
    }    
    echo $cont;

This code returns http://site.com/http://site.com/http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/ too...

Solution

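$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
// $xhtml holds the markup from the question ($cont there).
// loadXML() expects well-formed XML, which that snippet is.
$dom->loadXML($xhtml);
foreach ($dom->getElementsByTagName('a') as $node)
{
    // Update the attribute on the node itself, so each link is prefixed exactly once.
    $node->setAttribute(
        'href',
        'http://site.com/' . $node->getAttribute('href')
    );
}
$dom->formatOutput = true;
echo $dom->saveXML($dom->documentElement);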

Result:

<div class="video-image">
  <a href="http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/" title="23">
    <img src="http://i.ytimg.com/vi/_CHheEQe9M8/3.jpg" alt="TI" width="130" height="78"/>
  </a>
  <span class="video-title">
    <a href="http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/" title="sdg">No Matter What</a>
  </span>
  <span class="video-artist">
    <a href="http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/" title="ss" class="ellipsis">TI</a>
  </span>
</div>
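
As for why the prefix stacked up in the question: all three links share the same href, and every preg_replace call in the loop rewrites each occurrence of that string in $cont, including the ones already prefixed by an earlier iteration. Three matches therefore mean three prefixes per link. If a DOM parser is not an option, a rough sketch of a single-pass regex alternative follows. The helper name `prefixLinks` and the shortened sample markup are made up for illustration, and the pattern assumes double-quoted href values.

```php
<?php
// Hypothetical helper (not from the original post): prefix every relative
// href in a single pass, so no link is rewritten more than once.
function prefixLinks(string $html, string $base): string
{
    // Match href="..." and skip values that already start with a URL scheme.
    return preg_replace_callback(
        '#\bhref="(?![a-z][a-z0-9+.-]*:)([^"]*)"#i',
        function (array $m) use ($base): string {
            return 'href="' . $base . $m[1] . '"';
        },
        $html
    );
}

// Shortened sample markup: two links that share the same href.
$cont = '<a href="video/a/" title="1">A</a> <a href="video/a/" title="2">B</a>';

// The question's loop: the same href is found twice, and each pass
// rewrites every occurrence, so the prefix ends up added twice.
$looped = $cont;
preg_match_all('#<a href="([^"]*)"#i', $cont, $arr);
foreach ($arr[1] as $value) {
    $looped = preg_replace(
        '#' . preg_quote($value, '#') . '#',
        'http://site.com/' . $value,
        $looped
    );
}

// The single-pass version touches each href exactly once.
$fixed = prefixLinks($cont, 'http://site.com/');

echo $looped, "\n", $fixed, "\n";
```

Running array_unique() over $arr[1] would not be enough to save the original loop, since the replacement could still match the same text inside other attributes or hrefs; the DOM approach above remains the more robust choice for real-world HTML.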
