Extract links from HTML
Question
<?php
$cont = '<div class="video-image">
<a href="video/TI - No+Matter+What/_CHheEQe9M8/" title="23">
<img src="http://i.ytimg.com/vi/_CHheEQe9M8/3.jpg" alt="TI" width="130" height="78"/>
</a>
<span class="video-title"><a href="video/TI - No+Matter+What/_CHheEQe9M8/" title="sdg">No Matter What</a></span>
<span class="video-artist"><a href="video/TI - No+Matter+What/_CHheEQe9M8/" title="ss" class="ellipsis">TI</a></span>
</div>';
if (preg_match_all('#<a href="([^>]*)"#iU', $cont, $arr))
{
    foreach ($arr[1] as $value)
    {
        var_dump($value);
        $cont = preg_replace('#' . preg_quote($value, '#') . '#iU', 'http://site.com/' . $value, $cont);
    }
}
echo $cont;
Returned: http://site.com/http://site.com/http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/
Why? I want: http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/
How can I do this? Sorry for my bad English.
EDIT
$dom = new DOMDocument;
$dom->loadHTML($cont);
foreach( $dom->getElementsByTagName('a') as $node )
{
    $cont = preg_replace('#' . preg_quote($node->getAttribute('href'), '#') . '#', "http://site.com/" . $node->getAttribute('href'), $cont);
}
echo $cont;
This code also returns http://site.com/http://site.com/http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/
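Both attempts above fail for the same reason: all three links share the same href, so preg_match_all (or getElementsByTagName) yields that value three times, and every preg_replace call rewrites all of its occurrences in $cont, including the ones already prefixed by the previous iteration. Each href therefore ends up with the prefix three times. If you want to stay at the string level, one preg_replace call that only touches hrefs without a URL scheme avoids the repetition. This is a minimal sketch, not the accepted answer: the helper name prefixRelativeHrefs is hypothetical, and it assumes double-quoted, relative href values like those in the question. A regex is no substitute for a real parser on arbitrary HTML.

```php
<?php
// Hypothetical helper (not from the original answer): prefix every relative
// href in <a> tags with $base using ONE preg_replace call, so each attribute
// is rewritten exactly once. Assumes double-quoted href attributes.
function prefixRelativeHrefs(string $html, string $base): string
{
    // Group 1 = "<a ... href=" up to and including the opening quote.
    // The negative lookahead skips values that already start with a scheme
    // such as "http:" or "mailto:", so running the function twice is harmless.
    $pattern = '#(<a\s[^>]*?\bhref=")(?![a-z][a-z0-9+.-]*:)#i';

    // ${1} instead of $1 so a $base starting with a digit is not read as $12...
    return preg_replace($pattern, '${1}' . $base, $html);
}

$cont = '<div class="video-image">
<a href="video/TI - No+Matter+What/_CHheEQe9M8/" title="23">
<img src="http://i.ytimg.com/vi/_CHheEQe9M8/3.jpg" alt="TI" width="130" height="78"/>
</a>
<span class="video-title"><a href="video/TI - No+Matter+What/_CHheEQe9M8/" title="sdg">No Matter What</a></span>
<span class="video-artist"><a href="video/TI - No+Matter+What/_CHheEQe9M8/" title="ss" class="ellipsis">TI</a></span>
</div>';

echo prefixRelativeHrefs($cont, 'http://site.com/');
```

Because the lookahead refuses values that already carry a scheme, the img's absolute src is never involved and a second call leaves the output unchanged.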
Solution
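$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;   // drop blank text nodes so formatOutput can re-indent
$dom->loadXML($cont);               // $cont is the markup from the question; loadXML needs well-formed XML
foreach( $dom->getElementsByTagName('a') as $node )
{
    // Change this element's own href attribute once, instead of
    // search-and-replacing the (identical) href text across the whole string.
    $node->setAttribute(
        'href',
        "http://site.com/" . $node->getAttribute('href')
    );
}
$dom->formatOutput = TRUE;
echo $dom->saveXML($dom->documentElement); // print only the <div>, without the XML declaration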
Result:
<div class="video-image">
<a href="http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/" title="23">
<img src="http://i.ytimg.com/vi/_CHheEQe9M8/3.jpg" alt="TI" width="130" height="78"/>
</a>
<span class="video-title">
<a href="http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/" title="sdg">No Matter What</a>
</span>
<span class="video-artist">
<a href="http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/" title="ss" class="ellipsis">TI</a>
</span>
</div>
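The answer's loadXML call works because the question's snippet happens to be well-formed XML (one root element, a self-closed img). Ordinary HTML with an unclosed br, unquoted attributes, or several top-level elements makes loadXML fail, so a loadHTML-based variant may be more practical. A minimal sketch, assuming relative hrefs and ASCII input (loadHTML does not assume UTF-8 unless the markup declares a charset); the function name prefixHrefsInHtml and the wrapper div are illustrative choices, not part of the original answer. The LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags stop libxml from adding implied html/body elements and a doctype.

```php
<?php
// Hypothetical helper: prefix relative <a href> values using the HTML parser,
// so the input does not have to be well-formed XML.
function prefixHrefsInHtml(string $html, string $base): string
{
    $dom = new DOMDocument();

    // Collect parser warnings (e.g. for HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // Wrap the fragment in one <div> so it has a single root; the flags stop
    // libxml from adding implied <html>/<body> elements and a doctype.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $node) {
        $href = $node->getAttribute('href');
        // Leave empty and absolute values (http:, https:, mailto:, ...) untouched.
        if ($href !== '' && !preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
            $node->setAttribute('href', $base . $href);
        }
    }

    // Serialize the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Not well-formed XML: an unclosed <br> and several top-level elements.
$html = '<span class="video-title"><a href="video/TI - No+Matter+What/_CHheEQe9M8/">No Matter What</a></span><br>'
      . '<span class="video-artist"><a href="video/TI - No+Matter+What/_CHheEQe9M8/" class="ellipsis">TI</a></span>';

echo prefixHrefsInHtml($html, 'http://site.com/');
```

Note that saveHTML may percent-encode characters such as spaces inside href values, so the serialized string can differ slightly from the input even where nothing was prefixed.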