PHP Simple HTML DOM Parser > Modify Fetched Links


Question

I have a script that fetches content from a website, and I want to modify all of the links in it. Suppose:

$html = str_get_html('<h2 class="r"><a class="l" href="http://www.example.com/2009/07/page.html" onmousedown="return curwt(this, \'http://www.example.com/2009/07/page.html\')">SEO Result Boost <b> </b></a></h2>');

So, is it possible to modify or rewrite it like this?

<h2 class="r"><a class="l" href="http://www.site.com?http://www.example.com/2009/07/page.html">SEO Result Boost <b> </b></a></h2>


I have read its manual but cannot figure out how to do it ( http://simplehtmldom.sourceforge.net/#fragment-12 ).

Is this possible? Any ideas?

Solution

Assuming the answer to a related question works, you should be able to do the following with Simple HTML DOM:

require_once 'simple_html_dom.php'; // adjust the path to wherever the library file lives
$site = "http://siteyourgettinglinksfrom.com";
$doc = str_get_html($code); // $code holds the fetched HTML
foreach ($doc->find('a[href]') as $a) {
    $href = $a->href;
    if (preg_match('#^https?://#i', $href)) {
        // absolute URL: prefix it as-is
        $a->href = 'http://www.site.com?'.$href;
    }
    else {
        // relative path: add the source site's address first
        $a->href = 'http://www.site.com?'.$site.$href;
    }
}
$code = (string) $doc;

Or, using PHP's native DOM library:

$site = "http://siteyourgettinglinksfrom.com";
$doc = new DOMDocument();
libxml_use_internal_errors(true); // fetched HTML is rarely valid; collect parser warnings instead of printing them
$doc->loadHTML($code); // $code holds the fetched HTML
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//a[@href]') as $a) {
    $href = $a->getAttribute('href');
    if (preg_match('#^https?://#i', $href)) {
        // absolute URL: prefix it as-is
        $a->setAttribute('href', 'http://www.site.com?'.$href);
    }
    else {
        // relative path: add the source site's address first
        $a->setAttribute('href', 'http://www.site.com?'.$site.$href);
    }
}
$code = $doc->saveHTML();
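
As a concrete check, the native DOM approach can be run against the markup from the question. This is only a sketch: the desired output in the question also drops the onmousedown attribute, so removing it here is an assumption about what is wanted, and the $prefix variable name is made up for the example.

```php
<?php
// Sample markup from the question, kept in a nowdoc so its quotes need no escaping.
$code = <<<'HTML'
<h2 class="r"><a class="l" href="http://www.example.com/2009/07/page.html" onmousedown="return curwt(this, 'http://www.example.com/2009/07/page.html')">SEO Result Boost <b> </b></a></h2>
HTML;

$prefix = 'http://www.site.com?';   // redirect prefix taken from the question

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($code);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
foreach ($xpath->query('//a[@href]') as $a) {
    // The sample link is already absolute, so it is prefixed as-is.
    $a->setAttribute('href', $prefix . $a->getAttribute('href'));
    // Assumption: the click handler should be dropped, as in the desired output.
    $a->removeAttribute('onmousedown');
}

$link = $xpath->query('//a')->item(0);
echo $link->getAttribute('href'), "\n";
```

Note that loadHTML() wraps a fragment like this in html and body elements, so saveHTML() returns a full document rather than just the h2.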

Checking the $href:

Most sites use relative links, so check whether each link is relative and, if it is, prepend the address of the site you are pulling the content from. (This is where a regular expression match would be your best friend.)

For relative links, prepend the absolute address of the site you are getting the links from:

  'http://www.site.com?'.$site.$href

For absolute links, just append the link itself:

  'http://www.site.com?'.$href

Example links:

site relative: /images/picture.jpg

document relative: ../images/picture.jpg

absolute: http://somesite.com/images/picture.jpg
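
One way to tell these three kinds apart is sketched below. The linkType() helper is hypothetical, not part of either library, and it ignores cases such as protocol-relative links (//host/path), mailto: links, and bare #fragment links.

```php
<?php
// Hypothetical helper: classify an href as one of the three link kinds listed above.
function linkType(string $href): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return 'absolute';           // has a scheme such as http://
    }
    if (strpos($href, '/') === 0) {
        return 'site relative';      // starts at the site root
    }
    return 'document relative';      // depends on the current directory
}

$examples = [
    '/images/picture.jpg',
    '../images/picture.jpg',
    'http://somesite.com/images/picture.jpg',
];
foreach ($examples as $href) {
    echo $href, ' => ', linkType($href), "\n";
}
```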

(Note: there is a little more work to do here. If you are handling "document relative" links, you need to know which directory the current page is in. Site relative links should be good to go, as long as you have the root address of the site you're getting the links from.)
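
A minimal sketch of that extra work follows, assuming you know the URL of the page each link was found on. The resolveLink() helper and the $page value are hypothetical. The helper drops any port number, does not treat query strings or fragments specially, and does not preserve a trailing slash, so treat it as a starting point rather than a full URL resolver.

```php
<?php
// Hypothetical helper: resolve $href against the URL of the page it was found on.
// Assumes $pageUrl is an absolute http(s) URL.
function resolveLink(string $href, string $pageUrl): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return $href;                           // already absolute
    }
    $parts = parse_url($pageUrl);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    if (strpos($href, '/') === 0) {
        return $root . $href;                   // site relative: attach to the site root
    }
    // Document relative: start from the directory of the current page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    $segments = [];
    foreach (explode('/', $dir . $href) as $segment) {
        if ($segment === '..') {
            array_pop($segments);               // go up one directory
        } elseif ($segment !== '' && $segment !== '.') {
            $segments[] = $segment;
        }
    }
    return $root . '/' . implode('/', $segments);
}

$page = 'http://somesite.com/blog/2009/page.html';   // assumed current page
echo 'http://www.site.com?' . resolveLink('../images/picture.jpg', $page), "\n";
```

With a resolver like this, every link can be made absolute first and then given the 'http://www.site.com?' prefix, which removes the need for separate absolute and relative branches.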
