How to get a fully-qualified URL from an anchor href?
Question
I am writing a web crawler in PHP. Given the current URL and an array of links that may be absolute, relative, or root-relative, how would I determine the fully-qualified URL for each link?
For example, let's say I am crawling the URL:
http://www.example.com/path/to/my/file.html
And the array of links that the webpage contains is:
array(
    'http://www.some-other-domain.com/',
    '../../',
    '/search',
);
How would I determine the fully-qualified URL for each of those links? The results I am looking for in this example would be, respectively:
http://www.some-other-domain.com/
http://www.example.com/path/
http://www.example.com/search/
Recommended answer
I think the easiest way is to use a library like this one: http://www.electrictoolbox.com/php-resolve-relative-urls-absolute/
Examples from the link:
url_to_absolute('http://www.example.com/sitemap.html', 'aboutus.html');
resolves to http://www.example.com/aboutus.html
or
url_to_absolute('http://www.example.com/content/sitemap.html', '../images/somephoto.jpg');
resolves to http://www.example.com/images/somephoto.jpg
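If pulling in a library is not an option, the resolution step can be sketched by hand. The code below is a minimal sketch, not the linked library's implementation: resolve_url() is a hypothetical name, it assumes the base is an absolute http(s) URL with a host, and it ignores edge cases such as userinfo in the base URL. It only relies on PHP's built-in parse_url() and string functions.

```php
<?php
// Hypothetical helper (an assumption for illustration, not url_to_absolute()
// from the linked library). Assumes $base is an absolute http(s) URL.
function resolve_url(string $base, string $href): string
{
    // An href that already carries a scheme is absolute: return it unchanged.
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $href)) {
        return $href;
    }

    $b = parse_url($base);
    $origin = $b['scheme'] . '://' . $b['host']
            . (isset($b['port']) ? ':' . $b['port'] : '');

    // Protocol-relative link such as //cdn.example.com/x.js
    if (strpos($href, '//') === 0) {
        return $b['scheme'] . ':' . $href;
    }

    $basePath = $b['path'] ?? '/';

    // Separate the path part of the href from its query string / fragment.
    $cut    = strcspn($href, '?#');
    $path   = (string) substr($href, 0, $cut);
    $suffix = (string) substr($href, $cut);

    if ($path === '') {
        // Query-only, fragment-only, or empty href: keep the base path,
        // and keep the base query unless the href supplies its own.
        $path = $basePath;
        if (($suffix === '' || $suffix[0] === '#') && isset($b['query'])) {
            $suffix = '?' . $b['query'] . $suffix;
        }
    } elseif ($path[0] !== '/') {
        // Relative path: append it to the directory of the base path.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $path;
    }

    // Collapse "." and ".." segments.
    $segments = explode('/', $path);
    $last = end($segments);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            // Never pop the leading empty segment that represents the root.
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    // A path ending in "." or ".." refers to a directory: keep the trailing slash.
    if ($last === '.' || $last === '..') {
        $out[] = '';
    }

    return $origin . implode('/', $out) . $suffix;
}

// Usage with the links from the question.
$current = 'http://www.example.com/path/to/my/file.html';
$links = ['http://www.some-other-domain.com/', '../../', '/search'];
foreach ($links as $link) {
    echo resolve_url($current, $link), "\n";
}
```

Note that under this kind of standard resolution, '/search' comes out as http://www.example.com/search, without the trailing slash shown in the question's expected output; a trailing slash would have to be present in the href itself.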