How to get fully-qualified URL from anchor href?


Problem description

I am writing a web crawler in php. Given a current URL, and an array of links to absolute, relative, and root URLs, how would I determine the fully-qualified URL for each link?

For example, let's say I am crawling the URL:

http://www.example.com/path/to/my/file.html

And the array of links that the webpage contains is:

array(
    'http://www.some-other-domain.com/',
    '../../',
    '/search',
);

How would I determine the fully-qualified URL for each of those links? The result I am looking for in this example would be, respectively:

http://www.some-other-domain.com/
http://www.example.com/path/
http://www.example.com/search


Recommended answer

I think the easiest way is to use a library like this: http://www.electrictoolbox.com/php-resolve-relative-urls-absolute/

Examples from the link:

url_to_absolute('http://www.example.com/sitemap.html', 'aboutus.html');

resolves to http://www.example.com/aboutus.html

url_to_absolute('http://www.example.com/content/sitemap.html', '../images/somephoto.jpg');

resolves to http://www.example.com/images/somephoto.jpg
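
If pulling in a library is not an option, the resolution rules can be sketched by hand. The following is a simplified take on RFC 3986 section 5.2 reference resolution, not the code behind url_to_absolute(); the function name resolve_url() is made up for this example, and it assumes the base URL is absolute and carries no user:password part. Note that standard resolution does not append a trailing slash, so '/search' becomes http://www.example.com/search.

```php
<?php
// resolve_url() is a hypothetical helper written for this example.
// It is NOT the url_to_absolute() function from the linked library.
function resolve_url(string $base, string $rel): string
{
    // A reference that already has a scheme is absolute: return it unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $rel)) {
        return $rel;
    }

    $b = parse_url($base);
    if ($b === false || !isset($b['scheme'], $b['host'])) {
        throw new InvalidArgumentException("Base URL must be absolute: $base");
    }

    // Protocol-relative reference such as //cdn.example.com/app.js
    if (strncmp($rel, '//', 2) === 0) {
        return $b['scheme'] . ':' . $rel;
    }

    // Assumption: user:password@ in the base is ignored for simplicity.
    $authority = $b['scheme'] . '://' . $b['host']
        . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';

    // Split the reference into its path and the trailing ?query / #fragment.
    $cut     = strcspn($rel, '?#');
    $relPath = substr($rel, 0, $cut);
    $suffix  = substr($rel, $cut);

    if ($relPath === '') {
        // Empty, query-only, or fragment-only reference: keep the base path,
        // and keep the base query unless the reference supplies its own.
        $query = ($suffix === '' || $suffix[0] === '#') && isset($b['query'])
            ? '?' . $b['query']
            : '';
        return $authority . $basePath . $query . $suffix;
    }

    if ($relPath[0] === '/') {
        // Root-relative: replaces the whole base path.
        $path = $relPath;
    } else {
        // Relative: replaces everything after the last "/" of the base path.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $relPath;
    }

    // Remove "." and ".." segments; never pop the leading empty segment.
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    // A path ending in "." or ".." refers to a directory: keep the final "/".
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $out[] = '';
    }

    return $authority . implode('/', $out) . $suffix;
}

// Usage with the URL and links from the question.
$current = 'http://www.example.com/path/to/my/file.html';
$links = [
    'http://www.some-other-domain.com/',
    '../../',
    '/search',
];
foreach ($links as $link) {
    echo resolve_url($current, $link), "\n";
}
// http://www.some-other-domain.com/
// http://www.example.com/path/
// http://www.example.com/search
```

This sketch skips edge cases a crawler will eventually meet, such as a <base href> tag in the page, non-HTTP schemes like mailto: or javascript: (returned unchanged here), and malformed hrefs, so a maintained library is still the safer choice for production use.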
