Convert a relative URL to an absolute URL with Simple HTML DOM?

Problem description

When I'm scraping content from some pages, the script gives a relative URL. Is it possible to get an absolute URL with Simple HTML DOM?

Solution

I don’t think that the Simple HTML DOM Parser can do that.

But you can do that on your own. First you need to determine the base URI, which is the URI of the document itself unless the document declares otherwise (see the BASE element). Then take each URI reference and apply the algorithm for resolving a relative reference described in RFC 3986 (there are already classes you can use for that, such as the PEAR package Net_URL2).
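To make the resolution step concrete, here is a minimal, dependency-free sketch of the reference resolution algorithm from RFC 3986, section 5.2. The function names split_uri, remove_dot_segments and resolve_reference are made up for this illustration, and the sketch assumes the base is an absolute hierarchical URL (scheme plus authority, such as an http URL). For real use, a tested library like Net_URL2 is probably the safer choice.

```php
<?php
// Illustrative sketch of RFC 3986 section 5.2 (reference resolution).
// All function names are hypothetical. Assumes PHP 7.2+ and an absolute
// hierarchical base URL (scheme plus authority), e.g. an http URL.

// Split a URI reference into its five components (regex from RFC 3986, Appendix B).
function split_uri(string $uri): array
{
    preg_match('~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~s',
        $uri, $m, PREG_UNMATCHED_AS_NULL);
    return [
        'scheme'    => $m[2] ?? null,
        'authority' => $m[4] ?? null,
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? null,
        'fragment'  => $m[9] ?? null,
    ];
}

// Collapse "." and ".." segments of a path that starts with "/".
function remove_dot_segments(string $path): string
{
    $segments = explode('/', $path);
    $last = end($segments);
    $output = [];
    foreach ($segments as $segment) {
        if ($segment === '..') {
            // Go up one level, but never above the root (the leading empty segment).
            if (count($output) > 1) {
                array_pop($output);
            }
        } elseif ($segment !== '.') {
            $output[] = $segment;
        }
    }
    // A trailing "." or ".." names a directory, so keep the trailing slash.
    if ($last === '.' || $last === '..') {
        $output[] = '';
    }
    return implode('/', $output);
}

function resolve_reference(string $base, string $ref): string
{
    $b = split_uri($base);
    $r = split_uri($ref);

    // Simplification: a reference with its own scheme is returned unchanged
    // (the RFC would additionally remove dot segments from its path).
    if ($r['scheme'] !== null) {
        return $ref;
    }

    $t = ['scheme' => $b['scheme'], 'query' => $r['query'], 'fragment' => $r['fragment']];
    if ($r['authority'] !== null) {
        // Network-path reference such as //host/path
        $t['authority'] = $r['authority'];
        $t['path'] = remove_dot_segments($r['path']);
    } else {
        $t['authority'] = $b['authority'];
        if ($r['path'] === '') {
            // Only a query and/or fragment was given: keep the base path.
            $t['path'] = $b['path'];
            if ($r['query'] === null) {
                $t['query'] = $b['query'];
            }
        } elseif ($r['path'][0] === '/') {
            $t['path'] = remove_dot_segments($r['path']);
        } else {
            // Merge: replace everything after the last "/" of the base path.
            $slash = strrpos($b['path'], '/');
            $dir = $slash === false ? '/' : substr($b['path'], 0, $slash + 1);
            $t['path'] = remove_dot_segments($dir . $r['path']);
        }
    }

    // Recompose the target URI from its components.
    $out = $t['scheme'] . ':';
    if ($t['authority'] !== null) {
        $out .= '//' . $t['authority'];
    }
    $out .= $t['path'];
    if ($t['query'] !== null) {
        $out .= '?' . $t['query'];
    }
    if ($t['fragment'] !== null) {
        $out .= '#' . $t['fragment'];
    }
    return $out;
}

$base = 'http://example.com/foo/bar';
echo resolve_reference($base, 'img/logo.png'), "\n"; // http://example.com/foo/img/logo.png
echo resolve_reference($base, '../style.css'), "\n"; // http://example.com/style.css
echo resolve_reference($base, '/about'), "\n";       // http://example.com/about
echo resolve_reference($base, '?page=2'), "\n";      // http://example.com/foo/bar?page=2
```

The component-wise approach matters because a reference such as ?page=2 or //cdn.example.net/lib.js cannot be handled correctly by simply concatenating strings onto the base.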

So, using these two classes (Simple HTML DOM and Net_URL2), you could do something like this:

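require_once 'Net/URL2.php';        // PEAR package Net_URL2
require_once 'simple_html_dom.php'; // Simple HTML DOM; adjust the path to your copy

$uri = new Net_URL2('http://example.com/foo/bar'); // URI of the resource
$html = file_get_html($uri->getURL());             // assumed here: load the document to rewrite

// The base URI is the document's own URI unless a BASE element overrides it.
$baseURI = $uri;
foreach ($html->find('base[href]') as $elem) {
    $baseURI = $uri->resolve($elem->href);
}

// Rewrite each URI-bearing attribute against the base URI.
foreach ($html->find('*[src]') as $elem) {
    $elem->src = $baseURI->resolve($elem->src)->__toString();
}
foreach ($html->find('*[href]') as $elem) {
    if (strtoupper($elem->tag) === 'BASE') continue; // keep the BASE element as declared
    $elem->href = $baseURI->resolve($elem->href)->__toString();
}
foreach ($html->find('form[action]') as $elem) {
    $elem->action = $baseURI->resolve($elem->action)->__toString();
}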

Repeat the substitution for any other attribute that contains a URI, such as background, cite, classid, codebase, data, longdesc, profile and usemap (see the index of attributes in HTML 4.01).
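Rather than writing one loop per attribute, the repetition could be folded into a single helper. The following is only a sketch: absolutize_attributes is a hypothetical function name, $html is assumed to be a Simple HTML DOM object as in the snippet above, and $resolve is any callable that turns a URI reference into an absolute URL.

```php
<?php
// Hypothetical helper; not part of Simple HTML DOM or Net_URL2.
// $html    : object with a find() method (e.g. a Simple HTML DOM document)
// $resolve : callable mapping a URI reference to an absolute URL
function absolutize_attributes($html, callable $resolve, array $attributes = [
    'src', 'href', 'action', 'background', 'cite', 'classid',
    'codebase', 'data', 'longdesc', 'profile', 'usemap',
]) {
    foreach ($attributes as $attr) {
        foreach ($html->find('*[' . $attr . ']') as $elem) {
            // Leave <base href> untouched; it defines the base URI itself.
            if ($attr === 'href' && strtoupper($elem->tag) === 'BASE') {
                continue;
            }
            $elem->{$attr} = $resolve($elem->{$attr});
        }
    }
}

// Possible usage with the variables from the snippet above:
// absolutize_attributes($html, function ($ref) use ($baseURI) {
//     return $baseURI->resolve($ref)->getURL();
// });
```

Passing the resolver as a callable keeps the helper independent of any particular URL library, so Net_URL2 can be swapped for another resolver without touching the loop.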
