How can I get the ultimate URL without fetching the pages using Perl and LWP?

Question

I'm doing some web scraping using Perl's LWP. I need to process a set of URLs, some of which may redirect (one or more times).

How can I get the ultimate URL, with all redirects resolved, using the HEAD method?

Recommended answer

If you use the fully featured version of LWP::UserAgent, then the response that is returned is an instance of HTTP::Response, which in turn has an HTTP::Request as an attribute. Note that this is NOT necessarily the same HTTP::Request that you created with the original URL from your set of URLs. The HTTP::Response documentation describes the method that retrieves the request instance from the response instance:

$r->request( $request )

This is used to get/set the request attribute. The request attribute is a reference to the request that caused this response. It does not have to be the same request passed to the $ua->request() method, because there might have been redirects and authorization retries in between.

Once you have the request object, you can use the uri method to get the URI. If redirects were followed, the URI is the result of following the chain of redirects.
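To make the relationship between the response, its request, and the redirect chain concrete, here is a minimal offline sketch. It builds the objects by hand instead of contacting a server, so the example.com URLs and the 301 status are purely illustrative assumptions. It relies on the request, previous, and redirects methods of HTTP::Response.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTTP::Request;
use HTTP::Response;

# Hypothetical URLs, used only to illustrate the chain.
my $first_req  = HTTP::Request->new(HEAD => 'http://example.com/old');
my $second_req = HTTP::Request->new(HEAD => 'http://example.com/new');

# What a server might answer to the first request: a redirect.
my $redirect = HTTP::Response->new(301, 'Moved Permanently',
    [ Location => 'http://example.com/new' ]);
$redirect->request($first_req);

# The final response refers to the *second* request and links back
# to the redirect response through previous().
my $final = HTTP::Response->new(200, 'OK');
$final->request($second_req);
$final->previous($redirect);

print "Final URI = ", $final->request->uri, "\n";

# redirects() walks the previous() chain, oldest response first.
for my $hop ($final->redirects) {
    print "Redirected from ", $hop->request->uri,
          " (", $hop->code, ")\n";
}
```

When LWP::UserAgent follows redirects itself, it fills in the same request and previous links, so the final response can be inspected in the same way.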

Here's a Perl script, tested and verified, that gives you the skeleton of what you need:

#!/usr/bin/perl

use strict;
use warnings;

use LWP::UserAgent;
use HTTP::Request;

my $ua;  # Instance of LWP::UserAgent
my $req; # Instance of (original) request
my $res; # Instance of HTTP::Response returned via request method

$ua = LWP::UserAgent->new;
$ua->agent("$0/0.1 " . $ua->agent);

$req = HTTP::Request->new(HEAD => 'http://www.ecu.edu/wllc');
$req->header('Accept' => 'text/html');

$res = $ua->request($req);

if ($res->is_success) {
    # Chained method calls; you may want to check that
    # $res->request is defined first.
    # This is the inline version of:
    # my $finalrequest = $res->request();
    # print "Final URI = " . $finalrequest->uri() . "\n";
    print "Final URI = " . $res->request()->uri() . "\n";
} else {
    print "Error: " . $res->status_line . "\n";
}
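Since the question is about a set of URLs, the script above could be wrapped in a small helper and called in a loop. The following is only a sketch under a few assumptions: final_url is a hypothetical helper name (not part of LWP), the timeout value is arbitrary, and the URLs are read from the command line so that nothing is fetched unless you supply them. It uses the head shortcut method of LWP::UserAgent instead of building the HTTP::Request by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use LWP::UserAgent;

# final_url() is a hypothetical helper, not part of LWP.
# Returns (final URI string, number of redirects) on success,
# or an empty list if the last response was not a success.
sub final_url {
    my ($ua, $url) = @_;
    my $res = $ua->head($url);      # HEAD: no body is downloaded
    return unless $res->is_success;
    my @hops = $res->redirects;     # intermediate redirect responses
    return ($res->request->uri->as_string, scalar @hops);
}

my $ua = LWP::UserAgent->new(
    max_redirect => 7,   # LWP's default limit, stated explicitly
    timeout      => 10,  # seconds; an arbitrary choice
);

# URLs are taken from the command line; nothing is fetched without them.
for my $url (@ARGV) {
    if (my ($final, $hops) = final_url($ua, $url)) {
        print "$url -> $final ($hops redirect(s))\n";
    } else {
        print "$url -> could not be resolved\n";
    }
}
```

Some servers reject or mishandle HEAD requests, so a failed lookup here does not necessarily mean the URL is dead; retrying such URLs with GET may be worth considering.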
