Why can't LWP::UserAgent get this site entirely?

Question

The script below only outputs the first few lines of the page.

#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my $response = $ua->get('http://www.eurogamer.net/articles/df-hardware-wii-u-graphics-power-finally-revealed');
print $response->decoded_content;

Recommended answer

I made the following modification:

# Note: say needs "use feature 'say';" (or "use v5.10;") at the top of the script.
my $response = $ua->get( 'http://www.eurogamer.net/articles/df-hardware-wii-u-graphics-power-finally-revealed' );
say $response->headers->as_string;

and saw:

Cache-Control: max-age=60s
Connection: close
Date: Wed, 06 Feb 2013 23:51:15 GMT
Via: 1.1 varnish
Age: 0
Server: Apache
Vary: Accept-Encoding
Content-Length: 50519
Content-Type: text/html; charset=ISO-8859-1
Client-Aborted: die
Client-Date: Wed, 06 Feb 2013 23:50:50 GMT
Client-Peer: 94.198.83.18:80
Client-Response-Num: 1
X-Died: Illegal field name 'X-Meta-Twitter:card' at .../HTML/HeadParser.pm line 207.
X-Varnish: 630361704

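It doesn't seem to like the <meta name="twitter:card" content="summary" /> tag on line 27 of the page. The X-Died header reports that the head parser died on it, and Client-Aborted: die indicates that the download was aborted at that point, which would explain why only the start of the page comes through.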

It seems to translate any meta tag with a name attribute into an "X-Meta-\u$attr->{name}" pseudo-header, and then tries to store the value of the content attribute as that header's value. Here the resulting name is X-Meta-Twitter:card, which is rejected as an illegal field name. The relevant code in HTML/HeadParser.pm (starting at line 194):

if ($tag eq 'meta') {
    my $key = $attr->{'http-equiv'};
    if (!defined($key) || !length($key)) {
        if ($attr->{name}) {
            $key = "X-Meta-\u$attr->{name}"; # <-- Here's the little trick
        } elsif ($attr->{charset}) { # HTML 5 <meta charset="...">
            $key = "X-Meta-Charset";
            $self->{header}->push_header($key => $attr->{charset});
            return;
        } else {
            return;
        }
    }
    $self->{'header'}->push_header($key => $attr->{content});
}
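
To make the failure easier to see in isolation, here is a minimal sketch that builds the key the same way and hands it to HTTP::Headers directly. The $attr hash is a hand-written stand-in for what the parser would extract from the offending tag, and the sketch assumes an HTTP::Headers version that rejects a colon inside a field name, as the X-Died message above suggests.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use HTTP::Headers;

# Stand-in for the attributes of <meta name="twitter:card" content="summary" />
my $attr = { name => 'twitter:card', content => 'summary' };

# Same interpolation as in HeadParser.pm: \u uppercases the next character
my $key = "X-Meta-\u$attr->{name}";
print "derived key: $key\n";

# push_header croaks on the key; eval lets us report that instead of dying
my $headers = HTTP::Headers->new;
my $ok = eval { $headers->push_header($key => $attr->{content}); 1 };
print $ok ? "push_header accepted the key\n" : "push_header died: $@";
```

If the installed HTTP::Headers behaves like the one in the trace, the second line of output should carry the same "Illegal field name" message.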

As a workaround, I put a modified copy of this module in a PERL5LIB directory. In that copy I wrapped the push_header step in an eval block, and the page then downloaded completely.
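
The eval wrapping might look roughly like the sketch below. It is not the actual patch: push_header_safely is a hypothetical helper name, and the code works on a standalone HTTP::Headers object rather than inside HTML::HeadParser, purely to show the idea of skipping a header that push_header refuses.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use HTTP::Headers;

# Hypothetical helper: try to add a header, and skip it (with a warning)
# instead of dying when HTTP::Headers rejects the field name.
sub push_header_safely {
    my ($headers, $key, $value) = @_;
    my $ok = eval { $headers->push_header($key => $value); 1 };
    warn "Skipping header '$key': $@" unless $ok;
    return $ok ? 1 : 0;
}

my $headers = HTTP::Headers->new;
push_header_safely($headers, 'X-Meta-Twitter:card', 'summary');  # expected to be skipped
push_header_safely($headers, 'X-Meta-Description',  'example');  # expected to be kept
print $headers->as_string;
```

If the headers derived from the HTML head are not needed at all, another option that avoids patching the module is to turn head parsing off with the documented LWP::UserAgent method $ua->parse_head(0) before calling get.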
