Perl Mechanize: Get the response page after the page is modified?
Question
I am trying to retrieve a page that uses JavaScript and a database to load its content. Loading takes about 2 to 3 minutes. I can get the page that shows "Please wait 2 to 3 mins for the page to be loaded.", but I cannot retrieve the page once it has finished loading.
I have already tried the following:
1.) Using the mirror method in Mechanize. But the response content is not decoded, so the saved file is gibberish. (I also tried writing a method similar to mirror that decodes the response content, but that doesn't work either: the new content is not loaded.)
2.) Adding an 'If-Modified-Since' request header. But the time stays the same and the new content is not fetched.
Any pointers or suggestions would really be helpful.
TIA :)
Recommended answer
It won't work with Mechanize by itself, because Mechanize does not execute JavaScript. You first need to check what the JavaScript is doing to the page and where the data comes from. Then there are two possibilities:
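- Once you have worked out what the JavaScript does and where it downloads the new data from, you can mimic the JavaScript in Perl. Check whether the data is encoded in some way, then decode it with Perl.
- If you use WWW::Mechanize::Firefox, you don't need to care about the JavaScript, because Firefox handles it. You can hide the application if you don't want to see it.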
Example:
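use strict;
use warnings;
use WWW::Mechanize::Firefox;
use HTML::TreeBuilder::LibXML;

# Drive a real Firefox instance so the page's JavaScript actually runs
my $mech = WWW::Mechanize::Firefox->new;
$mech->get('http://example.com/ajax.html');

# Parse the content as Firefox currently has it, after JavaScript has run
my $tree = HTML::TreeBuilder::LibXML->new;
$tree->parse($mech->content);
$tree->eof;
# Extract the text of the table of interest; adjust the XPath to the real page
my $something = $tree->findvalue('/html/body/div[10]/table');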
The above code is untested, but should work.
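Because the page in the question takes 2 to 3 minutes to fill in, either possibility will probably need to re-check the content until the "Please wait" placeholder is gone. Below is a minimal sketch of such a polling loop. The helper name poll_until_ready and its parameters (fetch, is_ready, max_tries, delay) are made up for illustration and are not part of any Mechanize module. The fetch callback is a stand-in here so the snippet runs without a network connection.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): call $fetch repeatedly
# until $is_ready accepts the result, or give up after $max_tries attempts.
sub poll_until_ready {
    my (%args) = @_;
    my $fetch     = $args{fetch};              # coderef returning the current content
    my $is_ready  = $args{is_ready};           # coderef: true once the content is final
    my $max_tries = $args{max_tries} // 20;    # assumed default
    my $delay     = $args{delay}     // 10;    # seconds between attempts, assumed default

    for my $try (1 .. $max_tries) {
        my $content = $fetch->();
        return $content if defined $content && $is_ready->($content);
        # Wait before the next attempt, but not after the last one
        sleep $delay if $delay > 0 && $try < $max_tries;
    }
    return;    # undef in scalar context: the content never became ready
}

# Stand-in for a real request: returns the placeholder twice, then the data.
my @responses = ('Please wait', 'Please wait', '<table>data</table>');
my $page = poll_until_ready(
    fetch    => sub { shift @responses },
    is_ready => sub { $_[0] !~ /Please wait/ },
    delay    => 0,
);
print defined $page ? "loaded: $page\n" : "timed out\n";
```

In real use, the fetch callback could return $mech->content from a WWW::Mechanize::Firefox object, or call $mech->get on the data URL the JavaScript uses and then return $mech->content. The max_tries and delay values would need to cover the 2 to 3 minute load time.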
Enjoy.