Mechanize + Python: how to follow a link in a simple JavaScript redirect?

Question

Short version: how can I execute or simulate a JavaScript redirect with Python Mechanize?

location.href="http://www.site2.com/";

I've written a Python script with the mechanize module that looks for a link in a page and follows it.

The problem occurs on one particular site. When I do

br.follow_link("http://www.address1.com") 

it redirects me to this simple page:

<script language="JavaScript">{
    location.href="http://www.site2.com/";
    self.focus();
    }</script>

Now, if I do:

br = mechanize.Browser(factory=mechanize.RobustFactory())

... #other code

br.follow_link("http://www.address1.com") 
for link in br.links():
    br.follow_link(link)
    print link

it doesn't print anything, which means mechanize finds no links in that page. But if I parse the page manually and then execute:

br.open("http://www.site2.com")

Site2 doesn't recognize that I'm coming from "www.address1.com", and the script doesn't work the way I'd like!

Sorry if this is just a newbie question, and thank you in advance!

P.S. I have br.set_handle_referer(True) set.

More info: inspecting that request with Fiddler2, it looks like this:

GET http://www.site2.com/ HTTP/1.1
Host: www.site2.com
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Referer: http://www.address1.com
Accept-Encoding: gzip,deflate,sdch
Accept-Language: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: PHPSESSID=6e161axxxxxxxxxxx; user=myusername; pass=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx; ip=79.xx.xx.xx; agent=a220243a8b8f83de64c6204a5ef7b6eb; __utma=154746788.943755841.1348303404.1350232016.1350241320.43; __utmb=154746788.12.10.1350241320; __utmc=154999999; __utmz=154746788.134999998.99.6.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=%something%something%

So it seems to be a cookie problem?

Recommended answer

I solved it! This way:

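    import cookielib
    import urllib2

    # Share one cookie jar between mechanize and urllib2
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    ...

    # Follow the link, then remember the resulting URL as the referer
    br.follow_link(url="http://www.address1.com")
    refe = br.geturl()
    # Request the JavaScript redirect target by hand, with an explicit Referer
    req = urllib2.Request(url='http://www.site2.com/')
    req.add_header('Referer', refe)
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    f = opener.open(req)
    htm = f.read()
    print "\n\n", htm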
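
The snippet above hard-codes the redirect target. As a rough sketch only (not part of the original answer), the same idea can be written with the Python 3 standard library, where `urllib.request` and `http.cookiejar` replace `urllib2` and `cookielib`. The helper names `extract_js_redirect`, `build_redirect_request` and `make_opener` are made up for illustration, and the regex assumes the page uses a plain `location.href="..."` assignment like the one in the question; anything more dynamic would need a real JavaScript engine.

```python
import re
import urllib.request
from http.cookiejar import LWPCookieJar

# Matches: location.href = "http://..." (single or double quotes)
_REDIRECT_RE = re.compile(r"""location\.href\s*=\s*(["'])(.+?)\1""")


def extract_js_redirect(html):
    """Return the URL assigned to location.href, or None if there is none."""
    match = _REDIRECT_RE.search(html)
    return match.group(2) if match else None


def build_redirect_request(html, referer):
    """Build a request for the redirect target that carries a Referer header."""
    target = extract_js_redirect(html)
    if target is None:
        return None
    request = urllib.request.Request(target)
    request.add_header("Referer", referer)
    return request


def make_opener(cookie_jar):
    """Create an opener that sends and stores cookies in cookie_jar."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )


# The intermediate page from the question
page = """<script language="JavaScript">{
    location.href="http://www.site2.com/";
    self.focus();
    }</script>"""

request = build_redirect_request(page, "http://www.address1.com")
opener = make_opener(LWPCookieJar())
# opener.open(request).read() would perform the actual fetch.
```

In a real script the cookie jar passed to `make_opener` would be the one already holding the session cookies from the first site, so that the second request carries both the cookies and the Referer.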
