Python - resume web session from urllib2 after manual browser login


Question

Say I browse to a website (on an intranet, too) that requires a login to access its contents. I fill in the required fields, e.g. username, password, and any captcha needed for logging in, from the browser itself.

Once I have logged in to the site, there are lots of goodies that can be scraped from several links and tabs on the first page after login.

Now, from this point forward (that is, after logging in from the browser), I want to control the page and the downloads from urllib2: going through it page by page, downloading the PDFs and images on each page, etc.

I understand that we can use everything from urllib2 (or mechanize) directly (that is, log in to the page and do the whole thing).

But for some sites it is really a pain to go through and figure out the login mechanism, the required hidden parameters, referrers, captchas, cookies, and pop-ups.

Please advise. Hope my question makes sense.

To summarize: I want to do the initial login part manually, using the web browser... and then take over the automation for the scraping via urllib2.
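One way to do this hand-off without automating the login at all is to copy the session cookie out of the browser's developer tools after logging in and install it into a cookie-aware opener. A minimal sketch, assuming a cookie named `sessionid` and the domain `intranet.example.com` (both placeholders to replace with your site's real values); shown with Python 3's `http.cookiejar`/`urllib.request`, which in the question's Python 2 are `cookielib` and `urllib2`:

```python
from http.cookiejar import Cookie, CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

def make_cookie(name, value, domain):
    """Build a Cookie object equivalent to one the browser holds."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )

# Paste the value copied from the browser after the manual login.
jar = CookieJar()
jar.set_cookie(make_cookie("sessionid", "paste-value-from-browser",
                           "intranet.example.com"))

# Every request made through this opener now carries the session
# cookie, so the server treats it as the already-logged-in browser.
opener = build_opener(HTTPCookieProcessor(jar))
# opener.open("http://intranet.example.com/page1")
```

This only works as long as the browser session stays valid on the server, and some sites also check headers such as `User-Agent`, which you may need to copy as well via `opener.addheaders`.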

Answer

Did you consider Selenium? It's browser automation rather than raw HTTP requests (urllib2), and you can manipulate the browser between steps.
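The Selenium route can also be combined with the questioner's urllib2 plan: log in by hand in the Selenium-driven browser, then copy the cookies that `driver.get_cookies()` returns into a cookie jar for plain HTTP requests. A sketch of that conversion, which is pure standard library (Python 3 names shown; `cookielib`/`urllib2` in Python 2) -- the Selenium calls in the comments are the usual WebDriver API but are not run here:

```python
from http.cookiejar import Cookie, CookieJar

def jar_from_selenium(cookie_dicts):
    """Convert Selenium-style cookie dicts into a CookieJar."""
    jar = CookieJar()
    for c in cookie_dicts:
        jar.set_cookie(Cookie(
            version=0, name=c["name"], value=c["value"],
            port=None, port_specified=False,
            domain=c.get("domain", ""), domain_specified=True,
            domain_initial_dot=False,
            path=c.get("path", "/"), path_specified=True,
            secure=c.get("secure", False), expires=c.get("expiry"),
            discard=False, comment=None, comment_url=None, rest={},
        ))
    return jar

# Typical use (requires selenium; log in manually in the window):
#   driver = webdriver.Firefox()
#   driver.get(login_url)
#   ...complete the login, captcha included, by hand...
#   jar = jar_from_selenium(driver.get_cookies())
#   opener = urllib.request.build_opener(
#       urllib.request.HTTPCookieProcessor(jar))
```

After the hand-off you can close the browser and continue the page-by-page downloads with the opener alone.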
