Login to website using urllib2 - Python 2.7


Problem description


Okay, so I am using this for a reddit bot, but I want to be able to figure out HOW to log in to any website. If that makes sense....

I realise that different websites use different login forms etc. So how do I figure out how to optimise it for each website? I'm assuming I need to look for something in the html file but no idea what.

I do NOT want to use Mechanize or any other library (which is what all the other answers are about on here and don't actually help me to learn what is happening), as I want to learn by myself how exactly it all works.

The urllib2 documentation really isn't helping me.

Thanks.

Solution

I'll preface this by saying I haven't logged in to a site this way for a while, so I could be missing some of the more 'accepted' ways to do it.

I'm not sure if this is what you're after, but without a library like mechanize or a more robust framework like selenium, in the basic case you just look at the form itself and pick out its inputs. For instance, if you open www.reddit.com and view the source of the rendered page, you will find this form:

<form method="post" action="https://ssl.reddit.com/post/login" id="login_login-main"
  class="login-form login-form-side">
    <input type="hidden" name="op" value="login-main" />
    <input name="user" placeholder="username" type="text" maxlength="20" tabindex="1" />
    <input name="passwd" placeholder="password" type="password" tabindex="1" />

    <div class="status"></div>

    <div id="remember-me">
      <input type="checkbox" name="rem" id="rem-login-main" tabindex="1" />
      <label for="rem-login-main">remember me</label>
      <a class="recover-password" href="/password">reset password</a>
    </div>

    <div class="submit">
      <button class="btn" type="submit" tabindex="1">login</button>
    </div>

    <div class="clear"></div>
</form>

Here we see a few inputs: op, user, passwd and rem. Also, notice the action attribute: that is the URL to which the form will be posted, and will therefore be our target. So now the last step is packing the parameters into a payload and sending it as a POST request to the action URL. Below, we also create a new opener, add the ability to handle cookies and add headers as well, giving us a slightly more robust opener to execute the requests:

import cookielib
import urllib
import urllib2


# Store the cookies and create an opener that will hold them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# Add our headers
opener.addheaders = [('User-agent', 'RedditTesting')]

# Install our opener (note that this changes the global opener to the one
# we just made, but you can also just call opener.open() if you want)
urllib2.install_opener(opener)

# The action/target from the form
authentication_url = 'https://ssl.reddit.com/post/login'

# Input parameters we are going to send
payload = {
    'op': 'login-main',
    'user': '<username>',
    'passwd': '<password>',
}

# Use urllib to encode the payload
data = urllib.urlencode(payload)

# Build our Request object (supplying 'data' makes it a POST)
req = urllib2.Request(authentication_url, data)

# Make the request and read the response
resp = urllib2.urlopen(req)
contents = resp.read()
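
If you want to see what will be sent before actually hitting the server, you can build the Request and inspect it without opening it. This is only a rough sketch: `build_login_request` and `request_lib` are names I made up for illustration, and the `except ImportError` branch is just a fallback so the same snippet also runs on Python 3 (where the body has to be bytes).

```python
try:
    import urllib2 as request_lib             # Python 2.7
    from urllib import urlencode
except ImportError:
    import urllib.request as request_lib      # Python 3 fallback
    from urllib.parse import urlencode


def build_login_request(action_url, fields):
    """Hypothetical helper: build (but do not send) the login request."""
    data = urlencode(fields)
    if not isinstance(data, bytes):
        # Python 3 requires the request body to be bytes
        data = data.encode('ascii')
    # Supplying data is what turns the request into a POST
    return request_lib.Request(action_url, data)


req = build_login_request(
    'https://ssl.reddit.com/post/login',
    {'op': 'login-main', 'user': '<username>', 'passwd': '<password>'},
)
print(req.get_method())      # POST
print(req.get_full_url())    # https://ssl.reddit.com/post/login

# Without data the very same class produces a GET
plain = request_lib.Request('https://ssl.reddit.com/post/login')
print(plain.get_method())    # GET

# To send it through the cookie-aware opener without installing it globally:
# resp = opener.open(req)
```

Calling `opener.open(req)` instead of `urllib2.urlopen(req)` keeps the cookie handling local to that one opener, which is handy if other code in the same process also uses urllib2.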

Note that this can get much more complicated. You can also do this with Gmail, for instance, but you need to pull in parameters that will change every time (such as the GALX parameter). Again, not sure if this is what you wanted, but hope it helps.
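
For sites with values that change on each page load, one approach (a minimal sketch, not the only way) is to fetch the login page first, read the form's action and its inputs out of the HTML, and then overwrite only the credential fields. The snippet below uses the standard library HTML parser. `LoginFormParser`, `build_payload` and the `token` field in the sample HTML are hypothetical names for illustration; a real site will have its own field names, and this sketch only looks at the first form on the page.

```python
try:
    from HTMLParser import HTMLParser       # Python 2.7
except ImportError:
    from html.parser import HTMLParser      # Python 3 fallback


class LoginFormParser(HTMLParser):
    """Collect the action URL and named inputs of the first <form> seen."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and not self._done:
            self._in_form = True
            self.action = attrs.get('action')
        elif tag == 'input' and self._in_form and attrs.get('name'):
            # A browser does not submit unchecked checkboxes or radio buttons
            if attrs.get('type') in ('checkbox', 'radio') and 'checked' not in attrs:
                return
            # Inputs without a value attribute (user, passwd) start out empty
            self.fields[attrs['name']] = attrs.get('value') or ''

    def handle_endtag(self, tag):
        if tag == 'form' and self._in_form:
            self._in_form = False
            self._done = True


def build_payload(html, credentials):
    """Return (action, payload): form defaults overridden by credentials."""
    parser = LoginFormParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload.update(credentials)
    return parser.action, payload


# Sample page; the 'token' input is made up to stand in for a changing value
sample = '''
<form method="post" action="https://ssl.reddit.com/post/login">
  <input type="hidden" name="op" value="login-main" />
  <input type="hidden" name="token" value="abc123" />
  <input name="user" type="text" />
  <input name="passwd" type="password" />
  <input type="checkbox" name="rem" />
</form>
'''

action, payload = build_payload(
    sample, {'user': '<username>', 'passwd': '<password>'})
print(action)
print(sorted(payload.items()))
```

In real use you would get the HTML from `opener.open(login_page_url).read()` with the same cookie-aware opener, so that any cookies set along with the hidden values are sent back with the POST.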
