Login to website using urllib2 - Python 2.7


Question

Okay, so I am using this for a reddit bot, but I want to be able to figure out HOW to log in to any website, if that makes sense.

I realise that different websites use different login forms etc. So how do I figure out how to adapt it for each website? I'm assuming I need to look for something in the HTML file, but I have no idea what.

I do NOT want to use Mechanize or any other library (which is what all the other answers on here are about, and they don't actually help me learn what is happening), as I want to learn for myself exactly how it all works.

The urllib2 documentation really isn't helping me.

Thanks.

Recommended answer

I'll preface this by saying I haven't logged in this way for a while, so I could be missing some of the more 'accepted' ways to do it.

I'm not sure if this is what you're after, but without a library like mechanize or a more robust framework like selenium, in the basic case you just look at the form itself and seek out the inputs. For instance, looking at www.reddit.com and viewing the source of the rendered page, you will find this form:

<form method="post" action="https://ssl.reddit.com/post/login" id="login_login-main"
  class="login-form login-form-side">
    <input type="hidden" name="op" value="login-main" />
    <input name="user" placeholder="username" type="text" maxlength="20" tabindex="1" />
    <input name="passwd" placeholder="password" type="password" tabindex="1" />

    <div class="status"></div>

    <div id="remember-me">
      <input type="checkbox" name="rem" id="rem-login-main" tabindex="1" />
      <label for="rem-login-main">remember me</label>
      <a class="recover-password" href="/password">reset password</a>
    </div>

    <div class="submit">
      <button class="btn" type="submit" tabindex="1">login</button>
    </div>

    <div class="clear"></div>
</form>
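
Reading the form by eye works for one site; the same inspection can also be scripted. The following is only a minimal sketch, not part of the original answer: it uses the standard-library HTMLParser to list each form's method, action and named inputs. The class name FormInspector and the shortened SAMPLE markup are made up for illustration, and the import fallback is only there so the sketch also runs on Python 3.

```python
try:
    from HTMLParser import HTMLParser   # Python 2.7, as used in this answer
except ImportError:
    from html.parser import HTMLParser  # fallback so the sketch runs on Python 3


class FormInspector(HTMLParser):
    """Hypothetical helper: record each form's method, action and inputs."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.forms = []
        self._current = None  # the form we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            self._current = {
                'method': (attrs.get('method') or 'get').lower(),
                'action': attrs.get('action'),
                'inputs': [],
            }
            self.forms.append(self._current)
        elif tag == 'input' and self._current is not None and attrs.get('name'):
            # Keep (name, type, preset value) for every named input
            self._current['inputs'].append(
                (attrs['name'], attrs.get('type', 'text'), attrs.get('value')))

    def handle_endtag(self, tag):
        if tag == 'form':
            self._current = None


# Shortened copy of the reddit form shown above
SAMPLE = '''
<form method="post" action="https://ssl.reddit.com/post/login" id="login_login-main">
    <input type="hidden" name="op" value="login-main" />
    <input name="user" placeholder="username" type="text" maxlength="20" />
    <input name="passwd" placeholder="password" type="password" />
    <input type="checkbox" name="rem" id="rem-login-main" />
</form>
'''

inspector = FormInspector()
inspector.feed(SAMPLE)
form = inspector.forms[0]
print('%s %s' % (form['method'], form['action']))
for name, kind, value in form['inputs']:
    print('%s (%s) preset=%r' % (name, kind, value))
```

Inputs with a preset value (typically the hidden ones) usually have to be sent back unchanged, while the text and password inputs are the ones you fill in yourself.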

Here we see a few inputs: op, user, passwd and rem. Also notice the action attribute: that is the URL the form is posted to, so it is our target. The last step is to pack the parameters into a payload and send it as a POST request to the action URL. The code below also creates a new opener that handles cookies and adds headers, which gives us a slightly more robust opener for executing the requests:

import cookielib
import urllib
import urllib2


# Store the cookies and create an opener that will hold them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# Add our headers
opener.addheaders = [('User-agent', 'RedditTesting')]

# Install our opener (note that this changes the global opener to the one
# we just made, but you can also just call opener.open() if you want)
urllib2.install_opener(opener)

# The action/ target from the form
authentication_url = 'https://ssl.reddit.com/post/login'

# Input parameters we are going to send
payload = {
  'op': 'login-main',
  'user': '<username>',
  'passwd': '<password>'
  }

# Use urllib to encode the payload
data = urllib.urlencode(payload)

# Build our Request object (supplying 'data' makes it a POST)
req = urllib2.Request(authentication_url, data)

# Make the request and read the response
resp = urllib2.urlopen(req)
contents = resp.read()

Note that this can get much more complicated. You can also do this with GMail, for instance, but you need to pull in parameters that change on every page load (such as the GALX parameter). Again, not sure if this is what you wanted, but hope it helps.
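
To make the "parameters that change every time" case more concrete, here is a rough sketch under stated assumptions: fetch the login page first, copy every hidden input into the payload, then add your credentials on top. The page markup, the token value abc123, the field names Email and Passwd, and the helper names HiddenFieldParser and build_payload are all invented for illustration; a real site's form has to be inspected to find its actual names. The import fallback is only there so the sketch also runs on Python 3.

```python
try:
    from HTMLParser import HTMLParser   # Python 2.7
    from urllib import urlencode
except ImportError:
    from html.parser import HTMLParser  # fallback for Python 3
    from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Hypothetical helper: collect name/value pairs of hidden inputs."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('type') == 'hidden' and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def build_payload(login_page_html, credentials):
    """Start from the page's hidden fields, then add the credentials."""
    parser = HiddenFieldParser()
    parser.feed(login_page_html)
    payload = dict(parser.fields)
    payload.update(credentials)
    return payload


# Made-up login page; in practice this HTML would be read with the same
# cookie-holding opener that later sends the POST, so that the token and
# the session cookies belong together.
page = ('<form action="/login" method="post">'
        '<input type="hidden" name="GALX" value="abc123">'
        '<input type="text" name="Email">'
        '<input type="password" name="Passwd">'
        '</form>')

payload = build_payload(page, {'Email': '<username>', 'Passwd': '<password>'})
data = urlencode(sorted(payload.items()))
print(data)
```

The resulting data string plays the same role as data in the reddit example: it is what gets passed to urllib2.Request along with the form's action URL.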
