Log in to a website with Python


Problem description

I'm trying to log in to Wikipedia using a Python script, but despite following the instructions here (http://stackoverflow.com/questions/189555/how-to-use-python-to-login-to-a-webpage-and-retrieve-cookies-for-later-usage), I just can't get it to work.

import urllib
import urllib2
import cookielib

username = 'myname'
password = 'mypassword'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6")]
login_data = urllib.urlencode({'wpName' : username, 'wpPassword' : password})
opener.open('http://en.wikipedia.org/w/index.php?title=Special:UserLogin', login_data)
resp = opener.open('http://en.wikipedia.org/wiki/Special:Watchlist')

All I get is the "You're not logged in" page. I also tried logging in to another site with the same script, with the same negative result. I suspect it has something to do with cookies, or I'm missing something incredibly simple here, but I just cannot find it.

Recommended answer

If you inspect the raw request sent to the login URL (with a tool such as Charles Proxy), you will see that it actually sends four parameters: wpName, wpPassword, wpLoginAttempt and wpLoginToken. The first three are known in advance (your username, your password and the label of the login button), so you can fill them in yourself. The fourth, however, is generated by the server and has to be parsed out of the HTML of the login page. You need to POST that parsed value, together with the other three, to the login URL in order to log in. Because the token is tied to your session, fetch the login page with the same cookie-keeping session that you then use for the POST.
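To make the four parameters concrete, here is a minimal Python 3 sketch that uses only the standard library to pull wpLoginToken out of the login page's HTML and build the form body that gets POSTed. It assumes the page contains an `<input name="wpLoginToken" value="...">` element, as described above; the helper names `extract_login_token` and `build_login_body` are hypothetical, and the credentials are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class LoginTokenParser(HTMLParser):
    """Records the value of the first <input name="wpLoginToken"> tag."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a bare attribute has value None
        if tag == "input" and self.token is None:
            attributes = dict(attrs)
            if attributes.get("name") == "wpLoginToken":
                self.token = attributes.get("value") or ""


def extract_login_token(html):
    """Hypothetical helper: return the wpLoginToken value or raise ValueError."""
    parser = LoginTokenParser()
    parser.feed(html)
    parser.close()
    if parser.token is None:
        raise ValueError("no wpLoginToken field found in the login page")
    return parser.token


def build_login_body(username, password, token):
    """Hypothetical helper: URL-encode the four login fields in order."""
    return urlencode({
        "wpName": username,
        "wpPassword": password,
        "wpLoginAttempt": "Log in",  # the button label; sent as "Log+in"
        "wpLoginToken": token,
    })
```

For example, `build_login_body("my_username", "my_password", "abc123")` returns `wpName=my_username&wpPassword=my_password&wpLoginAttempt=Log+in&wpLoginToken=abc123`. Letting `urlencode` do the escaping matters because real tokens may contain characters such as `+` or `\` that must be percent-encoded.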

Here is the working code using Requests and BeautifulSoup:

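import requests
from bs4 import BeautifulSoup as bs


def get_login_token(raw_resp):
    # Pull the hidden wpLoginToken field out of the login page's HTML.
    # 'html.parser' ships with Python; 'lxml' also works if it is installed.
    soup = bs(raw_resp.text, 'html.parser')
    token = [n.get('value', '') for n in soup.find_all('input')
             if n.get('name', '') == 'wpLoginToken']
    return token[0]  # raises IndexError if the field is missing

payload = {
    'wpName': 'my_username',
    'wpPassword': 'my_password',
    'wpLoginAttempt': 'Log in',
    # 'wpLoginToken' is added below, once it has been parsed from the login page
    }

# Wikipedia now serves only HTTPS; a POST to an http:// URL gets redirected
# and turned into a GET, which silently drops the form data.
with requests.Session() as s:
    # The session keeps the cookies set by the login page for the POST below
    resp = s.get('https://en.wikipedia.org/w/index.php?title=Special:UserLogin')
    payload['wpLoginToken'] = get_login_token(resp)

    response_post = s.post('https://en.wikipedia.org/w/index.php?title=Special:UserLogin&action=submitlogin&type=login',
                           data=payload)
    response = s.get('https://en.wikipedia.org/wiki/Special:Watchlist')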
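Scraping Special:UserLogin depends on the page's HTML, which can change. MediaWiki also exposes the same token-then-credentials flow through its Action API (`api.php`): fetch a token with `action=query&meta=tokens&type=login`, then POST it as `lgtoken` with `action=login`. The sketch below does this with only the Python 3 standard library (`urllib.request` and `http.cookiejar` are the Python 3 successors of the question's `urllib2` and `cookielib`). It is a sketch under stated assumptions: it assumes a bot password created at Special:BotPasswords (the API documents `action=login` for bot passwords rather than main-account passwords), the User-Agent string is a placeholder, and `parse_login_token` and `api_login` are hypothetical helper names.

```python
import json
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener

API_URL = "https://en.wikipedia.org/w/api.php"


def parse_login_token(json_text):
    # A meta=tokens&type=login reply looks like {"query": {"tokens": {"logintoken": "..."}}}
    return json.loads(json_text)["query"]["tokens"]["logintoken"]


def api_login(username, password):
    # One CookieJar shared by both requests, because the token is session-bound
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    # Placeholder User-Agent; replace with one that identifies your script
    opener.addheaders = [("User-Agent", "my-login-script/0.1 (placeholder)")]

    query = urlencode({"action": "query", "meta": "tokens",
                       "type": "login", "format": "json"})
    with opener.open(API_URL + "?" + query) as resp:
        token = parse_login_token(resp.read().decode("utf-8"))

    body = urlencode({
        "action": "login",
        "lgname": username,        # e.g. "MyUser@MyBot" for a bot password
        "lgpassword": password,
        "lgtoken": token,
        "format": "json",
    }).encode("ascii")
    with opener.open(API_URL, data=body) as resp:
        result = json.loads(resp.read().decode("utf-8"))
    # Return the opener so later requests reuse the logged-in cookies
    return opener, result
```

A call such as `opener, result = api_login("MyUser@MyBot", "bot-password")` should give `result["login"]["result"] == "Success"` when the credentials are accepted; the returned `opener` then carries the session cookies for further requests, much like `s` in the Requests version.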
