Using requests to log in to a website that has a JavaScript login form

Problem description

Let me preface by saying I have very little programming experience. I've learned a bunch in the last few days trying to write this program. I am running Python 2.7 on Windows 7 using PyCharm, requests, Beautiful Soup, and lxml.

I am trying to scrape data from a website that relies heavily on JavaScript. I have two options:

1) The data I need is populated through JavaScript and does not necessarily require a login. However, I have not been able to figure out how to get at this data. I've monitored the headers live with the Live HTTP Headers Chrome plugin, and I think I've found the JavaScript that does it, but working it out is beyond me. It's a long bit of code; I'll post it if anyone is interested in taking a look.

or

2) On one of the main pages I found a series of ID numbers that I can use to generate URLs for each of the individual items I am analyzing. The problem is that I have to be logged in to see these individual item pages. My code is as follows:

from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager
from BeautifulSoup import BeautifulSoup
import ssl

# Request a date from user
UDate = "06/22/2015"  # raw_input('Enter a date mm/dd/yyyy\n')

# Open TLSv1 adapter (whatever that means)
class MyAdapter(HTTPAdapter):
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1)

# Begin a requests session. Every get from here on out will use TLSv1 Protocol
import requests

payload = {
    'LogName': 'xxxxxxxx',
    'LogPass': 'xxxxxxxx'
}

s = requests.Session()
s.mount('https://xxxx.xxx', MyAdapter())

# Login with post and Request source code from main page.
log = s.post('LoginURL', data=payload)
print log.text

result = s.get(url)
soup = BeautifulSoup(result.content)
print soup

Neither the POST nor the GET shows me a logged-in page. The login form elements in the HTML source look like this:

<div id="DivLogForm">
        <label for="BadText"><div id="BadText" class="BadText" style="display:none" tabindex="-2">User Name or Password is Invalid</div></label>

        <div class="LogLabel">
            <label for="LogName" > User Name&nbsp;&nbsp;</label><input tabindex="0" id="LogName" class="LogInput" value="" />
        </div>
        <div  class="LogLabel">
            <label for="LogPass" >User Password&nbsp;&nbsp;</label><input  tabindex="0"id="LogPass" type="password" class="LogInput" value="" />
        </div>

So I'm passing LogName and LogPass in the POST.

There is also a logform.js file containing this bit of code:

$("#LogButton").click(function()
        {   //$('#divLogForm').hide();
            //$('#divLoading').show();  

           var uName = $("#LogName").val();
           var uPass = $("#LogPass").val();
           var url = "/index.cfm";
           $.post(url, {ZACTION:'AJAX',ZMETHOD:'LOGIN',func:'LOGIN',USERNAME:uName, USERPASS:uPass}, 
                  function(data){if (data.isOk =="YES"){location.href="/index.cfm";}
                                  else {$('.BadText').show(); $('#BadText').focus();};
                                 },"json");
        });

The LoginURL in my code is taken from the url variable in this script. I have tried using USERNAME and USERPASS, and I have also tried uName and uPass in my POST, but these didn't work either.

I'm not sure how to move forward here. Any help is greatly appreciated.

Solution

The last bit of javascript you posted gives a clue as to why your login POST request isn't working.
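A related detail in the markup you posted: the two <input> elements have an id attribute but no name attribute, and (at least in the part shown) they are not inside a <form>. Browsers only submit fields that have a name, so LogName and LogPass are never sent as form fields; the only login request the page makes is the $.post(...) call in logform.js. A quick way to confirm this is sketched below with Python 3's standard html.parser on a trimmed copy of the two inputs from the question (the InputCollector class is just an illustration):

```python
from html.parser import HTMLParser

# Trimmed copy of the two <input> tags from the login markup in the question.
LOGIN_HTML = """
<input tabindex="0" id="LogName" class="LogInput" value="" />
<input tabindex="0" id="LogPass" type="password" class="LogInput" value="" />
"""


class InputCollector(HTMLParser):
    """Record the id and name attributes of every <input> tag."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; .get() gives None if absent
        if tag == "input":
            attributes = dict(attrs)
            self.inputs.append((attributes.get("id"), attributes.get("name")))


collector = InputCollector()
collector.feed(LOGIN_HTML)
collector.close()
print(collector.inputs)  # [('LogName', None), ('LogPass', None)]
```

Both inputs come back with a name of None, which is why posting LogName/LogPass does not match anything the server expects.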

According to the JavaScript, your login POST should send a dictionary like the following:

{
    'ZACTION': 'AJAX',
    'ZMETHOD': 'LOGIN',
    'func': 'LOGIN',
    'USERNAME': '<enter username>',
    'USERPASS': '<enter password>'
}
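Putting that together, here is a minimal sketch of sending those fields the way logform.js does: a form-encoded POST to /index.cfm, then a check of isOk in the JSON reply. It assumes the session object is a requests.Session (as in your code) and that the server answers with JSON containing isOk == "YES" on success, as the script's callback implies. The names LOGIN_PATH, build_login_payload and login are hypothetical, and passing the session in as a parameter is just a design choice so the same session (with your TLSv1 adapter mounted) is reused afterwards.

```python
# Path and field names copied from the $.post(...) call in logform.js.
LOGIN_PATH = "/index.cfm"


def build_login_payload(username, password):
    """Return the form fields that logform.js sends when logging in."""
    return {
        "ZACTION": "AJAX",
        "ZMETHOD": "LOGIN",
        "func": "LOGIN",
        "USERNAME": username,
        "USERPASS": password,
    }


def login(session, base_url, username, password):
    """POST the login fields the way logform.js does.

    `session` is assumed to behave like requests.Session, so any cookies
    set by a successful login are reused by later session.get() calls.
    Returns True if the JSON reply contains isOk == "YES".
    """
    url = base_url.rstrip("/") + LOGIN_PATH
    # data= sends a form-encoded body, which is what jQuery's $.post does
    # when given a plain object.
    response = session.post(url, data=build_login_payload(username, password))
    response.raise_for_status()
    # The script asks jQuery to parse the reply as JSON, so do the same here.
    reply = response.json()
    return reply.get("isOk") == "YES"
```

Usage would be along these lines: create s = requests.Session(), mount MyAdapter on it as in your code if the site needs TLSv1, call login(s, 'https://xxxx.xxx', 'your-user', 'your-pass'), and if that returns True, fetch the individual item pages with s.get(...). Assuming the site tracks the login with a cookie, the Session sends it along with those later requests automatically.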
