Flurry scraping using python3 requests.Session()


Problem description

This seems really straightforward, but for some reason this isn't connecting to flurry correctly and I am unable to scrape the data.

    import requests

    loginurl = "https://dev.flurry.com/secure/loginPage.do"
    csvurl = "https://dev.flurry.com/eventdata"

    # Log in, then try to fetch the CSV export with the same session
    session = requests.Session()
    login = session.post(loginurl, data={'loginEmail': 'user', 'loginPassword': 'pass'})
    data = session.get(csvurl)

Every time I try to use this, I get redirected back to the login screen (loginurl) without fetching the new data. Has anyone been able to connect to flurry like this successfully before?
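One quick check (a minimal diagnostic sketch; the credentials are placeholders) is to inspect the redirect history that requests records on the response:

    import requests

    loginurl = "https://dev.flurry.com/secure/loginPage.do"

    session = requests.Session()
    login = session.post(loginurl, data={'loginEmail': 'user', 'loginPassword': 'pass'})

    # requests follows redirects by default; response.history holds the
    # intermediate hops and response.url the final landing page.
    for hop in login.history:
        print(hop.status_code, hop.headers.get("Location"))
    print("landed on:", login.url)

If login.url still points at the login page, the server rejected the POST instead of setting a session cookie.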

Any and all help would be greatly appreciated, thanks.

Recommended answer

There are two more form fields to be populated: struts.token.name, and a field whose name is the value of struts.token.name (i.e. token). You also have to post to loginAction.do rather than loginPage.do.
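To see which hidden fields the login form actually carries, a minimal sketch like this lists them all (assuming the login page is reachable without authentication; the exact names and values are whatever Flurry serves):

from bs4 import BeautifulSoup
import requests

loginurl = "https://dev.flurry.com/secure/loginPage.do"

# List every hidden input on the login page; anti-CSRF tokens
# such as struts.token.name show up here.
soup = BeautifulSoup(requests.get(loginurl).content, "html.parser")
for hidden in soup.select("input[type=hidden]"):
    print(hidden.get("name"), "=", hidden.get("value"))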

You can do an initial GET, parse the values out using bs4, and then post the data:

from bs4 import BeautifulSoup
import requests

loginurl = "https://dev.flurry.com/secure/loginAction.do"
csvurl = "https://dev.flurry.com/eventdata"
data = {'loginEmail': 'user', 'loginPassword': 'pass'}

with requests.Session() as session:
    # Present a browser User-Agent so the site serves the normal login page
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36"})

    # GET the login page and pull the Struts anti-CSRF token out of
    # the hidden form fields before posting the credentials.
    soup = BeautifulSoup(session.get(loginurl).content, "html.parser")
    name = soup.select_one("input[name='struts.token.name']")["value"]
    data["struts.token.name"] = name
    data[name] = soup.select_one("input[name='{}']".format(name))["value"]
    login = session.post(loginurl, data=data)

    # The session now carries the authentication cookies, so the
    # event-data export can be fetched in the same session.
    csv_data = session.get(csvurl)
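Because requests.Session persists cookies across requests, the session cookie set by a successful loginAction.do POST is sent automatically with the later GET for csvurl. As a sanity check, login.url should no longer point at the login page after the POST succeeds.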
