Flurry scraping using python3 requests.Session()
Question
This seems really straightforward, but for some reason it isn't connecting to Flurry correctly and I am unable to scrape the data.
import requests

loginurl = "https://dev.flurry.com/secure/loginPage.do"
csvurl = "https://dev.flurry.com/eventdata"
session = requests.Session()
# Post the credentials to the login page, then request the CSV data with the same session
login = session.post(loginurl, data={'loginEmail': 'user', 'loginPassword': 'pass'})
data = session.get(csvurl)
Every time I try this, I get redirected back to the login screen (loginurl) without fetching the new data. Has anyone been able to connect to Flurry like this successfully before?
Any and all help would be greatly appreciated, thanks.
Recommended answer
There are two more form fields to populate: struts.token.name, and a second field whose name is the value of struts.token.name (i.e. token). You also have to post to loginAction.do rather than loginPage.do:
You can do an initial GET, parse the values using bs4, and then post the data:
from bs4 import BeautifulSoup
import requests

loginurl = "https://dev.flurry.com/secure/loginAction.do"
csvurl = "https://dev.flurry.com/eventdata"
data = {'loginEmail': 'user', 'loginPassword': 'pass'}

with requests.Session() as session:
    # Send a browser-like User-Agent with every request in this session
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36"})
    # Initial GET to obtain the login form and its hidden token fields
    soup = BeautifulSoup(session.get(loginurl).content, "html.parser")
    # struts.token.name holds the name of the field that carries the token
    name = soup.select_one('input[name="struts.token.name"]')["value"]
    data["struts.token.name"] = name
    data[name] = soup.select_one('input[name="{}"]'.format(name))["value"]
    # Post the credentials together with both token fields
    login = session.post(loginurl, data=data)
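
To make the token handling above easier to follow, here is a minimal offline sketch of the same idea. It swaps bs4 for the standard library's html.parser so it runs without a network connection or extra packages. The names InputCollector, build_login_payload and SAMPLE_HTML are hypothetical, and the sample markup is made up for illustration; it is an assumption about the general shape of the form, not a copy of the real Flurry login page.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name -> value for every <input> element in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name is not None:
            # Inputs without a value attribute are stored as empty strings
            self.fields[name] = attr_map.get("value") or ""


def build_login_payload(html, email, password):
    """Build the POST data: credentials plus the two Struts token fields."""
    collector = InputCollector()
    collector.feed(html)
    # The value of struts.token.name is the name of the field holding the token
    token_field = collector.fields["struts.token.name"]
    return {
        "loginEmail": email,
        "loginPassword": password,
        "struts.token.name": token_field,
        token_field: collector.fields[token_field],
    }


# Hypothetical markup, invented for this example only
SAMPLE_HTML = """
<form action="loginAction.do" method="post">
  <input type="hidden" name="struts.token.name" value="token">
  <input type="hidden" name="token" value="ABC123">
  <input type="text" name="loginEmail">
  <input type="password" name="loginPassword">
</form>
"""

payload = build_login_payload(SAMPLE_HTML, "user", "pass")
```

In a real run, the HTML would come from the initial session.get(loginurl) response, and the resulting payload would be passed as data= to session.post(loginurl, ...).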