Using Twill to grab a .txt file from behind a login page in Python
Question
I'm using Twill to retrieve pages that contain the .txt data I want so that I can store them as Excel files. The data is password protected, so I log in from the /user/login page.
My code runs into a problem: when it tries to access the text page after the login screen, it hits a brick wall of HTML rather than the .txt itself.
This is the code I run to log in:
from twill.commands import go, showforms, fv

path = "https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/"
end = "td.txt"
go("http://www.naturalgasintel.com/user/login")
showforms()
fv("2", "user[email]", user_email)
fv("2", "user[password]", user_password)
fv("2", "commit", "Login")
datafilelocation = path + year + "/" + month + "/" + date + end
go(datafilelocation)
When my code gets to go(datafilelocation), I get this:
==> at https://www.naturalgasintel.com/user/login?referer=%2Fext%2Fresources%2FData-Feed%2FDaily-GPI%2F2018%2F12%2F20181221td.txt
Out[18]: u'https://www.naturalgasintel.com/user/login?referer=%2Fext%2Fresources%2FData-Feed%2FDaily-GPI%2F2018%2F12%2F20181221td.txt'
So it lands back on the login page, with the file I asked for passed along as the referer, rather than on the actual text. The page I really want to reach is:
https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/12/20181221td.txt
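Decoding the redirect address makes it easier to see what happened. The sketch below uses only the Python 3 standard library and the URL copied from the output above; it shows that the path is the login page and that the referer parameter holds the file that was requested, which suggests the request was not treated as logged in.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the twill output shown above.
redirect_url = (
    "https://www.naturalgasintel.com/user/login"
    "?referer=%2Fext%2Fresources%2FData-Feed%2FDaily-GPI%2F2018%2F12%2F20181221td.txt"
)

parts = urlsplit(redirect_url)
# parse_qs percent-decodes the value and returns a list per parameter.
referer = parse_qs(parts.query)["referer"][0]

print(parts.path)  # /user/login
print(referer)     # /ext/resources/Data-Feed/Daily-GPI/2018/12/20181221td.txt
```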
The reason I used fv("2", "commit", "Login") instead of submit() is that when I get to the page it gives me this:
showforms()
Form name=quick-search (#1)
## ## __Name__________________ __Type___ __ID________ __Value__________________
1     q                        text      q            Search
Form #2
## ## __Name__________________ __Type___ __ID________ __Value__________________
1     utf8                     hidden    (None)       ✓
2     authenticity_token       hidden    (None)       pnFnPGhMomX2Lyh7/U8iGOZKsiQnyicj7BWT ...
3     referer                  hidden    (None)       https://www.naturalgasintel.com/ext/ ...
4     popup                    hidden    (None)       false
5     user[email]              text      user_email
6     user[password]           password  user_pas ...
7     user[remember_me]        hidden    (None)       0
8     user[remember_me]        checkbox  user_rem ... None
9     commit                   submit    (None)       Login
Then, after I call submit(), it tells me:
Note: submit is using submit button: name="commit", value="Login"
What is the best way to solve this?
Recommended answer
If you'd be fine using Mechanize instead of Twill, give the following a shot:
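import mechanize

username = ""  # your account email
password = ""  # your account password
login_post_url = "http://www.naturalgasintel.com/user/login"
internal_url = "https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/12/20181221td.txt"

browser = mechanize.Browser()
browser.open(login_post_url)
# nr is zero-based, so nr=1 selects the second form on the page (the login form).
browser.select_form(nr=1)
browser.form['user[email]'] = username
browser.form['user[password]'] = password
browser.submit()

# The same browser object carries the cookies from the login into this request.
response = browser.open(internal_url)
# print() with parentheses works on both Python 2 and Python 3.
print(response.read())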
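For anyone who would rather stay with Twill, here is a minimal sketch of the same flow. It assumes that fv only fills in a field and that the form is not sent until submit() is called, which would be consistent with the note twill printed in the question. The helper names build_data_url and fetch_with_twill are hypothetical, the URL layout is inferred from the example address in the question, and whether the site keeps the session for the second request has not been tested.

```python
from datetime import date

BASE_URL = "https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/"
LOGIN_URL = "http://www.naturalgasintel.com/user/login"


def build_data_url(day):
    """Return the daily .txt URL, e.g. .../2018/12/20181221td.txt."""
    return "{0}{1:%Y}/{1:%m}/{1:%Y%m%d}td.txt".format(BASE_URL, day)


def fetch_with_twill(user_email, user_password, day, out_path):
    """Log in through form #2, then save the requested page to out_path."""
    # Imported here so build_data_url can be used without twill installed.
    from twill.commands import go, fv, submit, save_html

    go(LOGIN_URL)
    fv("2", "user[email]", user_email)
    fv("2", "user[password]", user_password)
    submit()  # sends form #2; fv on its own only fills in values
    go(build_data_url(day))
    save_html(out_path)  # writes the body of the current page to a file
```

Calling fetch_with_twill(user_email, user_password, date(2018, 12, 21), "20181221td.txt") should then write the page body to that file, provided the login succeeds.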