Python - Login and download specific file from website


Question

My attempt to log into a website and download a specific file has hit a wall.

Specifically, I am logging into the website below so that I can select specific variables and parameters before downloading the file and saving it as an Excel or CSV file:

http://www.gaez.iiasa.ac.at/w/ctrl?_flow=Vwr&_view=Welcome&fieldmain=main_lr_lco_cult&idPS=0&idAS=0&idFS=0

In particular, I want to toggle the highlighted inputs, then select the type of crop, water supply, input level, time period, and geographic areas, before downloading the file under the 'Visualization and Download' button.

For example, I would like to get the data for Wheat (Crop), Rain-fed (Water Supply), High (Input Level), 1961-1990 (Time Period, Baseline), and United States of America (Geographic Areas), and then save it as an Excel file.

This is my code so far:

# Import library
import requests

# Define url, username, and password
url = 'http://www.gaez.iiasa.ac.at/w/ctrl?_flow=Vwr&_view=Welcome&fieldmain=main_lr_lco_cult&idPS=0&idAS=0&idFS=0'
user, password = 'Username', 'Password'
resp = requests.get(url, auth=(user, password))

Perhaps I am too deep in the trenches of the whole process to see an easy, viable solution, but any help is greatly appreciated.

Recommended answer

The website you linked uses an HTTP POST based login form. In your code you have:

resp = requests.get(url, auth=(user, password))

which uses HTTP basic authentication instead: http://docs.python-requests.org/en/master/user/authentication/#basic-authentication
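To make the difference concrete: passing `auth=(user, password)` does not fill in a login form. It only adds an `Authorization` header to the request, which a form-based login page ignores. The sketch below rebuilds that header value with the standard library only; the helper name `basic_auth_header` is hypothetical and the credentials are the placeholders from the question.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header value used by HTTP Basic authentication."""
    # Basic auth joins user and password with a colon and base64-encodes the result.
    token = base64.b64encode(f"{user}:{password}".encode("latin1")).decode("ascii")
    return f"Basic {token}"


# Placeholder credentials from the question, not real ones.
header = basic_auth_header("Username", "Password")
print(header)
```

Because base64 is an encoding and not encryption, anyone who can read the request can recover the credentials from this header.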

To log in to this site you need two things:

  • a persistent session cookie
  • an HTTP POST request to the login form URL

First, create a session object that will hold the cookies from the server: http://docs.python-requests.org/en/master/user/advanced/#session-objects

s = requests.Session()

Next, visit the site with a GET request. The server will respond with a cookie for your session, which the session object stores.

s.get(site_url)

The final step is to log in to the site. You can use Firebug or the Chrome Developer Console (depending on which browser you use) to examine which fields need to be sent (go to the Network tab).

s.post(site_url, data={'_username': 'user', '_password': 'pass'})

These two fields (_username, _password) seem to be valid for your site, but when I examined the data sent during the POST request, there were more fields. I don't know whether they are necessary.
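If the extra fields turn out to matter, one option is to read them from the login page and send them back along with the credentials. The following is only a sketch using the standard library's `html.parser`. The class and function names are hypothetical, and `SAMPLE_HTML` is made-up markup for illustration, not the real page; only `_username` and `_password` come from the observation above.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the name/value pairs of every <input> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute are stored as empty strings.
            self.fields[name] = attributes.get("value") or ""


def build_login_payload(html, username, password):
    """Merge the inputs found in the page with the supplied credentials."""
    collector = FormFieldCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload["_username"] = username
    payload["_password"] = password
    return payload


# Hypothetical markup; the real login form may use different fields.
SAMPLE_HTML = """
<form method="post" action="/w/ctrl">
  <input type="hidden" name="example_token" value="abc123">
  <input type="text" name="_username">
  <input type="password" name="_password">
</form>
"""

payload = build_login_payload(SAMPLE_HTML, "user", "pass")
print(payload)
```

The resulting dictionary could be passed as `data=` to `s.post(...)`. Note that this collector picks up every `<input>` on the page, so a page with several forms would need the parser narrowed to the login form.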

After that you will be authenticated. The next step is to request the URL of the file you would like to download.

s.get(file_url)
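Put together, the steps above can be sketched as one helper. This is a minimal outline rather than a tested client for this site. It assumes that the login form posts to the same URL as the page itself and that the file URL returns the file body directly, neither of which has been verified here. The function name and all arguments are hypothetical.

```python
def login_and_download(session, site_url, login_data, file_url, out_path):
    """Log in with a form POST, then save the response body of file_url to out_path."""
    # Step 1: a plain GET so the server can set its session cookie.
    session.get(site_url).raise_for_status()

    # Step 2: submit the login form; the session resends the cookie automatically.
    session.post(site_url, data=login_data).raise_for_status()

    # Step 3: request the file with the authenticated session and write the bytes.
    response = session.get(file_url)
    response.raise_for_status()
    with open(out_path, "wb") as handle:
        handle.write(response.content)
    return out_path


# Usage with the requests library (not executed here; all values are placeholders):
#   import requests
#   with requests.Session() as s:
#       login_and_download(s, site_url,
#                          {"_username": "user", "_password": "pass"},
#                          file_url, "gaez_download.xls")
```

`raise_for_status()` only catches HTTP error codes. A failed login often still returns status 200 with the login page again, so it is worth checking the response text or the downloaded file before trusting the result.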

The link you provided contains a query string with various options, which are probably related to the options you want to highlight. You can use it to download the file with the desired options.
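To see which options the link carries, the query string can be split into a dictionary and rebuilt with different values. The sketch below uses `urllib.parse` from the standard library on the URL from the question. What each parameter means on this site is not known here, so changing `idPS` to `1` is purely illustrative, and the helper names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

URL = ("http://www.gaez.iiasa.ac.at/w/ctrl?_flow=Vwr&_view=Welcome"
       "&fieldmain=main_lr_lco_cult&idPS=0&idAS=0&idFS=0")


def get_query_params(url):
    """Return the query string of url as a dict of parameter names to values."""
    return dict(parse_qsl(urlsplit(url).query))


def with_query_params(url, **overrides):
    """Return url with the given query parameters replaced or added."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    params.update(overrides)
    return urlunsplit(parts._replace(query=urlencode(params)))


params = get_query_params(URL)
print(params)

# Illustrative change only; the meaning of idPS on this site is an assumption.
new_url = with_query_params(URL, idPS="1")
print(new_url)
```

With `requests`, the same dictionary can also be passed as `params=` to `s.get(...)` instead of building the URL by hand. Converting to a plain dict drops repeated parameter names, which is fine for this URL but would lose data for one that repeats a key.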

Note that this site does not use a secure HTTPS connection. Any credentials you provide will travel over the internet unencrypted and could be seen by someone who should not see them.
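A script can guard against this by checking the URL scheme before it sends any credentials. The check below is a small sketch with a hypothetical helper name. It inspects only the URL text and says nothing about whether the server actually offers HTTPS.

```python
from urllib.parse import urlsplit

URL = ("http://www.gaez.iiasa.ac.at/w/ctrl?_flow=Vwr&_view=Welcome"
       "&fieldmain=main_lr_lco_cult&idPS=0&idAS=0&idFS=0")


def uses_https(url):
    """Return True if the URL scheme is https, so the request would be encrypted."""
    return urlsplit(url).scheme == "https"


if not uses_https(URL):
    print("Warning: credentials sent to this URL would not be encrypted.")
```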
