Python - Login and download specific file from website

Question

My attempt to log into a website and download a specific file has hit a wall.

Specifically, I am logging into this website: http://www.gaez.iiasa.ac.at/w/ctrl?_flow=Vwr&_view=Welcome&fieldmain=main_lr_lco_cult&idPS=0&idAS=0&idFS=0

I want to select specific variables and parameters before downloading the file and saving it as an Excel or CSV file.

In particular, I want to toggle the highlighted inputs, then select the crop type, water supply, input level, time period, and geographic area, and finally download the file under the 'Visualization and Download' button.

For example, I would like to get the data for Wheat (Crop), Rain-fed (Water Supply), High (Input Level), 1961-1990 (Time Period, Baseline), United States of America (Geographic Areas). Then I want to save it as an Excel file.

This is my code so far:

# Import library
import requests

# Define url, username, and password
url = 'http://www.gaez.iiasa.ac.at/w/ctrl?_flow=Vwr&_view=Welcome&fieldmain=main_lr_lco_cult&idPS=0&idAS=0&idFS=0'
user, password = 'Username', 'Password'
resp = requests.get(url, auth=(user, password))

Perhaps I'm too deep in the trenches of the whole process to see an easy, viable solution, but any help is greatly appreciated.

Solution

The website you linked uses an HTTP POST based login form. In your code you have:

resp = requests.get(url, auth=(user, password))

This uses HTTP Basic authentication (http://docs.python-requests.org/en/master/user/authentication/#basic-authentication), which sends the credentials in an Authorization header rather than submitting the site's login form.
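To make the difference concrete, here is a minimal sketch of what Basic authentication puts on the wire, built with the standard library only. The helper name `basic_auth_header` and the credentials are placeholders for illustration; the point is that nothing here touches the login form.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header value used by HTTP Basic authentication."""
    # Basic auth is just "user:password", base64-encoded (encoded, not encrypted).
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Placeholder credentials, as in the question.
header_value = basic_auth_header("Username", "Password")
print("Authorization:", header_value)
```

A site that expects a form login typically ignores this header, which would explain why the original request never ends up authenticated.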

To log in to this site you need two things:

  • a persistent session cookie
  • an HTTP POST request to the login form URL

First, create a session object that will hold the cookies from the server: http://docs.python-requests.org/en/master/user/advanced/#session-objects

s = requests.Session()

Next, visit the site with a GET request. This gives you a cookie (the server sends one for your session).

s.get(site_url)

The final step is to log in to the site. You can use Firebug or the Chrome Developer Console (depending on which browser you use) to examine which fields need to be sent (go to the Network tab).

s.post(site_url, data={'_username': 'user', '_password': 'pass'})

These two fields (_username, _password) seem to be valid for your site, but when I examined the data sent during the POST request, there were more fields. I don't know whether they are necessary.
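If the extra fields turn out to matter, one option is to read them from the login page and send them along with the credentials. The sketch below uses only the standard library's `html.parser`; the sample HTML and the `token` field are invented for illustration and are not taken from the actual site.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the name/value pairs of every <input> element in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            # Inputs without a value attribute are sent as empty strings.
            self.fields[name] = attr.get("value") or ""


def extract_form_fields(html: str) -> dict:
    """Return a dict of input names to their default values."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields


# Invented sample page; the real form will have different fields.
SAMPLE_HTML = """
<form method="post" action="/login">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="_username">
  <input type="password" name="_password">
</form>
"""

payload = extract_form_fields(SAMPLE_HTML)
payload.update({"_username": "user", "_password": "pass"})
print(payload)
```

In practice you would feed it the text of the response from `s.get(site_url)` and pass the resulting dict as `data=` to `s.post(...)`.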

After that you will be authenticated. The next step is to request the URL of the file you would like to download.

s.get(file_url)
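Putting the steps together, a rough sketch might look like the following. The function name `login_and_download` and its parameters are made up for this example, and `session` is assumed to be a `requests.Session` (any object with compatible `get` and `post` methods would work). Note that `raise_for_status()` only catches HTTP errors; a rejected login that still returns status 200 would go unnoticed.

```python
def login_and_download(session, site_url, file_url, username, password, out_path):
    """Log in through the POST form, then save the response for file_url to out_path."""
    # 1. Visit the site so the server can set its session cookie.
    session.get(site_url)

    # 2. Submit the login form. Add any extra fields the site turns out to need.
    login = session.post(site_url, data={"_username": username, "_password": password})
    login.raise_for_status()

    # 3. Request the file with the (hopefully) authenticated session.
    resp = session.get(file_url)
    resp.raise_for_status()

    # 4. Write the raw bytes to disk.
    with open(out_path, "wb") as fh:
        fh.write(resp.content)
    return out_path


# Usage (performs real network requests, so it is left commented out):
#   import requests
#   login_and_download(requests.Session(), site_url, file_url,
#                      "user", "pass", "download.xls")
```

Passing the session in as an argument keeps the cookie handling in one place and makes it possible to try the function against a stand-in object before pointing it at the real site.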

The link you provided contains a query string with various options, which are probably related to the options you want highlighted. You can use it to download the file with the desired options.
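To inspect or change those options programmatically, the query string can be taken apart with `urllib.parse`. This is only a sketch: the helper `with_params` is made up for the example, and what each parameter (such as `idPS`) actually controls on this site is not known here, so the override shown is purely hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

url = ('http://www.gaez.iiasa.ac.at/w/ctrl?_flow=Vwr&_view=Welcome'
       '&fieldmain=main_lr_lco_cult&idPS=0&idAS=0&idFS=0')

# Split the URL and decode its query string into a dict of lists.
params = parse_qs(urlsplit(url).query)
for name, values in params.items():
    print(name, "=", values)


def with_params(url, **overrides):
    """Return a copy of url with the given query parameters replaced or added."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    for name, value in overrides.items():
        query[name] = [str(value)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


# Hypothetical override: the meaning of idPS is an assumption.
print(with_params(url, idPS=1))
```

Comparing the URLs your browser requests while you click through the selection pages should show which parameters correspond to crop, water supply, and the other choices.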

Warning Note

Note that this site does not use a secure HTTPS connection. Any credentials you provide will travel over the internet unencrypted and could be seen by someone who should not see them.
