Automatically retrieving large files via public HTTP into Google Cloud Storage
Problem description
For weather-processing purposes, I would like to automatically retrieve daily weather forecast data into Google Cloud Storage.
The files are available at a public HTTP URL (http://dcpc-nwp.meteo.fr/openwis-user-portal/srv/en/main.home), but they are very large (between 30 and 300 megabytes). The size of the files is the main issue.
After looking at previous Stack Overflow topics, I have tried two unsuccessful methods:
1/ First attempt via urlfetch in Google App Engine
from google.appengine.api import urlfetch

url = "http://dcpc-nwp.meteo.fr/servic..."
result = urlfetch.fetch(url)
[...]  # Code to save in a Google Cloud Storage bucket
But I get the following error message on the urlfetch line:
DeadlineExceededError: Deadline exceeded while waiting for HTTP response from URL
2/ Second attempt via the Cloud Storage Transfer Service
According to the documentation, it is possible to retrieve HTTP data into Cloud Storage directly via the Cloud Storage Transfer Service: https://cloud.google.com/storage/transfer/reference/rest/v1/TransferSpec#httpdata
But it requires the size and MD5 hash of each file before the download. This option cannot work in my case because the website does not provide that information.
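For reference, the Transfer Service's httpData source consumes a tab-separated URL list in which every row must already carry the exact byte size and the base64-encoded MD5 of the remote file. A sketch of the expected format (the size and hash below are placeholders, not real values):

```
TsvHttpData-1.0
http://dcpc-nwp.meteo.fr/servic...	31457280	BASE64-MD5-PLACEHOLDER==
```

Since the server here publishes neither the length nor the checksum, there is no way to build this manifest without downloading the files first, which defeats the purpose.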
3/ Any ideas?
Do you see any solution to retrieve automatically large file on HTTP into my Cloud Storage bucket?
Workaround with a Compute Engine instance
Since it was not possible to retrieve large files from an external HTTP server with App Engine or directly with the Cloud Storage Transfer Service, I used a workaround with an always-running Compute Engine instance.
This instance regularly checks whether new weather files are available, downloads them, and uploads them to a Cloud Storage bucket.
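A minimal sketch of such a fetch-and-upload script (standard library only, shelling out to the gsutil CLI that ships with Compute Engine images; the `arpege/...` object layout and run-hour naming are hypothetical placeholders, not the real Météo-France file naming):

```python
import subprocess
import urllib.request
from datetime import date


def download_to_file(url, local_path, chunk_size=1 << 20):
    """Stream a large HTTP file to disk in 1 MB chunks, never holding it in memory."""
    with urllib.request.urlopen(url) as resp, open(local_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)


def upload_to_gcs(local_path, bucket, object_name):
    """Upload with the gsutil CLI, preinstalled on Compute Engine images."""
    subprocess.check_call(
        ["gsutil", "cp", local_path, "gs://{}/{}".format(bucket, object_name)]
    )


def daily_object_name(day, run_hour):
    """Deterministic bucket path per forecast run, so repeated runs are idempotent.
    The 'arpege/...' layout is an illustrative example only."""
    return "arpege/{:%Y%m%d}/run{:02d}.grib2".format(day, run_hour)
```

A typical loop would call `daily_object_name(date.today(), run_hour)` for each forecast run, skip objects that already exist in the bucket, and otherwise chain `download_to_file` and `upload_to_gcs`.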
For scalability, maintenance and cost reasons, I would have preferred to use only serverless services, but fortunately:
- It works well on a fresh f1-micro Compute Engine instance (no extra packages required and only $4/month if running 24/7)
- Network traffic from Compute Engine to Google Cloud Storage is free if the instance and the bucket are in the same region ($0/month)
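The periodic check needs no extra software either: a plain cron entry on the instance is enough (the script path and schedule below are placeholders to adapt):

```
# Check for new forecast files every hour, on the hour
0 * * * * /opt/weather/fetch_forecasts.py >> /var/log/weather_fetch.log 2>&1
```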