Automatically retrieving large files via public HTTP into Google Cloud Storage
Problem description
For weather-processing purposes, I would like to automatically retrieve daily weather forecast data into Google Cloud Storage.
The files are available at a public HTTP URL (http://dcpc-nwp.meteo.fr/openwis-user-portal/srv/en/main.home), but they are very large (between 30 and 300 megabytes). The size of the files is the main issue.
After looking at previous Stack Overflow topics, I have tried two unsuccessful methods:
1/ First attempt via urlfetch in Google App Engine
from google.appengine.api import urlfetch

url = "http://dcpc-nwp.meteo.fr/servic..."
result = urlfetch.fetch(url)
[...]  # Code to save in a Google Cloud Storage bucket
But I get the following error message on the urlfetch line:
DeadlineExceededError: Deadline exceeded while waiting for HTTP response from URL
2/ Second attempt via the Cloud Storage Transfer Service
According to the documentation, it is possible to retrieve HTTP data into Cloud Storage directly via the Cloud Storage Transfer Service: https://cloud.google.com/storage/transfer/reference/rest/v1/TransferSpec#httpdata
But it requires the size and MD5 hash of each file before the download. This option cannot work in my case because the website does not provide that information.
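For reference, the Transfer Service's httpData source consumes a tab-separated URL list in which every row must already carry the exact byte size and the base64-encoded MD5 of the remote file. A sketch of the expected format (the size and hash below are placeholders, not real values):

```
TsvHttpData-1.0
http://dcpc-nwp.meteo.fr/servic...	31457280	BASE64-MD5-PLACEHOLDER==
```

Since the server here publishes neither the length nor the checksum, there is no way to build this manifest without downloading the files first, which defeats the purpose.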
3/ Any ideas?
Do you see any solution to retrieve automatically large file on HTTP into my Cloud Storage bucket?
Workaround with a Compute Engine instance
Since it was not possible to retrieve large files from an external HTTP server with App Engine or directly with the Cloud Storage Transfer Service, I used a workaround with an always-running Compute Engine instance.
This instance regularly checks whether new weather files are available, downloads them, and uploads them to a Cloud Storage bucket.
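A minimal sketch of such a fetch-and-upload script (standard library only, shelling out to the gsutil CLI that ships with Compute Engine images; the `arpege/...` object layout and run-hour naming are hypothetical placeholders, not the real Météo-France file naming):

```python
import subprocess
import urllib.request
from datetime import date


def download_to_file(url, local_path, chunk_size=1 << 20):
    """Stream a large HTTP file to disk in 1 MB chunks, never holding it in memory."""
    with urllib.request.urlopen(url) as resp, open(local_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)


def upload_to_gcs(local_path, bucket, object_name):
    """Upload with the gsutil CLI, preinstalled on Compute Engine images."""
    subprocess.check_call(
        ["gsutil", "cp", local_path, "gs://{}/{}".format(bucket, object_name)]
    )


def daily_object_name(day, run_hour):
    """Deterministic bucket path per forecast run, so repeated runs are idempotent.
    The 'arpege/...' layout is an illustrative example only."""
    return "arpege/{:%Y%m%d}/run{:02d}.grib2".format(day, run_hour)
```

A typical loop would call `daily_object_name(date.today(), run_hour)` for each forecast run, skip objects that already exist in the bucket, and otherwise chain `download_to_file` and `upload_to_gcs`.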
For scalability, maintenance and cost reasons, I would have preferred to use only serverless services, but fortunately:
- It works well on a fresh f1-micro Compute Engine instance (no extra packages required and only $4/month if running 24/7)
- Network traffic from Compute Engine to Google Cloud Storage is free if the instance and the bucket are in the same region ($0/month)
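The periodic check needs no extra software either: a plain cron entry on the instance is enough (the script path and schedule below are placeholders to adapt):

```
# Check for new forecast files every hour, on the hour
0 * * * * /opt/weather/fetch_forecasts.py >> /var/log/weather_fetch.log 2>&1
```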