Automatically retrieving large files via public HTTP into Google Cloud Storage


Problem description


For weather processing purposes, I would like to automatically retrieve daily weather forecast data into Google Cloud Storage.

The files are available at a public HTTP URL (http://dcpc-nwp.meteo.fr/openwis-user-portal/srv/en/main.home), but they are very large (between 30 and 300 megabytes). File size is the main issue.

After looking at previous Stack Overflow topics, I tried two methods, both unsuccessful:

1/ First attempt via urlfetch in Google App Engine

    from google.appengine.api import urlfetch

    url = "http://dcpc-nwp.meteo.fr/servic..."
    result = urlfetch.fetch(url)

    [...] # Code to save in a Google Cloud Storage bucket

But I get the following error message on the urlfetch line:

DeadlineExceededError: Deadline exceeded while waiting for HTTP response from URL
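For context, urlfetch uses a short default deadline (5 seconds) and caps it at 60 seconds in the App Engine standard environment, so even requesting the maximum explicitly would not cover a 30–300 MB download. A minimal sketch of that attempt (`fetch_with_deadline` is an illustrative name; the import only succeeds inside the App Engine runtime):

```python
try:
    # Only available inside the App Engine runtime.
    from google.appengine.api import urlfetch
except ImportError:
    urlfetch = None

def fetch_with_deadline(url, deadline=60):
    """Fetch a URL with an explicit deadline in seconds.

    60 s is the maximum urlfetch allows in the standard environment,
    which is still far too short for files of 30-300 MB.
    """
    if urlfetch is None:
        raise RuntimeError("urlfetch is only available on App Engine")
    return urlfetch.fetch(url, deadline=deadline)
```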

2/ Second attempt via the Cloud Storage Transfer Service

According to the documentation, it is possible to retrieve HTTP data into Cloud Storage directly via the Storage Transfer Service: https://cloud.google.com/storage/transfer/reference/rest/v1/TransferSpec#httpdata

But it requires the size and MD5 of each file before the download. This option cannot work in my case because the website does not provide that information.
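For reference, the Transfer Service's URL list format (`TsvHttpData-1.0`) expects one tab-separated line per file with the URL, the byte size, and the base64-encoded MD5. A sketch of how one such line would be built (`tsv_line` is an illustrative name; note that `data` must be the full file contents, so producing the line already requires downloading the file once, which is exactly the catch here):

```python
import base64
import hashlib

# First line of a Transfer Service URL list file.
TSV_HEADER = "TsvHttpData-1.0"

def tsv_line(url, data):
    """Build one URL-list entry: URL, size in bytes, base64-encoded MD5."""
    md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    return "{}\t{}\t{}".format(url, len(data), md5_b64)
```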

3/ Any ideas?

Do you see any solution for automatically retrieving large files over HTTP into my Cloud Storage bucket?

Solution

3/ Workaround with a Compute Engine instance

Since it was not possible to retrieve large files from an external HTTP server with App Engine or directly with Cloud Storage, I used a workaround with an always-running Compute Engine instance.

This instance regularly checks if new weather files are available, downloads them and uploads them to a Cloud Storage bucket.
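A minimal sketch of that loop's core step, assuming Python 3 on the instance and the `gsutil` tool preinstalled on Compute Engine images (`object_name`, `mirror`, and the bucket name are illustrative):

```python
import os
import subprocess
import urllib.request

def object_name(url):
    """Use the last path segment of the URL as the object name."""
    return url.rstrip("/").rsplit("/", 1)[-1]

def mirror(url, bucket, workdir="/tmp"):
    """Download one file over HTTP, then copy it into the bucket.

    gsutil ships with Compute Engine images, and traffic to a bucket
    in the same region is free.
    """
    local = os.path.join(workdir, object_name(url))
    urllib.request.urlretrieve(url, local)
    subprocess.check_call(["gsutil", "cp", local, "gs://{}/".format(bucket)])
    os.remove(local)
```

A cron entry on the instance can then invoke this check on whatever schedule matches the forecast publication times.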

For scalability, maintenance, and cost reasons, I would have preferred to use only serverless services, but fortunately:

  • It works well on a fresh f1-micro Compute Engine instance (no extra packages required, and only about $4/month if running 24/7)
  • Network traffic from Compute Engine to Google Cloud Storage is free if the instance and the bucket are in the same region ($0/month)
