Reading really big blobs without downloading them in Google Cloud (streaming?)


Problem description

Please help!

[+] What I have: A lot of blobs in every bucket. Blobs can vary in size from less than a kilobyte to many gigabytes.

[+] What I'm trying to do: I need to be able to either stream the data in those blobs (through a buffer of size 1024 or something like that) or read them in chunks of a certain size in Python. The point is that I don't think I can just do a bucket.get_blob(), because if the blob is a terabyte it won't fit in physical memory.

[+] What I'm really trying to do: parse the information inside the blobs to identify keywords (see the sketch after this list).

[+] What I've read: A lot of documentation on how to write to Google Cloud in chunks and then use compose to stitch the pieces together (not helpful at all).

A lot of documentation on Java's prefetch functions (it needs to be Python).

The Google Cloud APIs.

If anyone could point me in the right direction I would be really grateful! Thanks
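For concreteness, the kind of chunked keyword scan being asked about might look like the sketch below. It assumes a reasonably recent google-cloud-storage client, in which Blob.download_as_bytes() accepts inclusive start/end byte offsets, so each loop iteration issues one ranged GET; the function name find_keyword and the one-megabyte chunk size are purely illustrative.

from google.cloud import storage

def find_keyword(bucket_name, blob_name, keyword, chunk_size=1024 * 1024):
    # Scan a blob for a keyword without holding the whole object in memory.
    # Each iteration downloads one byte range; a small overlap is carried
    # over so a keyword that straddles two chunks is still found.
    client = storage.Client()
    blob = client.bucket(bucket_name).get_blob(blob_name)  # metadata only
    keyword_bytes = keyword.encode('utf-8')
    overlap = len(keyword_bytes) - 1
    tail = b''
    start = 0
    while start < blob.size:
        end = min(start + chunk_size, blob.size) - 1  # end offset is inclusive
        chunk = blob.download_as_bytes(start=start, end=end)
        if keyword_bytes in tail + chunk:
            return True
        tail = chunk[-overlap:] if overlap else b''
        start = end + 1
    return False

The per-chunk requests trade extra HTTP round trips for a flat memory footprint.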

Recommended answer

So a way I have found of doing this is by creating a file-like object in Python and then using the Google Cloud API call .download_to_file() with that file-like object.

This in essence streams the data. The Python code looks something like this:

import os

def getStream(blob):
    # NB: open()'s third positional argument is the buffer size, so
    # os.O_NONBLOCK is interpreted here as buffering, not as an OS flag.
    stream = open('myStream', 'wb', os.O_NONBLOCK)
    blob.download_to_file(stream)  # streams the blob into the file

The os.O_NONBLOCK flag is there so I can read the file while I'm writing to it (note, though, that the built-in open() actually takes its third positional argument as a buffer size, not as OS flags). I still haven't tested this with really big files, so if anyone knows a better implementation or sees a potential failure with this, please comment. Thanks!

