Reading really big blobs without downloading them in Google Cloud (streaming?)
Question
Please help!
[+] What I have: A lot of blobs in every bucket. The blobs vary in size from less than a kilobyte to many gigabytes.
[+] What I'm trying to do: I need to be able to either stream the data in those blobs (through a buffer of size 1024 or something like that) or read them in chunks of a certain size in Python. The point is that I don't think I can just do a bucket.get_blob(), because if the blob were a terabyte I wouldn't be able to hold it in physical memory.
[+] What I'm really trying to do: parse the information inside the blobs to identify keywords.
[+] What I've read: A lot of documentation on how to write to Google Cloud in chunks and then use compose to stitch the pieces together (not helpful at all)
A lot of documentation on Java's prefetch functions (this needs to be in Python)
The Google Cloud APIs
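To make the chunked-read goal concrete, here is a rough sketch of what I'm after. The byte_ranges helper is hypothetical; the GCS part assumes google-cloud-storage's Blob.download_as_bytes, which accepts optional start/end byte offsets (end inclusive, per HTTP Range semantics), so each call would fetch one bounded slice instead of the whole object:

```python
def byte_ranges(total_size, chunk_size):
    """Split [0, total_size) into inclusive (start, end) byte ranges
    of at most chunk_size bytes, matching HTTP Range semantics."""
    return [(start, min(start + chunk_size, total_size) - 1)
            for start in range(0, total_size, chunk_size)]

# Hypothetical usage against GCS (untested assumption on my part):
# from google.cloud import storage
# blob = storage.Client().bucket("my-bucket").get_blob("huge-object")
# for start, end in byte_ranges(blob.size, 1024 * 1024):
#     chunk = blob.download_as_bytes(start=start, end=end)
#     ...  # scan the chunk, then let it be garbage-collected
```

This way only one chunk is ever resident in memory, at the cost of one request per range.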
If anyone could point me in the right direction I would be really grateful! Thanks
Answer
So a way I have found of doing this is to create a file-like object in Python and then use the Google Cloud API call .download_to_file() with that file-like object.
This essentially streams the data. The Python code looks something like this:
import os

def getStream(blob):
    # built-in open()'s third argument is a buffering size, not O_* flags,
    # so O_NONBLOCK has to go through os.open instead
    fd = os.open('myStream', os.O_WRONLY | os.O_CREAT | os.O_NONBLOCK)
    with os.fdopen(fd, 'wb') as stream:
        blob.download_to_file(stream)
The os.O_NONBLOCK flag is there so I can read from the file while it is being written. I still haven't tested this with really big files, so if anyone knows a better implementation or sees a potential failure with this, please comment. Thanks!
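As a possible alternative, here is a sketch that skips the temp file entirely (with assumptions: newer releases of google-cloud-storage expose Blob.open("rb"), which returns a lazily-fetching file-like object, and the find_keywords helper below is my own invention, not part of any library). It scans the stream chunk by chunk and keeps a small overlap between chunks so a keyword split across a chunk boundary is still found:

```python
import io

def find_keywords(fileobj, keywords, chunk_size=1024 * 1024):
    """Scan a binary file-like object chunk by chunk and return the
    subset of `keywords` (bytes) found anywhere in the stream.
    An overlap of (longest keyword length - 1) bytes is carried
    between chunks so boundary-spanning matches are not missed."""
    found = set()
    overlap = max(len(k) for k in keywords) - 1
    tail = b""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk  # previous tail + current chunk
        for kw in keywords:
            if kw not in found and kw in window:
                found.add(kw)
        tail = window[-overlap:] if overlap else b""
    return found

# Hypothetical GCS usage -- blob.open("rb") streams the object lazily:
# from google.cloud import storage
# blob = storage.Client().bucket("my-bucket").blob("huge-object")
# with blob.open("rb") as f:
#     hits = find_keywords(f, [b"error", b"timeout"])

# Local demonstration with an in-memory stream:
data = io.BytesIO(b"x" * 2000 + b"error" + b"y" * 2000)
hits = find_keywords(data, [b"error", b"timeout"], chunk_size=512)
```

Because find_keywords only needs .read(), it works on any binary file-like object, so it can be exercised locally with io.BytesIO before pointing it at a real blob.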