Reading video during cloud dataflow, using GCSfuse, download locally, or write new Beam reader?

Question

I am building a Python cloud video pipeline that will read video from a bucket, perform some computer vision analysis, and write frames back to a bucket. As far as I can tell, there is no Beam read method that passes GCS paths to opencv, similar to TextIO.read(). My options moving forward seem to be: download the files locally (they are large), use GCS Fuse to mount the bucket on a worker (is that possible?), or write a custom source. Does anyone have experience with which makes the most sense?

My main confusion came from this question:

Can Google Cloud Dataflow (Apache Beam) use ffmpeg to process video or image data

How would ffmpeg have access to the path? It's not just a question of uploading the binary, is it? There needs to be a Beam method to pass the item, correct?

Recommended answer

I think that you will need to download the files first and then pass them through.

However, instead of saving the files locally, is it possible to pass the bytes through to opencv? Does it accept any sort of ByteStream or input stream?

You could have one ParDo that downloads the files using the GCS API and then passes them to opencv through a stream, a ByteChannel, a stdin pipe, etc.
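To make the "stdin pipe" idea concrete, here is a rough sketch using only the standard library. It feeds any readable binary stream to an external command's stdin and collects the command's stdout. The helper name `pipe_to_command` is made up for illustration. Inside a DoFn the stream could come from something like `FileSystems.open(gcs_path)` in `apache_beam.io.filesystems`, and the command could be along the lines of `ffmpeg -i pipe:0` followed by whatever output options you need; both of those are assumptions, not something I have run on Dataflow.

```python
import shutil
import subprocess
import threading


def pipe_to_command(stream, command, chunk_size=1 << 20):
    """Feed a binary stream to a command's stdin and return its stdout bytes.

    Hypothetical helper: `stream` is any readable binary file object and
    `command` is an argument list such as ["ffmpeg", "-i", "pipe:0", ...].
    """
    proc = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )

    def feed():
        # Write in a separate thread so that a full stdout pipe cannot
        # deadlock against a full stdin pipe.
        try:
            shutil.copyfileobj(stream, proc.stdin, chunk_size)
            proc.stdin.close()
        except BrokenPipeError:
            pass  # the command exited before consuming all of the input

    feeder = threading.Thread(target=feed, daemon=True)
    feeder.start()
    output = proc.stdout.read()
    feeder.join()
    returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, command)
    return output
```

One caveat: some container formats need to seek within the file, so decoding from a non-seekable pipe may not work for every video. If it fails for your files, saving to local disk is the fallback.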

If that is not available, you will need to save the files to local disk and then pass opencv the filename. This could be tricky because you may end up using too much disk space, so make sure to clean up properly and delete each file from local disk after opencv has processed it.
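A minimal sketch of the save-then-delete approach, assuming the video arrives as a readable binary stream. The name `local_copy` is hypothetical. The context manager copies the stream to a temporary file, hands back the path, and removes the file when the block exits, even if processing raises.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def local_copy(stream, suffix=".mp4"):
    """Copy a binary stream to a temp file, yield its path, then delete it."""
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        # Close the file before yielding so another reader can open it.
        with tmp:
            shutil.copyfileobj(stream, tmp, 1 << 20)
        yield tmp.name
    finally:
        # Always free the worker's disk space, even on errors.
        with contextlib.suppress(FileNotFoundError):
            os.remove(tmp.name)
```

Inside a DoFn's `process` method you would then open the GCS object (for example with `FileSystems.open(gcs_path)`), wrap it in `with local_copy(src) as path:`, and pass `path` to `cv2.VideoCapture(path)`. Only one video per in-flight element sits on disk at a time this way.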

I'm not sure, but you may also need to select a certain VM machine type to ensure you have enough disk space, depending on the size of your files.
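As far as I know, the Beam Python SDK exposes worker disk size as its own pipeline option (`--disk_size_gb`) alongside `--machine_type`, so you may be able to size the disk directly. A small sketch of building those flags; the function name and the example values are placeholders, and the resulting list would be handed to `PipelineOptions`.

```python
def dataflow_worker_args(disk_size_gb, machine_type):
    """Build Dataflow worker flags for disk size and machine type.

    Hypothetical helper; the values are whatever suits your video sizes.
    """
    return [
        "--runner=DataflowRunner",
        f"--disk_size_gb={disk_size_gb}",
        f"--machine_type={machine_type}",
    ]
```

For example, `dataflow_worker_args(200, "n1-standard-4")` asks for a 200 GB disk on each worker.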
