Is Dataflow making use of Google Cloud Storage's gzip transcoding?

Question

I am trying to process JSON files (10 GB uncompressed/2 GB compressed) and I want to optimize my pipeline.

According to the official docs, Google Cloud Storage (GCS) has the option to transcode gzip files, which means the application receives them uncompressed when they are tagged correctly. Google Cloud Dataflow (GCDF) has better parallelism when dealing with uncompressed files, so I was wondering whether setting this metadata on GCS has a positive effect on performance.
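
For reference, the "meta tag" in question is the object's `Content-Encoding` header. A minimal sketch of setting it with the `google-cloud-storage` Python client (the bucket and object names here are hypothetical):

```python
from google.cloud import storage

# Hypothetical bucket and object names, for illustration only.
client = storage.Client()
bucket = client.bucket("my-bucket")
blob = bucket.blob("input/data.json.gz")

# Mark the object so GCS decompresses it on the fly when serving it
# (decompressive transcoding).
blob.content_encoding = "gzip"
blob.content_type = "application/json"
blob.patch()
```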

Since my input files are relatively large, does it make sense to unzip them so that Dataflow can split them into smaller chunks?
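
If you do decide to decompress up front, a rough sketch of a one-off pre-processing step (names are hypothetical; for archives this size you would stream the data rather than buffer it in memory):

```python
import gzip

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # hypothetical bucket

compressed = bucket.blob("input/data.json.gz")
uncompressed = bucket.blob("input/data.json")

# Download, decompress, and re-upload as a plain object. For a 2 GB
# archive, stream chunk by chunk instead of buffering it all at once.
data = gzip.decompress(compressed.download_as_bytes())
uncompressed.upload_from_string(data, content_type="application/json")
```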

Answer

You should not use this metadata. It's dangerous, as GCS would report the size of your file incorrectly (e.g., report the compressed size while Dataflow/Beam reads the uncompressed data).

In any case, the splitting of uncompressed files relies on reading in parallel from different segments of a file, and this is not possible if the file is originally compressed: a gzip stream must be decompressed sequentially from the start, so there is no way to begin reading at an arbitrary offset.
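
To illustrate the difference in Beam's Python SDK (the bucket paths are hypothetical): a gzip file must be read end to end by a single worker, while a plain-text file can be split into byte ranges that workers read in parallel.

```python
import apache_beam as beam
from apache_beam.io.filesystem import CompressionTypes

with beam.Pipeline() as p:
    # Gzip input: each matched file is an unsplittable unit, so one
    # worker reads the whole file sequentially.
    gzipped = p | "ReadGzip" >> beam.io.ReadFromText(
        "gs://my-bucket/input/*.json.gz",
        compression_type=CompressionTypes.GZIP,
    )

    # Plain-text input: Beam can split each file into ranges and read
    # them in parallel across workers.
    plain = p | "ReadPlain" >> beam.io.ReadFromText(
        "gs://my-bucket/input/*.json",
        compression_type=CompressionTypes.UNCOMPRESSED,
    )
```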
