Relatively poor performance when reading compressed files vis-a-vis normal text files kept in Google Cloud Storage using Google Dataflow

Posted: 2020/11/18 2:02:50. Tags: google-cloud-storage, google-cloud-dataflow

Problem description

I used Google Dataflow to read an 11.57GB file from Cloud Storage and write it to Google BigQuery. It took around 12 minutes with 30 workers.

I then compressed the same file (it is now 1.06GB), read it from Cloud Storage again using Google Dataflow, and wrote it to BigQuery. This time it took around 31 minutes with the same 30 workers.

Both Dataflow jobs used the same pipeline options. The only difference was that the input file was uncompressed in the first job and compressed in the second.

It seems there is a huge drop in performance when Google Dataflow reads compressed files.

The speed of the ParDo transform and the BigQueryIO transform drops by more than 50% when reading compressed files.

It does not seem to improve even when I increase the number of workers to 200: it still took 28 minutes to read the same compressed file and write it to BigQuery.
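To put the three runs side by side, here is a rough back-of-the-envelope calculation using only the figures quoted above. It is a sketch, not a measurement: it assumes the reported wall-clock times are accurate to the minute and that all three runs process the same 11.57GB of logical data. The variable and function names are purely illustrative.

```python
# Figures quoted in the question; wall-clock times are approximate.
UNCOMPRESSED_GB = 11.57
COMPRESSED_GB = 1.06

runs = {
    "uncompressed, 30 workers": {"minutes": 12, "workers": 30},
    "compressed, 30 workers": {"minutes": 31, "workers": 30},
    "compressed, 200 workers": {"minutes": 28, "workers": 200},
}


def logical_throughput(minutes):
    """GB of uncompressed data processed per minute of wall-clock time."""
    return UNCOMPRESSED_GB / minutes


# How much smaller the compressed input is.
compression_ratio = UNCOMPRESSED_GB / COMPRESSED_GB

# How much longer the compressed job took with the same 30 workers.
slowdown = runs["compressed, 30 workers"]["minutes"] / runs["uncompressed, 30 workers"]["minutes"]

# How much faster the compressed job got when going from 30 to 200 workers,
# compared with how many more workers were added.
worker_increase = runs["compressed, 200 workers"]["workers"] / runs["compressed, 30 workers"]["workers"]
speedup = runs["compressed, 30 workers"]["minutes"] / runs["compressed, 200 workers"]["minutes"]

for name, run in runs.items():
    print(f"{name}: {logical_throughput(run['minutes']):.2f} GB/min")

print(f"compression ratio: {compression_ratio:.1f}x")
print(f"slowdown at 30 workers: {slowdown:.2f}x")
print(f"{worker_increase:.1f}x more workers gave a {speedup:.2f}x speedup")
```

On these numbers, adding several times as many workers barely changes the run time of the compressed job, so worker count does not appear to be the limiting factor here.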

Is there a way to speed up the entire process when reading compressed files?
